Hi Spark Users,

I'm testing the new Parquet partition discovery feature in Spark 1.3. I have two subfolders, each containing 800 rows:

    /data/table1/key=1
    /data/table1/key=2
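For reference, here is a minimal sketch of how such a layout can be produced in the 1.3 shell (the schema and values below are made up for illustration; only the directory layout matters):

    import sqlContext.implicits._
    // Write 800 rows of dummy data into each key=N folder; partition
    // discovery later derives the "key" column from the folder names.
    for (k <- 1 to 2) {
      val df = sc.parallelize(1 to 800).map(i => (i, s"row$i")).toDF("id", "value")
      df.saveAsParquetFile(s"hdfs://xxxx/data/table1/key=$k")
    }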
In spark-shell, I ran:

    val t = sqlContext.createExternalTable("table1", "hdfs://xxxx/data/table1", "parquet")
    t.count

This returns 1600, as expected. But after I add a new folder /data/table1/key=3 and run t.count again, it still returns 1600, not 2400. If I restart spark-shell and run

    val t = sqlContext.table("table1")
    t.count

it returns 2400. So I suspect there is a partition metadata cache in the driver. I tried setting spark.sql.parquet.cacheMetadata to false and testing again, but unfortunately it doesn't help. How can I disable this partition cache or force it to refresh?

Thanks
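P.S. One workaround I'm going to try, based on the "Metadata Refreshing" note in the 1.3 Parquet docs (this assumes sqlContext is actually a HiveContext, which it is in a Hive-enabled spark-shell):

    // Invalidate the cached metadata for table1 so the next action
    // re-runs partition discovery and picks up the new key=3 folder.
    sqlContext.refreshTable("table1")
    val t2 = sqlContext.table("table1")
    t2.count  // hopefully 2400 now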