Hi Spark Users,

I'm testing the new Parquet partition discovery feature in Spark 1.3.
I have two sub-folders, each containing 800 rows:
/data/table1/key=1
/data/table1/key=2
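
Each sub-folder is just a plain Parquet directory. For reference, here is a minimal sketch of how data laid out like this could be written from spark-shell; the schema, column names, and row contents are made up for illustration and are not my actual data:

import sqlContext.implicits._

// 800 rows per partition directory; the key=N folder name is what
// partition discovery later exposes as the "key" column
val part1 = sc.parallelize(1 to 800).map(i => (i, s"row_$i")).toDF("id", "value")
part1.saveAsParquetFile("hdfs://xxxx/data/table1/key=1")

val part2 = sc.parallelize(1 to 800).map(i => (i, s"row_$i")).toDF("id", "value")
part2.saveAsParquetFile("hdfs://xxxx/data/table1/key=2")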

In spark-shell, I run these commands:

val t = sqlContext.createExternalTable("table1", "hdfs://xxxx/data/table1", "parquet")

t.count


It correctly returns 1600.

But after that, I add a new folder /data/table1/key=3 with another 800 rows and run t.count again; it still gives me 1600, not 2400.
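
Concretely, the new folder is written the same way as in the sketch above (again, made-up schema):

// write another 800 rows into a new partition directory
val part3 = sc.parallelize(1 to 800).map(i => (i, s"row_$i")).toDF("id", "value")
part3.saveAsParquetFile("hdfs://xxxx/data/table1/key=3")

// re-run the count on the table created earlier
t.count  // still 1600, not 2400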


I then restart spark-shell and run:

val t = sqlContext.table("table1")

t.count


It's 2400 now.


I suspect there is a partition metadata cache in the driver. I tried setting spark.sql.parquet.cacheMetadata to false and testing again, but unfortunately it doesn't help.
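
In case I'm applying the setting the wrong way, this is roughly what I did (assuming setConf is the right entry point for this option):

// disable the Parquet metadata cache, then re-run the count
sqlContext.setConf("spark.sql.parquet.cacheMetadata", "false")
t.count  // still 1600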


How can I disable this partition cache, or force it to refresh?
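
To illustrate what I'm hoping for, something along these lines; refreshTable is only my guess here (it seems to exist on HiveContext in 1.3, but I don't know whether it re-runs partition discovery for an external Parquet table):

// hoped-for behaviour: invalidate the cached metadata for the table,
// rediscover partitions, and pick up the new key=3 folder
sqlContext.refreshTable("table1")
t.count  // would ideally return 2400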


Thanks
