I'm having trouble with refreshTable, I suspect because I'm using it incorrectly.
I am doing the following:

1. Create a DataFrame from a parquet path with wildcards, e.g. /foo/bar/*.parquet
2. Use registerTempTable to register my DataFrame
3. A new file is dropped under /foo/bar/
4. Call hiveContext.refreshTable in the hope that the paths for the DataFrame are re-evaluated

Step 4 does not work as I imagined -- if I have 1 file in step 1 and 2 files in step 3, I still get the same count when I query the table.

So I have 2 questions:

1) Is there a way to see the files that a DataFrame/RDD is underpinned by?
2) What is a reasonable way to refresh the table with "newcomer" data? I suspect I have to start over from step 1 to force the DataFrame to pick up the new files, but I am hoping there is a simpler way. (I know DataFrames are immutable, but they are also lazy, so I'm thinking paths with wildcards evaluated per call might be possible?)

Thanks for any insights.