I'm having trouble with refreshTable, I suspect because I'm using it
incorrectly.

I am doing the following:

1. Create DF from parquet path with wildcards, e.g. /foo/bar/*.parquet
2. use registerTempTable to register my dataframe
3. A new file is dropped under  /foo/bar/
4. Call hiveContext.refreshTable in the hope that the paths for the
Dataframe are re-evaluated

Step 4 does not work as I imagine -- if I have 1 file in step 1, and 2
files in step 3, I still get the same count when I query the table

So I have 2 questions

1). Is there a way to see the files that a Dataframe/RDD is underpinned by
2). What is a reasonable way to refresh the table with "newcomer" data --
I'm suspecting I have to start over from step 1 to force the Dataframe to
re-see new files, but am hoping there is a simpler way (I know frames are
immutable but they are also lazy so I'm thinking paths with wildcards
evaluated per call might be possible?)

Thanks for any insights.

Reply via email to