1. For my understanding, what's the use case for turning this feature on and off? Why not have it on all the time?
2. A query/session option seems awkward because Impala loads the block metadata in the catalogd and caches it. How would an impalad know if there is already sufficient metadata in the cache? Should we reload the table metadata whenever such a SET option is used? I'm thinking of a table that does not have data in subdirectories. You could add an additional "loading state" to a table to indicate whether it was loaded with/without subdirectories. Overall this solution does not seem to fit very well into the existing architecture, and sounds overly complicated.
3. A table property is more consistent with the existing architecture.

On Thu, May 4, 2017 at 11:03 PM, Shant Hovsepian <[email protected]> wrote:

> Hi All, what are people's thoughts on IMPALA-4726
> <https://issues.apache.org/jira/browse/IMPALA-4726> and IMPALA-4596
> <https://issues.apache.org/jira/browse/IMPALA-4596>? These are concerning
> support for recursing through subdirectories in a table location to search
> for all data files.
>
> Restricting the behavior to external tables only seems like a good idea,
> but as for turning on the behavior what are thoughts around making it a
> runtime session setting with "SET" like hive does, or potentially making it
> something permanent like a table property.
>
> Thanks!
>
> -Shant
>
