Re: [PR] DRILL-8507, DRILL-8508 Better handling of partially missing parquet columns (drill)

via GitHub Tue, 03 Sep 2024 02:45:26 -0700


rymarm commented on PR #2937:
URL: https://github.com/apache/drill/pull/2937#issuecomment-2326078335


   @paul-rogers, this is not a new feature. Drill has read the metadata of all 
Parquet files during the planning phase for a long time. Moreover, we even have 
a feature called "parquet metadata cache" that addresses the downside of this 
approach: planning can take significant time when the metadata of many distinct 
Parquet files has to be read.
   
   > Parquet metadata caching is a feature that enables Drill to read a single 
metadata cache file instead of retrieving metadata from multiple Parquet files 
during the query-planning phase
   > ...
   > Metadata caching is useful when planning time is a significant percentage 
of the total elapsed time of the query 
   
   
   https://drill.apache.org/docs/optimizing-parquet-metadata-reading/
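
To make the trade-off concrete, here is a minimal conceptual sketch in Python. It is not Drill code: the `*.footer.json` files, `cache.json`, and all function names are hypothetical stand-ins for Parquet footers and the metadata cache file. It only shows why planning against one cache file touches fewer files than reading every footer. In Drill itself the cache is built with `REFRESH TABLE METADATA`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-ins: each "*.footer.json" plays the role of one Parquet
# file's footer metadata; "cache.json" plays the role of the metadata cache.
CACHE_NAME = "cache.json"


def write_demo_footers(table_dir: Path, n: int) -> None:
    """Create n fake per-file footers so the sketch has something to read."""
    for i in range(n):
        footer = {"columns": ["a", "b"], "row_count": 100 * (i + 1)}
        (table_dir / f"part-{i}.footer.json").write_text(json.dumps(footer))


def plan_without_cache(table_dir: Path):
    """Read every footer; return (metadata, number of files opened)."""
    metadata = {}
    opened = 0
    for footer in sorted(table_dir.glob("*.footer.json")):
        metadata[footer.name] = json.loads(footer.read_text())
        opened += 1
    return metadata, opened


def refresh_cache(table_dir: Path) -> None:
    """Pay the full scan once and store the result in a single file."""
    metadata, _ = plan_without_cache(table_dir)
    (table_dir / CACHE_NAME).write_text(json.dumps(metadata))


def plan_with_cache(table_dir: Path):
    """Use the cache file if present, otherwise fall back to a full scan."""
    cache = table_dir / CACHE_NAME
    if cache.exists():
        return json.loads(cache.read_text()), 1
    return plan_without_cache(table_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp)
        write_demo_footers(table, 5)
        print(plan_with_cache(table)[1])  # no cache yet: every footer is read
        refresh_cache(table)
        print(plan_with_cache(table)[1])  # cache present: one file is read
```

The sketch ignores cache invalidation when files change, which a real implementation has to handle.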


