Uwe L. Korn created DRILL-4978:
----------------------------------

             Summary: Parquet metadata cache on S3 is always renewed
                 Key: DRILL-4978
                 URL: https://issues.apache.org/jira/browse/DRILL-4978
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.8.0
         Environment: Hadoop s3a storage
            Reporter: Uwe L. Korn


As directory modification times are not tracked by S3 (see 
https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories
 ) the Parquet metadata cache is always renewed during query planning.

This could be addressed in either of two ways:
 * for s3a, check the modification times of all Parquet files in the 
directory instead of the modification time of the directory itself
 * deactivate the metadata cache for s3a
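
A minimal sketch of the first option, to illustrate the idea only; it is not 
Drill's implementation. It uses the local filesystem as a stand-in for s3a, 
the function name cache_is_stale is hypothetical, and the cache file name 
.drill.parquet_metadata is an assumption about what Drill writes into each 
directory:

```python
from pathlib import Path

# Assumption: name of the metadata cache file kept in each table directory.
CACHE_FILE_NAME = ".drill.parquet_metadata"


def cache_is_stale(directory: Path, cache_name: str = CACHE_FILE_NAME) -> bool:
    """Return True if the cache is missing or older than any Parquet file.

    Hypothetical helper: compares per-file modification times rather than
    the directory's modification time, which object stores do not track.
    """
    cache = directory / cache_name
    if not cache.is_file():
        # No cache yet, so it has to be built.
        return True

    cache_mtime = cache.stat().st_mtime_ns
    for entry in directory.iterdir():
        # Only direct children with a .parquet suffix are considered here.
        if entry.suffix == ".parquet" and entry.is_file():
            if entry.stat().st_mtime_ns > cache_mtime:
                return True
    return False
```

This check only notices added or rewritten files. Detecting deleted files 
would additionally require comparing the listing against the file names 
recorded in the cache, and nested directories would need a recursive walk.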



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
