Uwe L. Korn created DRILL-4978:
----------------------------------
Summary: Parquet metadata cache on S3 is always renewed
Key: DRILL-4978
URL: https://issues.apache.org/jira/browse/DRILL-4978
Project: Apache Drill
Issue Type: Bug
Components: Storage - Parquet
Affects Versions: 1.8.0
Environment: Hadoop s3a storage
Reporter: Uwe L. Korn
Because S3 does not track modification times of directories (see
https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories
), the Parquet metadata cache is always renewed during query planning.
This could be addressed in one of two ways:
* for s3a, check the modification times of all Parquet files in the
directory instead of the directory's own modification time
* deactivate the metadata cache for s3a
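A minimal sketch of the first option, written in Python for brevity and using the local filesystem (`pathlib`) as a stand-in for the Hadoop `FileSystem` API that an s3a client would actually go through. The helper names `newest_parquet_mtime` and `cache_is_stale` are hypothetical, not Drill functions. The cache is treated as stale only when it is missing or when some Parquet file under the directory is newer than the cache file; directory modification times are ignored entirely.

```python
from pathlib import Path
from typing import Optional


def newest_parquet_mtime(directory) -> Optional[float]:
    """Hypothetical helper: newest modification time (seconds since epoch)
    of any *.parquet file under `directory`, searched recursively.
    Returns None if the directory contains no Parquet files."""
    mtimes = [
        p.stat().st_mtime
        for p in Path(directory).rglob("*.parquet")
        if p.is_file()
    ]
    return max(mtimes, default=None)


def cache_is_stale(directory, cache_file) -> bool:
    """Hypothetical staleness check that relies only on file modification
    times, never on directory modification times (which S3 does not track)."""
    cache = Path(cache_file)
    if not cache.exists():
        # No cache yet: it has to be built.
        return True
    newest = newest_parquet_mtime(directory)
    # Stale only if some data file was written after the cache was.
    return newest is not None and newest > cache.stat().st_mtime
```

Two trade-offs are worth noting: on S3 this check has to list every file under the table directory on each planning pass, and comparing file modification times alone does not detect deleted files, so a fuller check would also compare the set of file names recorded in the cache.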