Uwe L. Korn created DRILL-4978:
----------------------------------

             Summary: Parquet metadata cache on S3 is always renewed
                 Key: DRILL-4978
                 URL: https://issues.apache.org/jira/browse/DRILL-4978
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.8.0
         Environment: Hadoop s3a storage
            Reporter: Uwe L. Korn

S3 does not track directory modification times (see https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories ), so the Parquet metadata cache is always renewed during query planning. This could be addressed in one of two ways:

* for s3a, check the modification times of all Parquet files in the directory instead
* deactivate the metadata cache for s3a
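
The first option can be illustrated with a minimal sketch. It is not Drill code: it uses the local filesystem as a stand-in for an s3a listing, the function name `cache_is_stale` is hypothetical, and the default cache file name `.drill.parquet_metadata` is an assumption about where the cache lives. The point is only that staleness is decided from the modification times of the Parquet files themselves, never from the directory.

```python
from pathlib import Path


def cache_is_stale(table_dir, cache_name=".drill.parquet_metadata"):
    """Return True if the metadata cache is missing or older than any Parquet file.

    Hypothetical helper for illustration; cache_name is an assumed default.
    """
    root = Path(table_dir)
    cache = root / cache_name
    if not cache.is_file():
        # No cache yet, so it has to be built.
        return True
    cache_mtime = cache.stat().st_mtime
    # Compare against the data files, not the directory: an object store
    # such as S3 does not keep a meaningful directory modification time.
    return any(
        p.stat().st_mtime > cache_mtime
        for p in root.rglob("*.parquet")
    )
```

A check like this costs one listing of the directory per query plan, and it would not notice Parquet files that were deleted after the cache was written, so a real fix would probably also compare the set of file names recorded in the cache.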