Uwe L. Korn created DRILL-4977:
----------------------------------
Summary: Reading parquet metadata cache from S3 with
fadvise=random and Hadoop 3 generates a large number of requests
Key: DRILL-4977
URL: https://issues.apache.org/jira/browse/DRILL-4977
Project: Apache Drill
Issue Type: Improvement
Components: Storage - Parquet
Affects Versions: 1.8.0
Environment: Hadoop 3.0
Reporter: Uwe L. Korn
When using the new {{fs.s3a.experimental.input.fadvise=random}} mode for
accessing Parquet files stored in S3, we see a significant improvement in
query performance but a slowdown in query planning. The slowdown is due to the
way the metadata file is read: each chunk of 8000 bytes generates a new GET
request to S3. This behaviour can be avoided by calling
{{FSDataInputStream.setReadahead(metadata-filesize)}} to indicate that we will
read the whole file.
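
To illustrate the effect described above, here is a minimal, self-contained sketch in plain Java. It does not use Hadoop or S3. It is a simplified model, under the assumption that a stream in random mode issues one ranged GET whenever the requested bytes are not already buffered, and that each GET fetches the larger of the chunk size and the readahead. The class name {{ReadaheadModel}}, the method {{countRequests}}, and the 1,000,000-byte file size are hypothetical and chosen only for illustration; the real S3A stream logic is more involved.

```java
/** Simplified model of reading a remote object through ranged GET requests. */
public class ReadaheadModel {

    /**
     * Counts the GET requests needed to read a file sequentially from start to end.
     *
     * @param fileSize  total object size in bytes
     * @param chunkSize bytes the reader asks for per read call
     * @param readahead bytes the stream is allowed to fetch per request (0 = none)
     */
    public static long countRequests(long fileSize, int chunkSize, long readahead) {
        long requests = 0;
        long bufferedEnd = 0; // exclusive end offset of the data fetched so far

        for (long pos = 0; pos < fileSize; pos += chunkSize) {
            long wantedEnd = Math.min(pos + chunkSize, fileSize);
            if (wantedEnd > bufferedEnd) {
                // Data is not buffered: model one ranged GET starting at pos.
                long fetchLength = Math.max(chunkSize, readahead);
                bufferedEnd = Math.min(pos + fetchLength, fileSize);
                requests++;
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        long fileSize = 1_000_000L; // hypothetical metadata file size
        int chunkSize = 8000;       // chunk size mentioned in the issue

        // No readahead: every 8000-byte chunk triggers its own request.
        System.out.println(countRequests(fileSize, chunkSize, 0));        // prints 125

        // Readahead set to the file size: the first request covers everything.
        System.out.println(countRequests(fileSize, chunkSize, fileSize)); // prints 1
    }
}
```

In this model the request count drops from one per chunk to one per file once the readahead covers the whole file, which is the behaviour the proposed {{setReadahead}} call is meant to achieve for the metadata cache file.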
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)