[ https://issues.apache.org/jira/browse/HIVE-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648183#comment-17648183 ]
Steve Loughran commented on HIVE-26699:
---------------------------------------
In the builder pattern we use in Hadoop, .opt() options are ignored by
filesystems which don't recognise them; it's only the .must() ones which MUST
be understood. So .opt() is safe to use.

Passing a FileStatus in to the openFile() call saves a HEAD request on s3a and
abfs, but it has been a bit brittle in the past until it stabilised. You can
instead just pass in the file length with the option fs.option.openfile.length
and have it picked up where it is understood.
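
To make the opt()/must() distinction concrete, here is a toy model of the
contract described above. It is not Hadoop code: ToyFileSystem and ToyBuilder
are hypothetical names invented for illustration, and the only identifier
taken from the comment is the key fs.option.openfile.length. The sketch assumes
a filesystem that simply keeps a set of keys it recognises.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of the opt()/must() contract. NOT Hadoop code; class names are hypothetical. */
final class OpenFileOptionsSketch {

    /** Option key named in the comment above. */
    static final String LENGTH_KEY = "fs.option.openfile.length";

    /** Hypothetical filesystem that recognises a fixed set of option keys. */
    static final class ToyFileSystem {
        private final Set<String> recognised;

        ToyFileSystem(String... recognisedKeys) {
            this.recognised = new HashSet<>(Arrays.asList(recognisedKeys));
        }

        ToyBuilder openFile(String path) {
            return new ToyBuilder(this, path);
        }
    }

    /** Hypothetical builder that collects optional and mandatory options. */
    static final class ToyBuilder {
        private final ToyFileSystem fs;
        private final String path;
        private final Map<String, String> optional = new LinkedHashMap<>();
        private final Map<String, String> mandatory = new LinkedHashMap<>();

        ToyBuilder(ToyFileSystem fs, String path) {
            this.fs = fs;
            this.path = path;
        }

        ToyBuilder opt(String key, String value) {
            optional.put(key, value);
            return this;
        }

        ToyBuilder must(String key, String value) {
            mandatory.put(key, value);
            return this;
        }

        /** Returns the options this filesystem actually applied. */
        Map<String, String> build() {
            Map<String, String> applied = new LinkedHashMap<>();
            // A mandatory option the filesystem does not recognise is an error.
            for (Map.Entry<String, String> e : mandatory.entrySet()) {
                if (!fs.recognised.contains(e.getKey())) {
                    throw new IllegalArgumentException(
                        "Unrecognised mandatory option " + e.getKey() + " opening " + path);
                }
                applied.put(e.getKey(), e.getValue());
            }
            // An optional option the filesystem does not recognise is silently dropped.
            for (Map.Entry<String, String> e : optional.entrySet()) {
                if (fs.recognised.contains(e.getKey())) {
                    applied.put(e.getKey(), e.getValue());
                }
            }
            return applied;
        }
    }

    public static void main(String[] args) {
        ToyFileSystem aware = new ToyFileSystem(LENGTH_KEY);
        ToyFileSystem unaware = new ToyFileSystem();
        // Same caller code against both filesystems; only the aware one applies the hint.
        System.out.println(aware.openFile("metadata.json").opt(LENGTH_KEY, "1024").build());
        System.out.println(unaware.openFile("metadata.json").opt(LENGTH_KEY, "1024").build());
    }
}
```

The point of the model is that a caller can always attach the length hint with
opt(): a filesystem that understands the key uses it, and one that does not
carries on as if it were never set. Only must() can make the open fail.
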
> Iceberg: S3 fadvise can hurt JSON parsing significantly in DWX
> --------------------------------------------------------------
>
> Key: HIVE-26699
> URL: https://issues.apache.org/jira/browse/HIVE-26699
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Hive reads JSON metadata information (TableMetadataParser::read()) multiple
> times, e.g. during query compilation, AM split computation, stats computation,
> and commits.
>
> With large JSON files (due to multiple inserts), these reads take much longer
> on the S3 FS when "fs.s3a.experimental.input.fadvise" is set to "random" (on
> the order of 10x). To be on the safe side, it would be good to set this to
> "normal" mode in configs when reading Iceberg tables.