[https://issues.apache.org/jira/browse/HIVE-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647708#comment-17647708]
Ayush Saxena commented on HIVE-26699:
-------------------------------------
{quote}Is Hive Hadoop 3.3.x+ only yet?
{quote}
Yes, we are on 3.3.1.
[~rajesh.balamohan] I couldn't find a standard way to set this value in the conf
for Iceberg metadata reads only. I also had doubts about setting it in a couple
of places: if the conf is shared, the setting might also get picked up
elsewhere, where we don't intend it to apply.
I did try a draft approach using the openFile API. It is a bit hacky for Hive,
but that's what I could come up with for now:
[https://github.com/apache/hive/pull/3862/files#diff-661ab0f0af817370c70a7320b3cf51d3b0ff690f6a74aa97765bb0c819a550bbR181-R184]
I dropped the instanceof check: I don't see how this config can do any harm on
other filesystems, checking the FS type should cost about the same as just
setting the option, and an instanceof check might not be correct for ViewFs and
similar filesystems.
Just FYI, core Iceberg seems to be on the Hadoop 2 line only, if I read it correctly:
[https://github.com/apache/iceberg/blob/master/versions.props#L4]
Let me know if this approach works for now; otherwise I will discuss it with
folks and see if we can find another route.
> Iceberg: S3 fadvise can hurt JSON parsing significantly in DWX
> --------------------------------------------------------------
>
> Key: HIVE-26699
> URL: https://issues.apache.org/jira/browse/HIVE-26699
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive reads JSON metadata (TableMetadataParser::read()) multiple times, e.g.
> during query compilation, AM split computation, stats computation, commits,
> etc.
>
> With large JSON files (due to multiple inserts), reads take much longer on the
> S3 filesystem when "fs.s3a.experimental.input.fadvise" is set to "random" (e.g.
> on the order of 10x). To be on the safe side, it would be good to set this to
> "normal" in the configs when reading Iceberg tables.