[jira] [Commented] (HIVE-26699) Iceberg: S3 fadvise can hurt JSON parsing significantly in DWX

Steve Loughran (Jira) Thu, 15 Dec 2022 09:57:05 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648183#comment-17648183
 ]


Steve Loughran commented on HIVE-26699:
---------------------------------------

In the builder pattern we use in Hadoop, .opt() options are ignored by 
filesystems which don't recognise them; it's only the .must() ones which MUST 
be understood. So it's safe to use.

Passing in a FileStatus to the openFile() calls saves a HEAD request on s3a and 
abfs, but it was a bit brittle in the past until it stabilised. You can instead 
just pass in the file length with the option fs.option.openfile.length and have 
it picked up wherever it is understood.
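
To make the opt()/must() contract and the length hint concrete, here is a toy model in plain Java. It is not Hadoop code: ToyStore, ToyOpenBuilder, probeLength() and the example paths are hypothetical names invented for illustration, and the real builder returns a future of an input stream rather than a length. Only the option key fs.option.openfile.length and the method names opt()/must() come from the comment above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a filesystem client; NOT a Hadoop class. */
final class ToyStore {
    private final Set<String> knownKeys;
    private final long actualLength;
    int headCalls = 0; // counts simulated HEAD-style length probes

    ToyStore(Set<String> knownKeys, long actualLength) {
        this.knownKeys = knownKeys;
        this.actualLength = actualLength;
    }

    ToyOpenBuilder openFile(String path) {
        return new ToyOpenBuilder(this, path);
    }

    boolean understands(String key) {
        return knownKeys.contains(key);
    }

    /** Simulates the remote round trip needed when the length is unknown. */
    long probeLength() {
        headCalls++;
        return actualLength;
    }
}

/** Hypothetical builder modelling the opt()/must() contract. */
final class ToyOpenBuilder {
    static final String LENGTH_KEY = "fs.option.openfile.length";

    private final ToyStore store;
    private final String path;
    private final Map<String, String> optional = new HashMap<>();
    private final Map<String, String> mandatory = new HashMap<>();

    ToyOpenBuilder(ToyStore store, String path) {
        this.store = store;
        this.path = path;
    }

    ToyOpenBuilder opt(String key, String value) {
        optional.put(key, value);
        return this;
    }

    ToyOpenBuilder must(String key, String value) {
        mandatory.put(key, value);
        return this;
    }

    /** Returns the file length the opened "stream" would work with. */
    long build() {
        // must(): every key has to be understood, otherwise the open fails.
        for (String key : mandatory.keySet()) {
            if (!store.understands(key)) {
                throw new IllegalArgumentException(
                    "Unsupported mandatory option " + key + " opening " + path);
            }
        }
        // opt(): keys the store does not understand are silently dropped.
        Map<String, String> effective = new HashMap<>(mandatory);
        optional.forEach((k, v) -> {
            if (store.understands(k)) {
                effective.put(k, v);
            }
        });
        // A recognised length hint avoids the probe; otherwise probe as usual.
        String hint = effective.get(LENGTH_KEY);
        return hint != null ? Long.parseLong(hint) : store.probeLength();
    }
}

final class OpenFileSketch {
    public static void main(String[] args) {
        ToyStore aware = new ToyStore(Set.of(ToyOpenBuilder.LENGTH_KEY), 4096L);
        long len = aware.openFile("s3a://example-bucket/metadata.json")
            .opt(ToyOpenBuilder.LENGTH_KEY, "4096")
            .build();
        System.out.println("length=" + len + " probes=" + aware.headCalls);
    }
}
```

In real Hadoop code the equivalent call would look roughly like fs.openFile(path).opt("fs.option.openfile.length", Long.toString(len)).build(), assuming a Hadoop release whose openFile() builder knows that option; on a filesystem or release that does not, the .opt() value is simply ignored, which is the point of the comment above.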

> Iceberg: S3 fadvise can hurt JSON parsing significantly in DWX
> --------------------------------------------------------------
>
>                 Key: HIVE-26699
>                 URL: https://issues.apache.org/jira/browse/HIVE-26699
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Hive reads JSON metadata information (TableMetadataParser::read()) multiple 
> times, e.g. during query compilation, AM split computation, stats computation, 
> and during commits.
>  
> With large JSON files (due to multiple inserts), reads take much longer on 
> the S3 FS when "fs.s3a.experimental.input.fadvise" is set to "random" (e.g. on 
> the order of 10x). To be on the safe side, it would be good to set this to 
> "normal" mode in configs when reading Iceberg tables.



