[
https://issues.apache.org/jira/browse/HIVE-26307?focusedWorklogId=780857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780857
]
ASF GitHub Bot logged work on HIVE-26307:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Jun/22 15:07
Start Date: 13/Jun/22 15:07
Worklog Time Spent: 10m
Work Description: pvary commented on code in PR #3354:
URL: https://github.com/apache/hive/pull/3354#discussion_r895836024
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##########
@@ -381,100 +400,67 @@ private CloseableIterable<T> newAvroIterable(
Avro.ReadBuilder avroReadBuilder = Avro.read(inputFile)
.project(readSchema)
.split(task.start(), task.length());
+
Review Comment:
The first check in `openVectorized` is this:
```
Preconditions.checkArgument(!task.file().format().equals(FileFormat.AVRO),
"Vectorized execution is not yet supported for Iceberg avro tables. " +
"Please turn off vectorization and retry the query.");
```
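A minimal, self-contained sketch of how such a guard behaves may help here. The `FileFormat` enum, the `checkArgument` helper, and the simplified `openVectorized` signature below are stand-ins written for illustration (assumptions, not the real Iceberg, Guava, or Hive code); the sketch only shows that an Avro task is rejected before any vectorized reader is built.

```java
public class VectorizedGuardSketch {
  // Stand-in for Iceberg's FileFormat enum (assumption: only these names matter here).
  enum FileFormat { AVRO, ORC, PARQUET }

  // Stand-in for a Preconditions.checkArgument-style helper.
  static void checkArgument(boolean expression, String message) {
    if (!expression) {
      throw new IllegalArgumentException(message);
    }
  }

  // Hypothetical, simplified entry point: the format check runs first,
  // so Avro never reaches the reader construction below it.
  static String openVectorized(FileFormat format) {
    checkArgument(format != FileFormat.AVRO,
        "Vectorized execution is not yet supported for Iceberg avro tables. " +
        "Please turn off vectorization and retry the query.");
    return "vectorized reader for " + format;
  }

  public static void main(String[] args) {
    System.out.println(openVectorized(FileFormat.ORC));
    try {
      openVectorized(FileFormat.AVRO);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```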
Issue Time Tracking
-------------------
Worklog Id: (was: 780857)
Time Spent: 0.5h (was: 20m)
> Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads
> -----------------------------------------------------------------
>
> Key: HIVE-26307
> URL: https://issues.apache.org/jira/browse/HIVE-26307
> Project: Hive
> Issue Type: Improvement
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> With vectorized Iceberg reads we are creating {{HadoopInputFile}} objects
> just to store the location of the files. If we can avoid this, then we can
> improve the performance, since the {{path.getFileSystem(conf)}} calls can
> become costly, especially for S3.
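The idea in the description can be illustrated with a small, self-contained sketch. This is not the change made in the PR: `FakeFileSystem`, `InputFile`, `EagerInputFile`, and `LocationOnlyInputFile` are hypothetical types invented for the example, and the counter merely stands in for the cost of a `path.getFileSystem(conf)` call. It shows that when only the location is needed, a holder that stores the string avoids the file-system lookup entirely.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyLocationSketch {
  // Counts simulated file-system lookups (stand-in for the cost of path.getFileSystem(conf)).
  static final AtomicInteger FS_LOOKUPS = new AtomicInteger();

  // Hypothetical stand-in for an expensive file-system handle.
  static final class FakeFileSystem {
    static FakeFileSystem get(String location) {
      FS_LOOKUPS.incrementAndGet();
      return new FakeFileSystem();
    }
  }

  // Hypothetical minimal interface: callers here only need the location.
  interface InputFile {
    String location();
  }

  // Eager variant: resolves the file system at construction, even if it is never used.
  static final class EagerInputFile implements InputFile {
    private final String location;
    private final FakeFileSystem fs;

    EagerInputFile(String location) {
      this.location = location;
      this.fs = FakeFileSystem.get(location);
    }

    @Override
    public String location() {
      return location;
    }
  }

  // Location-only variant: stores the string and never touches the file system.
  static final class LocationOnlyInputFile implements InputFile {
    private final String location;

    LocationOnlyInputFile(String location) {
      this.location = location;
    }

    @Override
    public String location() {
      return location;
    }
  }

  public static void main(String[] args) {
    InputFile eager = new EagerInputFile("s3://bucket/table/data/file-a.orc");
    InputFile lazy = new LocationOnlyInputFile("s3://bucket/table/data/file-b.orc");
    System.out.println(eager.location() + " and " + lazy.location());
    System.out.println("file-system lookups: " + FS_LOOKUPS.get());
  }
}
```

In this sketch only the eager variant increments the counter, so the number of simulated lookups grows with the number of eager objects created and stays flat for the location-only ones.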
--
This message was sent by Atlassian Jira
(v8.20.7#820007)