[ https://issues.apache.org/jira/browse/HIVE-26307?focusedWorklogId=780822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780822 ]
ASF GitHub Bot logged work on HIVE-26307:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Jun/22 13:37
Start Date: 13/Jun/22 13:37
Worklog Time Spent: 10m
Work Description: szlta commented on code in PR #3354:
URL: https://github.com/apache/hive/pull/3354#discussion_r895723865
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##########
@@ -381,100 +400,67 @@ private CloseableIterable<T> newAvroIterable(
Avro.ReadBuilder avroReadBuilder = Avro.read(inputFile)
.project(readSchema)
.split(task.start(), task.length());
+
Review Comment:
nit: Should we keep the original exception ("Vectorized execution is not yet
supported for Iceberg avro...") here for the case where
inMemoryDataModel == HIVE? In theory, HiveIcebergStorageHandler should prevent
such a combination; I just wanted to know whether this was a conscious decision.
Issue Time Tracking
-------------------
Worklog Id: (was: 780822)
Time Spent: 20m (was: 10m)
> Avoid FS init in FileIO::newInputFile in vectorized Iceberg reads
> -----------------------------------------------------------------
>
> Key: HIVE-26307
> URL: https://issues.apache.org/jira/browse/HIVE-26307
> Project: Hive
> Issue Type: Improvement
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> With vectorized Iceberg reads, we create {{HadoopInputFile}} objects
> just to store the locations of the files. If we can avoid this, we can
> improve performance, since the {{path.getFileSystem(conf)}} calls can
> become costly, especially for S3.
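
The idea in the issue description above can be illustrated with a minimal, stdlib-only sketch. It is not the code from the PR and does not use the Iceberg or Hadoop APIs: `ExpensiveFs`, `LazyInputFile`, and the example location are hypothetical stand-ins, assumed here only to show how an object can carry a file location without initializing a file system until one is actually needed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class LazyInputFileSketch {

  /** Hypothetical stand-in for a handle that is costly to create (e.g. a file system client). */
  static final class ExpensiveFs {
    // Counts how many times the expensive handle has been created.
    static final AtomicInteger INITS = new AtomicInteger();

    ExpensiveFs() {
      INITS.incrementAndGet();
    }

    // Placeholder only: returns the length of the location string, not a real file size.
    long length(String location) {
      return location.length();
    }
  }

  /** Hypothetical input file that stores only the location and creates the handle on demand. */
  static final class LazyInputFile {
    private final String location;
    private final Supplier<ExpensiveFs> fsFactory;
    private ExpensiveFs fs; // created on first use, then reused

    LazyInputFile(String location, Supplier<ExpensiveFs> fsFactory) {
      this.location = location;
      this.fsFactory = fsFactory;
    }

    // Cheap: never touches the expensive handle.
    String location() {
      return location;
    }

    // Needs the handle, so this is where initialization happens.
    long getLength() {
      return fs().length(location);
    }

    private synchronized ExpensiveFs fs() {
      if (fs == null) {
        fs = fsFactory.get();
      }
      return fs;
    }
  }

  public static void main(String[] args) {
    LazyInputFile file =
        new LazyInputFile("s3a://example-bucket/table/data/file.avro", ExpensiveFs::new);

    System.out.println(file.location());         // location is available immediately
    System.out.println(ExpensiveFs.INITS.get()); // prints 0: no handle created yet

    file.getLength();
    System.out.println(ExpensiveFs.INITS.get()); // prints 1: handle created on first real use
  }
}
```

In this sketch, callers that only need `location()` never pay the initialization cost, which is the saving the issue aims for when the input file objects are used purely as location holders.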