[GitHub] [iceberg] pvary opened a new issue #2120: Hive: Vectorization is not working

GitBox Wed, 20 Jan 2021 02:27:18 -0800


pvary opened a new issue #2120:
URL: https://github.com/apache/iceberg/issues/2120



   Hive reads and writes currently work only on the non-vectorized path. 
Vectorization can be disabled by various rules, but if we force it to be 
enabled, we get failures.
   
   The current way of working:
   - Read records by `HiveIcebergSerDe.deserialize()` and return a `Record` 
object where the schema contains only the projected columns
   - Write records by `HiveIcebergSerDe.serialize()` and return a `Record` 
object where the schema is the schema of the target table
   
   Vectorized code path expects:
   - Read path: List of Objects where the list contains every column of the 
source table schema (the non-projected columns can/should be null)
   - Write path: List of Objects where the list contains every column of the 
target table schema
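
   To illustrate the mismatch on the read path, here is a minimal sketch of 
padding a projected row out to the full table schema. It uses plain Java 
collections only. `FullWidthRow` and `pad` are hypothetical names for this 
example, not Hive or Iceberg APIs, and a `Map` stands in for the projected 
`Record`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class FullWidthRow {
    /**
     * Hypothetical helper: builds a row with one slot per column of the full
     * table schema. Columns missing from the projection are left as null.
     */
    public static List<Object> pad(List<String> tableColumns, Map<String, Object> projected) {
        // Start with a row of nulls that is as wide as the table schema.
        List<Object> row = new ArrayList<>(Collections.<Object>nCopies(tableColumns.size(), null));
        for (int i = 0; i < tableColumns.size(); i++) {
            String name = tableColumns.get(i);
            if (projected.containsKey(name)) {
                // Copy the projected value into its position in the table schema.
                row.set(i, projected.get(name));
            }
        }
        return row;
    }
}
```

   For a table with columns `id`, `name`, `ts` and a projection containing 
only `name`, the result has three entries, with `null` in the first and last 
positions.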
   
   Maybe we should make it possible to create different Iceberg readers/writers 
for the vectorized and non-vectorized code paths. The decision could be based 
on Hive's `Utilities.getIsVectorized(conf)`, as in 
[this](https://github.com/apache/hive/blob/a97448f84167e4e8c3615908556fe2e4163a43ca/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3893-L3918)
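
   A rough sketch of that selection, assuming a hypothetical `RowReader` 
abstraction (not an existing Iceberg or Hive interface). To stay 
self-contained it takes a plain boolean. In Hive that value would presumably 
come from `Utilities.getIsVectorized(conf)`.

```java
public class ReaderSelection {
    /** Hypothetical reader abstraction, for illustration only. */
    public interface RowReader {
        String describe();
    }

    /** Stand-in for the current path: Record with only the projected columns. */
    static final class RecordReaderSketch implements RowReader {
        @Override
        public String describe() {
            return "record-based (projected schema)";
        }
    }

    /** Stand-in for the vectorized path: list covering every table column. */
    static final class VectorizedReaderSketch implements RowReader {
        @Override
        public String describe() {
            return "vectorized (full table schema)";
        }
    }

    /** Chooses the reader from the flag; the caller decides how the flag is obtained. */
    public static RowReader create(boolean vectorized) {
        return vectorized ? new VectorizedReaderSketch() : new RecordReaderSketch();
    }
}
```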




