pvary opened a new issue #2120: URL: https://github.com/apache/iceberg/issues/2120
Current Hive reads and writes run on the non-vectorized code path. Vectorization can be disabled by several different rules, but if we force it to be enabled, we get failures.

The current way of working:
- Read records with `HiveIcebergSerDe.deserialize()`, which returns a `Record` object whose schema contains only the projected columns
- Write records with `HiveIcebergSerDe.serialize()`, which returns a `Record` object whose schema is the schema of the target table

The vectorized code path expects:
- Read path: a List of Objects containing every column of the source table schema (the non-projected columns can/should be null)
- Write path: a List of Objects containing every column of the target table schema

Maybe we should make it possible to create different Iceberg readers/writers for the vectorized and non-vectorized code paths. The decision could be based on Hive's `Utilities.getIsVectorized(conf)`, like [this](https://github.com/apache/hive/blob/a97448f84167e4e8c3615908556fe2e4163a43ca/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3893-L3918).
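The two row layouts, and the idea of choosing between them with a vectorization flag, can be sketched in plain Java. This is a hypothetical helper, not Hive or Iceberg API: `VectorizedRowLayoutSketch`, `toFullWidthRow`, and `chooseConverter` are illustrative names, a `Map<String, Object>` stands in for an Iceberg `Record` with a projected schema, and the boolean parameter stands in for the result of `Utilities.getIsVectorized(conf)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical helper, NOT part of Hive or Iceberg. It models the two row shapes with
// plain Java collections instead of Iceberg Record / Hive ObjectInspector types.
final class VectorizedRowLayoutSketch {

  private VectorizedRowLayoutSketch() {}

  /**
   * Builds a row with one slot per column of the full table schema, in schema order.
   * Columns missing from {@code record} (i.e. not projected) become null.
   * The vectorized write path expects the same full-width shape for the target table schema.
   */
  static List<Object> toFullWidthRow(List<String> tableColumns, Map<String, Object> record) {
    List<Object> row = new ArrayList<>(tableColumns.size());
    for (String column : tableColumns) {
      row.add(record.get(column)); // null when the column was not projected
    }
    return row;
  }

  /**
   * Picks a conversion based on a vectorization flag. In Hive the flag could come from
   * Utilities.getIsVectorized(conf); here it is just an assumed boolean parameter.
   * Non-vectorized: keep the projected record as-is. Vectorized: expand it to full width.
   */
  static BiFunction<List<String>, Map<String, Object>, Object> chooseConverter(boolean isVectorized) {
    if (isVectorized) {
      return VectorizedRowLayoutSketch::toFullWidthRow;
    }
    return (tableColumns, record) -> record;
  }
}
```

For a table with columns `id, name, ts` and a projection of only `name`, the vectorized converter yields `[null, alice, null]`, while the non-vectorized one returns the projected record unchanged. Real Iceberg readers/writers would work with Iceberg and Hive types; the sketch only illustrates the column-slot layout and where the flag-based decision would sit.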
