[
https://issues.apache.org/jira/browse/HIVE-25453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ádám Szita updated HIVE-25453:
------------------------------
Description:
Adding support for reading Iceberg ORC tables via LLAP.
The easy part is swapping out the plain VectorizedOrcRecordReader for
LlapRecordReader.
The hard part is maintaining correctness after a series of schema changes
that Iceberg/ORC normally allows, but that plain ORC, and therefore LLAP, did
not. To make it all work, LLAP had to be extended to support broader schema
evolution.
Before this change LLAP simply assumed that the reader and file schemas match
in all columns. Now separate physical and logical read schemas, with
corresponding include lists, are used instead. Also added
logicalOrderedColumnIds, which holds indices from the reader schema, but in
file schema order. It is needed for mapping the results produced by LLAP,
because LLAP always reads columns in the order in which they are written in
the file.
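To illustrate the idea behind logicalOrderedColumnIds, here is a minimal
sketch. It is not the code from the patch: the class and method names are
hypothetical, and columns are matched by name purely for brevity (Iceberg
itself resolves columns by field ID). It only shows how a list of reader
schema indices, kept in file schema order, lets results produced in file
order be placed back into reader positions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Hive or LLAP.
class LogicalOrderSketch {

    // For every file column that the reader also wants, emit that column's
    // index in the reader schema, walking the columns in file schema order.
    static List<Integer> logicalOrderedColumnIds(List<String> fileColumns,
                                                 List<String> readerColumns) {
        Map<String, Integer> readerIndex = new HashMap<>();
        for (int i = 0; i < readerColumns.size(); i++) {
            readerIndex.put(readerColumns.get(i), i);
        }
        List<Integer> ids = new ArrayList<>();
        for (String column : fileColumns) {
            Integer idx = readerIndex.get(column);
            if (idx != null) {          // columns dropped from the table are skipped
                ids.add(idx);
            }
        }
        return ids;
    }

    // Place data that arrived in file order into reader schema positions.
    // Reader columns missing from the file (added later) stay null.
    static String[] toReaderOrder(List<String> producedInFileOrder,
                                  List<Integer> ids,
                                  int readerWidth) {
        String[] out = new String[readerWidth];
        for (int i = 0; i < ids.size(); i++) {
            out[ids.get(i)] = producedInFileOrder.get(i);
        }
        return out;
    }
}
```

For example, with file columns (a, b, c) and reader columns (c, a, d), where b
was dropped and d was added after the file was written, the list is [1, 0]:
the first column read from the file belongs at reader position 1 and the
second at position 0, while position 2 has no data in this file.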
Also added a new CLI driver class for testing the cached reads from Iceberg/ORC
tables via LLAP.
> Add LLAP IO support for Iceberg ORC tables
> ------------------------------------------
>
> Key: HIVE-25453
> URL: https://issues.apache.org/jira/browse/HIVE-25453
> Project: Hive
> Issue Type: New Feature
> Reporter: Ádám Szita
> Assignee: Ádám Szita
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 10m
> Remaining Estimate: 0h