[jira] [Updated] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

Jira Thu, 09 Sep 2021 02:47:05 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-25453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ádám Szita updated HIVE-25453:
------------------------------
    Description: 
Adding support for reading Iceberg ORC tables via LLAP.

The easy part is swapping out the plain VectorizedOrcRecordReader for 
LlapRecordReader.
The hard part is maintaining correctness after a series of schema changes 
that Iceberg/ORC normally allows but plain ORC, and therefore LLAP, did not. 
To make it all work, LLAP had to be made to support broader schema evolution.

Before this change, LLAP simply assumed that the reader and file schemas 
match on all columns. Now separate physical and logical read schemas, with 
corresponding include lists, are used instead. Also added 
logicalOrderedColumnIds, which holds indices from the reader schema but in 
file schema order. This is needed for mapping the results produced by LLAP, 
because LLAP always reads columns in the order in which they were written to 
the file.
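
To illustrate the idea of reader-schema indices kept in file-schema order, 
here is a minimal sketch. It is not the code from the patch: the class name, 
the helper signature, and the use of plain integer field IDs to match 
columns between the two schemas are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical stand-in, not the LLAP implementation.
final class LogicalOrderSketch {

    /**
     * Returns reader-schema indices, ordered by position in the file schema.
     * Columns are matched by field ID (an assumption of this sketch).
     * File columns that the reader schema does not contain are skipped.
     */
    static List<Integer> logicalOrderedColumnIds(List<Integer> fileFieldIds,
                                                 List<Integer> readerFieldIds) {
        // Map each field ID to its index in the reader (logical) schema.
        Map<Integer, Integer> readerIndexByFieldId = new HashMap<>();
        for (int i = 0; i < readerFieldIds.size(); i++) {
            readerIndexByFieldId.put(readerFieldIds.get(i), i);
        }

        // Walk the file (physical) schema in its written order.
        List<Integer> ordered = new ArrayList<>();
        for (Integer fieldId : fileFieldIds) {
            Integer readerIndex = readerIndexByFieldId.get(fieldId);
            if (readerIndex != null) {
                ordered.add(readerIndex);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // File was written with fields 1, 2, 3. Since then field 2 was
        // dropped, field 4 was added, and the columns were reordered.
        List<Integer> fileFieldIds = Arrays.asList(1, 2, 3);
        List<Integer> readerFieldIds = Arrays.asList(3, 1, 4);

        // Prints [1, 0]: field 1 is reader index 1, field 3 is reader index 0.
        System.out.println(logicalOrderedColumnIds(fileFieldIds, readerFieldIds));
    }
}
```

In this sketch, the n-th column produced in file order belongs at the reader 
position given by the n-th entry of the returned list. Reader columns with 
no counterpart in the file (field 4 above) do not appear in the list and 
would have to be filled in separately.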

Also added a new CLI driver class for testing cached reads from Iceberg/ORC 
tables via LLAP.

> Add LLAP IO support for Iceberg ORC tables
> ------------------------------------------
>
>                 Key: HIVE-25453
>                 URL: https://issues.apache.org/jira/browse/HIVE-25453
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
