Ashish Sharma created HIVE-19103:
------------------------------------

             Summary: Reading only required column in nested structure schema in ORC
                 Key: HIVE-19103
                 URL: https://issues.apache.org/jira/browse/HIVE-19103
             Project: Hive
          Issue Type: Improvement
            Reporter: Ashish Sharma
            Assignee: Ashish Sharma


Read only the required columns of a nested struct schema in ORC, instead of
reading every child column of the enclosing struct.

Example - 

*Current state* - 

Schema  -  struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
Query - select c.e.f from t where c.e.f > 10;
Current state - the entire c struct is read from the file and then filtered, 
because only "hive.io.file.readcolumn.ids" is consulted. As a result, all child 
columns of c are selected for reading from the file.
Conf -
     _hive.io.file.readcolumn.ids  = "2"
     hive.io.file.readNestedColumn.paths = "c.e.f"_

Result -       
boolean[ ] include  = [true,false,false,true,true,true,true,true]
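
The include array above can be reproduced with a small sketch. It assumes that 
column ids are assigned in pre-order over the type tree (root = 0, a = 1, b = 2, 
c = 3, d = 4, e = 5, f = 6, g = 7), which is consistent with the eight entries 
shown. The tuple-based schema and the function names are hypothetical and are 
not Hive or ORC APIs.

```python
# Schema from the example as (name, children) pairs.
# Hypothetical representation for illustration only.
SCHEMA = ("root", [
    ("a", []),
    ("b", []),
    ("c", [
        ("d", []),
        ("e", [("f", [])]),
        ("g", []),
    ]),
])


def subtree_size(node):
    """Count the column ids used by a node and all of its descendants."""
    _, children = node
    return 1 + sum(subtree_size(child) for child in children)


def include_by_top_level_ids(schema, read_column_ids):
    """Mark the root plus the full subtree of each selected top-level column."""
    include = [False] * subtree_size(schema)
    include[0] = True  # the root struct is always marked
    next_id = 1
    for index, child in enumerate(schema[1]):
        size = subtree_size(child)
        if index in read_column_ids:
            # Every descendant is marked, whether or not the query needs it.
            for col in range(next_id, next_id + size):
                include[col] = True
        next_id += size
    return include


# readcolumn.ids = "2" selects top-level column c and its whole subtree.
include = include_by_top_level_ids(SCHEMA, {2})
print(include)
# [True, False, False, True, True, True, True, True]
```

Columns d and g are marked even though the query only touches c.e.f, which is 
the extra read this issue describes.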

*Expected state* -

Schema  -  struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
Query - select c.e.f from t where c.e.f > 10;
Expected state - instead of reading the entire c struct from the file, read 
only the f column (together with its parent structs c and e) by consulting 
"hive.io.file.readNestedColumn.paths".
Conf -
     _hive.io.file.readcolumn.ids  = "2"
     hive.io.file.readNestedColumn.paths = "c.e.f"_

Result -       
boolean[ ] include  = [true,false,false,true,false,true,true,false]
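
One way to derive the expected include array is to walk the type tree and keep 
only the columns that lie on a requested dotted path, or beneath a path that 
names a whole struct. The following is a minimal sketch under the same 
pre-order column id assumption (root = 0, a = 1, ..., g = 7). The tuple-based 
schema and the function name are hypothetical, and details such as 
case-insensitive name matching are left out, so this is not the actual Hive 
change.

```python
# Schema from the example as (name, children) pairs.
# Hypothetical representation for illustration only.
SCHEMA = ("root", [
    ("a", []),
    ("b", []),
    ("c", [
        ("d", []),
        ("e", [("f", [])]),
        ("g", []),
    ]),
])


def include_by_nested_paths(schema, nested_paths):
    """Mark the root and only the columns needed for each dotted path."""
    wanted = [path.split(".") for path in nested_paths]
    include = [True]  # column id 0 is the root struct

    def visit(node, prefix, ancestor_selected):
        name, children = node
        path = prefix + [name]
        # Fully selected: this column, or one of its ancestors, was requested as a whole.
        fully = ancestor_selected or any(w == path for w in wanted)
        # On a path: this column is a parent of (or equal to) a requested column.
        on_path = any(w[:len(path)] == path for w in wanted)
        include.append(fully or on_path)  # appended in pre-order, so index == column id
        for child in children:
            visit(child, path, fully)

    for child in schema[1]:
        visit(child, [], False)
    return include


# readNestedColumn.paths = "c.e.f" keeps c, e and f but drops d and g.
include = include_by_nested_paths(SCHEMA, ["c.e.f"])
print(include)
# [True, False, False, True, False, True, True, False]
```

Passing the path "c" instead of "c.e.f" marks the whole subtree of c, which 
gives the same array as the current state.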




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
