Aliaksei Sandryhaila created ORC-28:
---------------------------------------
Summary: Reading a subset of complex-type columns does not select
the right columns
Key: ORC-28
URL: https://issues.apache.org/jira/browse/ORC-28
Project: Orc
Issue Type: Bug
Reporter: Aliaksei Sandryhaila
Selected columns are set through ReaderOptions.include() and correspond to the
top-level columns in an ORC file. The ReaderImpl constructor uses this
information to determine which physical columns to read from the file. The
current implementation gets this mapping wrong when the file contains
complex-type columns.
Reproducer:
examples/TestOrcFile.testSeek.orc contains 12 top-level columns:
1: boolean
2-5: int
6-7: double
8: binary
9: string
10: struct<array<struct<int,string>>>
11: array<struct<int,string>>
12: map<string,struct<int,string>>
The physical layout in the file is:
1: boolean
2-5: int
6-7: double
8: binary
9: string
10: struct
11: array
12: struct
13: int
14: string
15: array
16: struct
17: int
18: string
19: map
20: string
21: struct
22: int
23: string
When asked to read top-level column 11, which is array<struct<int,string>>,
ReaderImpl actually reads column 10: it treats 11 as a physical column index,
and physical column 11 is a subcolumn of top-level column 10. The requested
column actually occupies physical columns 15-18.
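
To illustrate the mapping that is expected here, below is a minimal sketch that
translates a top-level column number into the range of physical columns it
covers. It does not use the ORC library: TypeNode, makeType, subtreeSize,
physicalRange and testSeekSchema are hypothetical names made up for this
example, and the sketch assumes that physical columns are numbered in pre-order
over the type tree with the root struct as column 0, which matches the listing
above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a node of the file's type tree (not an ORC class).
struct TypeNode {
  std::vector<TypeNode> children;  // empty for primitive types
};

// Build a node from its children; no arguments yields a primitive type.
TypeNode makeType(std::vector<TypeNode> children = {}) {
  TypeNode node;
  node.children = std::move(children);
  return node;
}

// Number of physical columns a type occupies: itself plus all descendants.
std::size_t subtreeSize(const TypeNode& node) {
  std::size_t size = 1;
  for (const TypeNode& child : node.children) {
    size += subtreeSize(child);
  }
  return size;
}

// Inclusive range of physical column ids.
struct ColumnRange {
  std::size_t first;
  std::size_t last;
};

// Map a 1-based top-level column number to the physical columns it covers.
// Assumption: physical column 0 is the root struct, so the first top-level
// column starts at physical column 1.
ColumnRange physicalRange(const TypeNode& root, std::size_t topLevelColumn) {
  if (topLevelColumn < 1 || topLevelColumn > root.children.size()) {
    throw std::out_of_range("no such top-level column");
  }
  std::size_t first = 1;
  // Skip over every physical column used by the preceding top-level columns.
  for (std::size_t i = 0; i + 1 < topLevelColumn; ++i) {
    first += subtreeSize(root.children[i]);
  }
  const std::size_t width = subtreeSize(root.children[topLevelColumn - 1]);
  return ColumnRange{first, first + width - 1};
}

// Type tree shaped like the file described in this report.
TypeNode testSeekSchema() {
  const TypeNode intString = makeType({makeType(), makeType()});  // struct<int,string>
  const TypeNode list = makeType({intString});           // array<struct<int,string>>
  std::vector<TypeNode> columns(9, makeType());          // columns 1-9: primitives
  columns.push_back(makeType({list}));                   // 10: struct<array<...>>
  columns.push_back(list);                               // 11: array<...>
  columns.push_back(makeType({makeType(), intString}));  // 12: map<string,struct<...>>
  return makeType(columns);
}
```

With this sketch, physicalRange(testSeekSchema(), 11) gives the range 15-18,
while top-level column 10 maps to 10-14 and column 12 to 19-23. A reader would
then mark every physical column in the returned range as selected instead of
using the top-level number directly.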