rizaon opened a new pull request #990: URL: https://github.com/apache/orc/pull/990
### What changes were proposed in this pull request?

The ORC C++ library doesn't have a type id for the index field of a list type. To get only the list indices, we have to select the type id of the whole array, which causes unnecessary materialization of the array elements. Because the offset stream is stored separately from the content stream, the list indices can be materialized on their own.

This patch adds a fourth option for selecting columns from an ORC file in the ORC C++ library, namely RowReaderOptions::includeTypesAndIntents. It is similar to RowReaderOptions::includeTypes, but takes an additional set of ReadIntent values for each type id. ListColumnReader can then refer to this ReadIntent set to read the list elements, the indices, or both. ReadIntent_DATA is the default for every type id whose selection does not specify a ReadIntent.

Adding a read intent avoids introducing a fake type id whose only purpose is to refer to the list indices. Thus, the expected type ids for an ORC file stay the same after this patch.

### Why are the changes needed?

This is needed to selectively avoid materialization of array elements.

### How was this patch tested?

Declare ReadIntent_POS in TestColumnReaderEncoded.testList and verify that the resulting indices are correct.
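
To illustrate the idea, here is a minimal, self-contained sketch of how a per-type-id set of read intents could let a list reader skip the element stream. It is a toy model, not the ORC C++ API or this patch's implementation: `ToyListColumn`, `ListBatch`, `IntentMap`, and `readList` are hypothetical names, and only the enumerator names `ReadIntent_DATA` and `ReadIntent_POS` are taken from the description above. The sketch also assumes that offsets are always decoded and that an empty or missing intent set falls back to `ReadIntent_DATA`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins for illustration; this is not the ORC C++ API.
// The enumerator names echo the PR description.
enum ReadIntent { ReadIntent_DATA, ReadIntent_POS };

// Selection: type id -> set of read intents (assumed shape).
using IntentMap = std::map<uint64_t, std::set<ReadIntent>>;

// Toy list column whose per-row lengths and child elements live in
// separate streams.
struct ToyListColumn {
  std::vector<uint64_t> lengths;
  std::vector<int64_t> elements;
};

struct ListBatch {
  std::vector<int64_t> offsets;   // row i spans [offsets[i], offsets[i + 1])
  std::vector<int64_t> elements;  // stays empty unless ReadIntent_DATA is requested
  bool elementsMaterialized = false;
};

ListBatch readList(const ToyListColumn& col, uint64_t typeId,
                   const IntentMap& intents) {
  // A type id with no explicit intent falls back to ReadIntent_DATA.
  std::set<ReadIntent> intent{ReadIntent_DATA};
  auto it = intents.find(typeId);
  if (it != intents.end() && !it->second.empty()) {
    intent = it->second;
  }

  ListBatch batch;
  // Offsets come from the length stream alone
  // (assumption: always decoded in this toy model).
  batch.offsets.push_back(0);
  for (uint64_t len : col.lengths) {
    batch.offsets.push_back(batch.offsets.back() + static_cast<int64_t>(len));
  }

  // Touch the element stream only when the caller asked for the data.
  if (intent.count(ReadIntent_DATA) != 0) {
    assert(static_cast<std::size_t>(batch.offsets.back()) == col.elements.size());
    batch.elements = col.elements;
    batch.elementsMaterialized = true;
  }
  return batch;
}
```

With a selection that maps the list's type id to `{ReadIntent_POS}`, the returned batch carries only offsets. With an empty selection, the same call also copies the elements. The type ids themselves are unchanged in both cases, which is the property the description above relies on.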