[ 
https://issues.apache.org/jira/browse/ORC-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444533#comment-15444533
 ] 

Chunyang Wen commented on ORC-97:
---------------------------------

In Parquet, there is a class called ColumnPath that represents a nested column 
as a dot-separated string.

I plan to first build a map from each column path (a dot-separated string) to 
its column id, so that users can specify nested columns such as a.b.c.

When ReaderOptions receives a request to include columns by name, we can 
translate each name into its type id and then call includeTypes on 
ReaderOptions (omalley has already committed includeTypes).
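
As a rough sketch of the idea above (not the actual patch), the map can be 
built with one walk over the struct tree, and names are then looked up to get 
the ids that would be handed to includeTypes. TypeNode, buildNameIdMap and 
namesToTypeIds are hypothetical names used only for illustration; they are not 
part of the ORC API, and only struct nesting is handled here.

```cpp
#include <cstdint>
#include <list>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema node; this is NOT the real orc::Type.
struct TypeNode {
  std::string fieldName;           // empty for the root struct
  uint64_t columnId;               // id assigned to this node in the schema
  std::vector<TypeNode> children;  // struct fields only in this sketch
};

// Walk the struct tree and record "a.b.c" -> column id for every field.
void buildNameIdMap(const TypeNode& node, const std::string& prefix,
                    std::map<std::string, uint64_t>& out) {
  for (const TypeNode& child : node.children) {
    const std::string path =
        prefix.empty() ? child.fieldName : prefix + "." + child.fieldName;
    out[path] = child.columnId;
    buildNameIdMap(child, path, out);
  }
}

// Translate dot-separated names into ids, in the order they were given.
// The resulting list is what would be forwarded to includeTypes.
std::list<uint64_t> namesToTypeIds(
    const std::map<std::string, uint64_t>& nameToId,
    const std::list<std::string>& names) {
  std::list<uint64_t> ids;
  for (const std::string& name : names) {
    const auto it = nameToId.find(name);
    if (it == nameToId.end()) {
      throw std::invalid_argument("unknown column name: " + name);
    }
    ids.push_back(it->second);
  }
  return ids;
}
```

For the schema <s1:struct<s2:struct<int1: int>>>, and assuming the ids are 
s1 = 1, s2 = 2, int1 = 3, the map would hold s1, s1.s2 and s1.s2.int1, and a 
request for s1.s2.int1 would resolve to id 3. Throwing on an unknown name is 
just one possible choice for this sketch.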

The cost is one in-memory map, but it simplifies the implementation of 
include_name for nested column names. By the way, we do not need to add any 
public API, and the change is compatible.

For struct types this is easy to understand: we just append the field name to 
the column path, separated by a dot. For the other non-primitive types (map, 
union, list), we still have to define clearly how they are specified, for 
example:

struct <m:map<string, primitive_type>>
struct <m:map<string, non_primitive_type>>


> Support column name selection in ReaderOptions
> ----------------------------------------------
>
>                 Key: ORC-97
>                 URL: https://issues.apache.org/jira/browse/ORC-97
>             Project: Orc
>          Issue Type: New Feature
>          Components: C++
>    Affects Versions: 1.2.0
>            Reporter: Chunyang Wen
>            Assignee: Chunyang Wen
>
> After the ORC-92 patch, column id selection is supported, but selecting a 
> sub-type by name is more useful.
> In my project, we use a period (.) to separate nested field names.
> <s1:struct<s2:struct<int1: int>>>
> We select int1 with s1.s2.int1, which is passed to 
> include(std::list<std::string>).
> In my implementation, I first build a map from name to column id and then 
> forward the function call to includeTypes. If this is a candidate solution, I 
> will provide a patch for review soon.
> When a sub-type is selected, all of its child types should be selected as 
> well, as O'Malley pointed out in ORC-92.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
