Hi Uwe,

Thanks for bringing this up -- I haven't done any work with nested
data yet, so I haven't added any helper functions like the ones you're
describing.

I had several thoughts about this while working on the schema tree
building code in parquet/schema. One solution is to add a parent()
method returning const Node* so that you can walk the tree from leaf
to root and construct the full path. You could also put some maps at
the level of the SchemaDescriptor (for example node_id -> column
index for leaves).
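
To make the idea concrete, here is a rough sketch of both pieces: a
parent pointer used to build the dotted path, and a path -> leaf index
map. The Node struct and the names FullPath, IndexLeaves and
MakeExampleSchema are made up for illustration and are not the actual
parquet-cpp classes; I'm also assuming leaves are numbered depth-first
and that the root group's name is left out of the path.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema node, NOT the real parquet-cpp Node.
struct Node {
  std::string name;
  const Node* parent = nullptr;                 // proposed back-pointer; null for the root
  std::vector<std::unique_ptr<Node>> children;  // empty for primitive (leaf) nodes

  // Append a child and wire up its parent pointer.
  Node* AddChild(const std::string& child_name) {
    children.emplace_back(new Node());
    Node* child = children.back().get();
    child->name = child_name;
    child->parent = this;
    return child;
  }
};

// Walk from a node up to, but not including, the root and join the names
// with dots (assumption: the root group name is not part of the path).
std::string FullPath(const Node* node) {
  std::string path;
  for (; node != nullptr && node->parent != nullptr; node = node->parent) {
    path = path.empty() ? node->name : node->name + "." + path;
  }
  return path;
}

// Depth-first traversal that numbers the leaves in the order visited and
// records dotted path -> leaf index.
void IndexLeaves(const Node& node, std::map<std::string, int>* index) {
  if (node.children.empty()) {
    const int next = static_cast<int>(index->size());
    (*index)[FullPath(&node)] = next;
    return;
  }
  for (const auto& child : node.children) {
    IndexLeaves(*child, index);
  }
}

// The schema from the original message, rebuilt with the sketch types.
std::unique_ptr<Node> MakeExampleSchema() {
  std::unique_ptr<Node> root(new Node());
  root->name = "com.xhochy.AdjacencyArray";
  root->AddChild("id");
  root->AddChild("degree");
  root->AddChild("neighbours")->AddChild("array");
  return root;
}
```

With your schema, calling IndexLeaves on the root fills the map with
"id" -> 0, "degree" -> 1 and "neighbours.array" -> 2, so a lookup by
the full dotted name replaces the loop over descr()->name().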

As you probably already saw, currently there is one ColumnDescriptor
created for each leaf node and these are the column indexes that are
used in SchemaDescriptor::Column:

https://github.com/apache/parquet-cpp/blob/master/src/parquet/schema/descriptor.cc#L66

Happy to look at patches or understand the use case a bit better. I
hadn't planned to do more work on this personally until I got to
reading and writing nested Arrow data to Parquet.

- Wes

On Wed, Mar 16, 2016 at 10:01 AM, Uwe Korn <[email protected]> wrote:
> Hello,
>
> While using parquet-cpp, I'm trying to figure out how to reliably check
> which index a named/nested column is. In my example, I have a nested column
> "neighbours.array" but may also add at a later point some more columns with
> "??.array".
>
> Until now I used "column->descr()->name()" inside a loop over all columns in
> a RowGroup to determine if the current column is the one I want to read.
> This works fine for "top-level" columns but for neighbours.array, this only
> returns "array", the name of the primitive node in the schema description.
>
> To solve my problem:
>
> 1. Do we already have a reliable solution to determine which column
>    index "neighbours.array" is?
> 2. We could add a fullname (or differently named) function to the
>    column description.
> 3. We could have a map on Reader or RowGroup level that maps expanded
>    name to index.
>
> If there is no solution yet, I'd be happy to implement 2 or 3 (or an
> alternative approach).
>
> My schema is as follows (generated via ParquetAvroWriter):
>
>    required group com.xhochy.AdjacencyArray {
>       required int32 id
>       required int32 degree
>       required group neighbours {
>         repeated int32 array
>       }
>    }
>
> Greetings,
> Uwe
