Hi Uwe,

Thanks for bringing this up. I haven't done any work with nested data yet, so I haven't added any helper functions like the ones you're describing!

I had several thoughts about this while working on the schema tree building code in parquet/schema. One solution is to add a parent() field returning const Node* so that you can walk the tree from leaf to root and construct the full path. You could also put some maps at the level of the SchemaDescriptor (for example, node_id -> column index for leaves). As you probably already saw, there is currently one ColumnDescriptor created for each leaf node, and these are the column indexes used in SchemaDescriptor::Column:

https://github.com/apache/parquet-cpp/blob/master/src/parquet/schema/descriptor.cc#L66

Happy to look at patches or to understand the use case a bit better. I hadn't planned to do more work on this personally until I got to reading and writing nested Arrow data to Parquet.

- Wes

On Wed, Mar 16, 2016 at 10:01 AM, Uwe Korn <[email protected]> wrote:
> Hello,
>
> While using parquet-cpp, I'm trying to figure out how to reliably check
> which index a named/nested column is. In my example, I have a nested column
> "neighbours.array" but may also add at a later point some more columns with
> "??.array".
>
> Until now I used "column->descr()->name()" inside a loop over all columns in
> a RowGroup to determine if the current column is the one I want to read.
> This works fine for "top-level" columns but for neighbours.array, this only
> returns "array", the name of the primitive node in the schema description.
>
> To solve my problem:
>
> 1. Do we already have a reliable solution to determine which column
>    index "neighbours.array" is?
> 2. We could add a fullname (or differently named) function to the
>    column description.
> 3. We could have a map on Reader or RowGroup level that maps expanded
>    name to index.
>
> If there is no solution yet, I'd be happy to implement 2 or 3 (or an
> alternative approach).
>
> My schema is as follows (generated via ParquetAvroWriter):
>
> required group com.xhochy.AdjacencyArray {
>   required int32 id
>   required int32 degree
>   required group neighbours {
>     repeated int32 array
>   }
> }
>
> Greetings,
> Uwe
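Coming back to the parent() suggestion in the reply: here is a minimal, self-contained C++ sketch of the idea. It uses a hypothetical, simplified Node class (with hypothetical AddChild and FullPath helpers) rather than the real parquet::schema::Node, whose members and ownership model may differ; types and repetition are left out. Each node keeps a non-owning const Node* back-pointer to its parent, and a leaf's full dotted path is built by walking up to the root. The root group's own name is omitted so that the result matches the "neighbours.array" name from the question.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified schema node for illustration only; this is not
// the parquet::schema::Node class.
class Node {
 public:
  explicit Node(std::string name) : name_(std::move(name)) {}

  // Creates a child owned by this node and sets the child's parent pointer.
  Node* AddChild(std::string name) {
    children_.push_back(std::make_unique<Node>(std::move(name)));
    children_.back()->parent_ = this;
    return children_.back().get();
  }

  const std::string& name() const { return name_; }

  // The proposed accessor: a non-owning pointer, nullptr for the root.
  const Node* parent() const { return parent_; }

 private:
  std::string name_;
  const Node* parent_ = nullptr;
  std::vector<std::unique_ptr<Node>> children_;
};

// Hypothetical helper: walks from `node` up to the root and joins the
// names with '.'. The root group's own name is left out, so the leaf
// "array" under "neighbours" yields "neighbours.array".
std::string FullPath(const Node& node) {
  std::vector<const std::string*> parts;  // collected leaf-to-root
  for (const Node* n = &node; n->parent() != nullptr; n = n->parent()) {
    parts.push_back(&n->name());
  }
  std::string path;
  for (auto it = parts.rbegin(); it != parts.rend(); ++it) {  // root-to-leaf
    if (!path.empty()) path += '.';
    path += **it;
  }
  return path;
}
```

For the schema above, root.AddChild("neighbours")->AddChild("array") returns the leaf whose FullPath is "neighbours.array". The raw back-pointer is safe here because parents own their children and therefore outlive them.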

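The other suggestion, keeping a map at the SchemaDescriptor level, can be sketched in a similar way. This version keys the map by the expanded dotted path (Uwe's option 3) instead of a node id. SchemaNode and ColumnIndexMap are hypothetical types, not parquet-cpp classes, and the sketch assumes leaves are numbered in depth-first order, one index per leaf, like the per-leaf ColumnDescriptors mentioned in the reply.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified schema node (not the parquet-cpp API):
// a node with no children is a leaf (primitive) column.
struct SchemaNode {
  std::string name;
  std::vector<SchemaNode> children;
};

// Hypothetical lookup table mapping each leaf's dotted path (root name
// excluded) to its leaf column index, numbering leaves depth-first.
class ColumnIndexMap {
 public:
  explicit ColumnIndexMap(const SchemaNode& root) {
    for (const SchemaNode& child : root.children) Visit(child, "");
  }

  // Returns the column index for `path`, or -1 if no leaf has that path.
  int Find(const std::string& path) const {
    auto it = index_.find(path);
    return it == index_.end() ? -1 : it->second;
  }

  int num_columns() const { return next_; }

 private:
  // Depth-first traversal; only leaves receive a column index.
  void Visit(const SchemaNode& node, const std::string& prefix) {
    std::string path = prefix.empty() ? node.name : prefix + "." + node.name;
    if (node.children.empty()) {
      index_.emplace(path, next_++);
      return;
    }
    for (const SchemaNode& child : node.children) Visit(child, path);
  }

  std::unordered_map<std::string, int> index_;
  int next_ = 0;
};
```

Because lookups use the full path rather than the leaf name, a later "??.array" column gets its own entry instead of colliding with "neighbours.array", and building the table once per schema makes each lookup a single hash-map probe.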