Hello,
While using parquet-cpp, I'm trying to figure out how to reliably
determine the column index of a named, nested column. In my example, I
have a nested column "neighbours.array", but I may later add more
columns of the form "??.array".
Until now I have used "column->descr()->name()" inside a loop over all
columns in a RowGroup to determine whether the current column is the
one I want to read. This works fine for top-level columns, but for
"neighbours.array" it only returns "array", the name of the primitive
node in the schema description.
To solve my problem, I have one question and two proposals:
1. Do we already have a reliable solution to determine which column
index "neighbours.array" is?
2. We could add a fullname (or differently named) function to the
column description.
3. We could have a map on Reader or RowGroup level that maps expanded
name to index.
If there is no solution yet, I'd be happy to implement 2 or 3 (or an
alternative approach).
My schema is as follows (generated via ParquetAvroWriter):
required group com.xhochy.AdjacencyArray {
  required int32 id
  required int32 degree
  required group neighbours {
    repeated int32 array
  }
}
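To make option 3 more concrete, here is a rough sketch of the mapping I
have in mind. It does not use the actual parquet-cpp schema classes:
Node, Leaf, Group, CollectLeaves, BuildColumnIndex and ExampleSchema
are all hypothetical names made up for illustration. It assumes that
column indices follow a depth-first walk over the primitive nodes and
that the root (message) name is not part of the path.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a schema node (not a parquet-cpp class).
struct Node {
  std::string name;
  std::vector<Node> children;  // empty for a primitive (leaf) column
};

Node Leaf(const std::string& name) { return Node{name, {}}; }

Node Group(const std::string& name, std::vector<Node> children) {
  return Node{name, std::move(children)};
}

// Walk the schema depth-first and assign each leaf the next column index,
// keyed by its dot-joined path below the root.
void CollectLeaves(const Node& node, const std::string& prefix,
                   std::map<std::string, int>* index_by_path,
                   int* next_index) {
  const std::string path =
      prefix.empty() ? node.name : prefix + "." + node.name;
  if (node.children.empty()) {
    // A valid schema should not contain the same leaf path twice.
    assert(index_by_path->count(path) == 0);
    (*index_by_path)[path] = (*next_index)++;
    return;
  }
  for (const Node& child : node.children) {
    CollectLeaves(child, path, index_by_path, next_index);
  }
}

// Build the "expanded name -> column index" map for a whole schema.
std::map<std::string, int> BuildColumnIndex(const Node& root) {
  std::map<std::string, int> index_by_path;
  int next_index = 0;
  // Assumption: the root (message) name is not part of the column path.
  for (const Node& child : root.children) {
    CollectLeaves(child, "", &index_by_path, &next_index);
  }
  return index_by_path;
}

// The schema from this mail, rebuilt with the stand-in types.
Node ExampleSchema() {
  return Group("com.xhochy.AdjacencyArray",
               {Leaf("id"), Leaf("degree"),
                Group("neighbours", {Leaf("array")})});
}
```

With this, BuildColumnIndex(ExampleSchema()).at("neighbours.array")
would give the index to read, and a later "??.array" column would get
its own distinct key instead of colliding on "array".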
Greetings,
Uwe