Hello,

While using parquet-cpp, I'm trying to figure out how to reliably determine the column index of a named, possibly nested column. In my example, I have a nested column "neighbours.array", but I may later add more columns of the form "??.array".

Until now, I have used "column->descr()->name()" inside a loop over all columns in a RowGroup to determine whether the current column is the one I want to read. This works fine for top-level columns, but for "neighbours.array" it only returns "array", the name of the primitive node in the schema description.

My question and two possible solutions:

1. Do we already have a reliable way to determine which column
   index "neighbours.array" has?
2. We could add a fullname (or differently named) function to the
   column description that returns the full dotted path.
3. We could have a map on Reader or RowGroup level that maps each
   expanded name to its column index.

If there is no solution yet, I'd be happy to implement option 2 or 3 (or an alternative approach).

My schema is as follows (generated via ParquetAvroWriter):

   required group com.xhochy.AdjacencyArray {
      required int32 id
      required int32 degree
      required group neighbours {
        repeated int32 array
      }
   }
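
To make option 3 more concrete, here is a rough sketch of the kind of map I have in mind. It does not use the parquet-cpp classes: Node, CollectLeaves, BuildPathIndex and ExampleSchema are made-up names for illustration only. The sketch assumes that leaf columns are numbered in depth-first order and that the name of the root message is not part of the path.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema node (NOT a parquet-cpp class).
// A node without children is treated as a primitive (leaf) column.
struct Node {
  std::string name;
  std::vector<Node> children;
};

// Walk the tree depth-first and record "dotted.path" -> leaf column index.
void CollectLeaves(const Node& node, const std::string& prefix,
                   std::map<std::string, int>* out, int* next_index) {
  if (node.children.empty()) {
    (*out)[prefix] = (*next_index)++;
    return;
  }
  for (const Node& child : node.children) {
    const std::string path =
        prefix.empty() ? child.name : prefix + "." + child.name;
    CollectLeaves(child, path, out, next_index);
  }
}

// Build the full map for a schema; the root's own name is left out of the
// paths (an assumption of this sketch).
std::map<std::string, int> BuildPathIndex(const Node& root) {
  std::map<std::string, int> index;
  int next = 0;
  CollectLeaves(root, "", &index, &next);
  return index;
}

// The schema from this mail, rebuilt with the hypothetical Node type.
Node ExampleSchema() {
  return Node{"com.xhochy.AdjacencyArray",
              {Node{"id", {}},
               Node{"degree", {}},
               Node{"neighbours", {Node{"array", {}}}}}};
}
```

For the schema above, BuildPathIndex(ExampleSchema()) would map "id" to 0, "degree" to 1 and "neighbours.array" to 2, and a lookup for plain "array" would find nothing.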

Greetings,
Uwe
