[GitHub] [arrow] vikasmalhotra08 opened a new issue #11347: How to read nested fields when using read table to read a parquet file in PyArrow?

GitBox Wed, 06 Oct 2021 13:34:50 -0700


vikasmalhotra08 opened a new issue #11347:
URL: https://github.com/apache/arrow/issues/11347



   Hello,
   
   Is it possible to read specific nested fields when reading a Parquet file? I am getting the following error:
   `pyarrow.lib.ArrowInvalid: Field named 'a.b' not found or not unique in the schema.`
   
   Here is how the file is written out:
   ```
   import pyarrow.parquet as pq

   # Writing as table (`table` and `file_path` are defined elsewhere)
   pq.write_table(
       table,
       where=file_path,
       version='2.0',
       compression='snappy'
   )
   ```
   
   Here is the schema that's present in the Parquet file:
   ```
   required group field_id=0 schema {
     optional group field_id=1 a {
       optional binary field_id=2 abc (String);
       optional group field_id=3 b {
         optional binary field_id=4 c (String);
         optional binary field_id=5 d (String);
         optional binary field_id=6 e (String);
       }
     }
   }
   ```
   
   Here is how I am trying to read it:
   ```
   # read the table
   columns_needed = ['a.b', 'a.b.c']
   data = pq.read_table(
       file_path, 
       columns=columns_needed)
   ```
   


