vikasmalhotra08 opened a new issue #11347:
URL: https://github.com/apache/arrow/issues/11347
Hello,
Is it possible to read specific nested fields from a Parquet file? When I try,
I get the following error:
```
pyarrow.lib.ArrowInvalid: Field named 'a.b' not found or not unique in the schema.
```
Here is how the file is written out:
```
import pyarrow.parquet as pq

# Write the table to a Parquet file
pq.write_table(
    table,
    where=file_path,
    version='2.0',
    compression='snappy'
)
```
Here is the schema stored in the Parquet file:
```
required group field_id=0 schema {
  optional group field_id=1 a {
    optional binary field_id=2 abc (String);
    optional group field_id=3 b {
      optional binary field_id=4 c (String);
      optional binary field_id=5 d (String);
      optional binary field_id=6 e (String);
    }
  }
}
```
Here is how I am trying to read it:
```
import pyarrow.parquet as pq

# Read only the nested columns a.b and a.b.c
columns_needed = ['a.b', 'a.b.c']
data = pq.read_table(
    file_path,
    columns=columns_needed)
```