hofst opened a new issue, #40324:
URL: https://github.com/apache/arrow/issues/40324
### Describe the bug, including details regarding any error messages, version, and platform.
I have a column where each element is an array of floats (vector
embeddings). Those floats are never null. I need to persist this column in
exactly this format, but pyarrow always sets the inner element type to `optional`.
FastParquet seems to have an option `has_nulls=False` for a similar purpose, but
it does not seem to be able to write nested array types at all.
This is the pyarrow schema type of the column:
```
('embedding', pyarrow.list_(pyarrow.float32(), list_size=1024), False),
```
And this is the resulting Parquet schema:
```
required group field_id=-1 embedding (List) {
  repeated group field_id=-1 list {
    optional float field_id=-1 element;
  }
}
```
No matter what I try, I can't find a way to change the `optional` qualifier of
the inner element to `required`. For my purposes, I need this exact schema, and
it seems problematic that pyarrow cannot create it even though the pyarrow
documentation explicitly acknowledges that the inner element type may be
either `required` or `optional`.
### Component(s)
Python