ikrommyd commented on issue #49683:
URL: https://github.com/apache/arrow/issues/49683#issuecomment-4201071619
Well, the way I encountered this problem is through Awkward Array, which does
allow a `50 * 0 * float32` type, like so:
```py
In [1]: import awkward as ak
...: import numpy as np
...:
...: layout = ak.contents.RegularArray(
...: ak.contents.NumpyArray(np.zeros(0, dtype=np.float32)),
...: size=0,
...: zeros_length=50,
...: )
...:
...: arr = ak.Array(layout)
...: print(arr)
...: print(arr.type)
[[], [], [], [], [], [], [], [], [], ..., [], [], [], [], [], [], [], [], []]
50 * 0 * float32
```
I was trying to save this to a Parquet file, which Awkward does through Arrow
in exactly the same way as my reproducer above:
```py
In [2]: ak.to_arrow(layout, extensionarray=False)
Out[2]:
<pyarrow.lib.FixedSizeListArray object at 0x109f8d6c0>
[
[],
[],
...
[],
[]
]
In [3]: ak.to_arrow_table(layout, extensionarray=False)
Out[3]:
pyarrow.Table
: fixed_size_list<item: float not null>[0] not null
child 0, item: float not null
----
: [[[],[],...,[],[]]]
In [5]: ak.to_parquet(layout, "tmp.parquet", extensionarray=False)
Out[5]:
<pyarrow._parquet.FileMetaData object at 0x10a09fb00>
created_by: parquet-cpp-arrow version 23.0.1
num_columns: 1
num_rows: 50
num_row_groups: 1
format_version: 2.6
serialized_size: 0
```
Reading this Parquet file back then fails with the error mentioned above:
```
ArrowInvalid: list_size needs to be a strict positive integer
```
So I'm wondering: is there something fundamentally wrong with `list_size=0`?
If so, pyarrow should raise an error when one tries to save it to Parquet, and
we should also raise an error in Awkward Array. Otherwise, if it's fine, it's
the reading that should be fixed, IMO.