alippai commented on issue #34510:
URL: https://github.com/apache/arrow/issues/34510#issuecomment-1462753928
The same happens with non-nullable columns (I'm not sure how to define a non-nullable list correctly, but it doesn't seem to matter):
```python
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

# 8,000,000 random doubles
arr_random = np.random.default_rng().standard_normal(size=[8000000], dtype='float64')
arr1 = pa.array(arr_random)
# The same values grouped into 100,000 fixed-size lists of 80
arr2 = pa.FixedSizeListArray.from_arrays(arr_random, 80)

# t1: flat, non-nullable double column
t1 = pa.Table.from_arrays([arr1], schema=pa.schema([('A', pa.float64(), False)]))
# t2: non-nullable fixed-size list whose elements are declared non-nullable
t2 = pa.Table.from_arrays(
    [arr2],
    schema=pa.schema([('A', pa.list_(pa.field('A', pa.float64(), False), 80), False)]),
)
# t3: non-nullable fixed-size list with default (nullable) elements
t3 = pa.Table.from_arrays(
    [arr2],
    schema=pa.schema([pa.field('A', pa.list_(pa.float64(), 80), False)]),
)

pq.write_table(t1, 't1.parquet')
pq.write_table(t2, 't2.parquet')
pq.write_table(t3, 't3.parquet')
```
```python
%%timeit
t1 = pq.read_table('t1.parquet')  # 30 ms
```

```python
%%timeit
t2 = pq.read_table('t2.parquet')  # 100 ms
```

```python
%%timeit
t3 = pq.read_table('t3.parquet')  # 100 ms
```
```python
print(t1.get_total_buffer_size(), t2.get_total_buffer_size(), t3.get_total_buffer_size())
# (64000000, 64000000, 64000000)
print(t1.schema, t2.schema, t3.schema)
# (A: double not null,
# A: fixed_size_list<A: double not null>[80] not null
# child 0, A: double not null,
# A: fixed_size_list<item: double>[80] not null
# child 0, item: double)
```