alippai commented on issue #34510:
URL: https://github.com/apache/arrow/issues/34510#issuecomment-1462753928

   The same happens with non-nullable values (I'm not sure how to declare the non-nullable list correctly, but it looks like it doesn't matter):
   ```python
   import numpy as np
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   arr_random = np.random.default_rng().standard_normal(size=[8000000], dtype='float64')
   arr1 = pa.array(arr_random)
   arr2 = pa.FixedSizeListArray.from_arrays(arr_random, 80)
   t1 = pa.Table.from_arrays([arr1], schema=pa.schema([('A', pa.float64(), False)]))
   t2 = pa.Table.from_arrays([arr2], schema=pa.schema([('A', pa.list_(pa.field('A', pa.float64(), False), 80), False)]))
   t3 = pa.Table.from_arrays([arr2], schema=pa.schema([pa.field('A', pa.list_(pa.float64(), 80), False)]))
   
   pq.write_table(t1, 't1.parquet')
   pq.write_table(t2, 't2.parquet')
   pq.write_table(t3, 't3.parquet')
   ```
   `%%timeit`
   ```python
   t1 = pq.read_table('t1.parquet') # 30ms
   ```
   `%%timeit`
   ```python
   t2 = pq.read_table('t2.parquet') # 100ms
   ```
   `%%timeit`
   ```python
   t3 = pq.read_table('t3.parquet') # 100ms
   ```
   ```python
   print(t1.get_total_buffer_size(), t2.get_total_buffer_size(), t3.get_total_buffer_size()) # (64000000, 64000000, 64000000)
   print(t1.schema, t2.schema, t3.schema)
   # (A: double not null,
   # A: fixed_size_list<A: double not null>[80] not null
   #   child 0, A: double not null,
   # A: fixed_size_list<item: double>[80] not null
   #   child 0, item: double)
   ```
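
A quick sanity check on the sizes above, as plain arithmetic. The constant names are illustrative, and the sketch assumes the reported total is just the float64 values buffer, which is consistent with the 64000000 figure for all three tables: the tables hold the same data, only the row layout differs.

```python
# Sanity check on the reported buffer sizes (plain arithmetic, no pyarrow needed).
N_VALUES = 8_000_000      # number of float64 values generated in the example
LIST_SIZE = 80            # fixed list length used for t2 and t3
BYTES_PER_FLOAT64 = 8

n_rows_flat = N_VALUES                # t1: one value per row
n_rows_list = N_VALUES // LIST_SIZE   # t2, t3: 80 values per row
data_bytes = N_VALUES * BYTES_PER_FLOAT64

print(n_rows_flat, n_rows_list, data_bytes)
```

So the list tables have 100000 rows instead of 8000000, over the same 64000000 bytes of values.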
   

