kshitij12345 opened a new pull request, #13234:
URL: https://github.com/apache/arrow/pull/13234

   Reference code:
   
   <details>
   
   ```
   from io import BytesIO
   
   import pyarrow as pa
   import pyarrow.parquet as pq
   from contexttimer import Timer  # non-standard library (install with pip)
   
   
   def create_example_file_meta_data():
       data = {
           "str": pa.array(["a", "b", "c", "d"], type=pa.string()),
           "uint8": pa.array([1, 2, 3, 4], type=pa.uint8()),
           "int32": pa.array([0, -2147483638, 2147483637, 1], type=pa.int32()),
           "bool": pa.array([True, True, False, False], type=pa.bool_()),
       }
       table = pa.table(data)
       metadata_collector = []
       pq.write_table(table, BytesIO(), metadata_collector=metadata_collector)
       return table.schema, metadata_collector[0]
   
   schema, meta = create_example_file_meta_data()
   print("Created Example File")
   metadata_collector = [meta] * 500
   with Timer(prefix='1'):
       pq.write_metadata(schema, BytesIO(), metadata_collector=metadata_collector)
   
   metadata_collector = [meta] * 1000
   with Timer(prefix='2'):
       pq.write_metadata(schema, BytesIO(), metadata_collector=metadata_collector)
   
   metadata_collector = [meta] * 2000
   with Timer(prefix='3'):
       pq.write_metadata(schema, BytesIO(), metadata_collector=metadata_collector)
   
   metadata_collector = [meta] * 4000
   with Timer(prefix='4'):
       pq.write_metadata(schema, BytesIO(), metadata_collector=metadata_collector)
   ```
   
   </details>
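
If `contexttimer` is not installed, a stand-in built only on the standard library could look like the sketch below. The names `timer` and `format_timing` are hypothetical, and the output format is an assumption chosen to mimic the timing lines in this description; this is not the `contexttimer` implementation.

```python
import time
from contextlib import contextmanager


def format_timing(prefix, elapsed):
    """Format a timing line such as '1 took 0.615 seconds'."""
    return f"{prefix} took {elapsed:.3f} seconds"


@contextmanager
def timer(prefix):
    """Hypothetical stand-in for contexttimer.Timer: print elapsed wall time."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the timed block raises, so a line is always printed.
        print(format_timing(prefix, time.perf_counter() - start))


# Usage: wrap any block whose wall-clock time should be reported.
with timer("demo"):
    sum(range(100_000))
```

In the reference code, `with Timer(prefix='1'):` would then become `with timer('1'):`.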
   
   Before
   ```
   Created Example File
   1 took 0.615 seconds
   2 took 2.446 seconds
   3 took 9.813 seconds
   4 took 40.237 seconds
   ```
   
   After
   ```
   Created Example File
   1 took 0.009 seconds
   2 took 0.018 seconds
   3 took 0.036 seconds
   4 took 0.072 seconds
   ```
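
One way to read these numbers: each run doubles the number of collected metadata entries, so the ratio between consecutive timings hints at how the cost scales. The sketch below only reworks the figures reported above; the helper name `growth_ratios` is illustrative and not part of the PR.

```python
# Timings copied from the "Before" and "After" runs above (seconds).
sizes = [500, 1000, 2000, 4000]
before = [0.615, 2.446, 9.813, 40.237]
after = [0.009, 0.018, 0.036, 0.072]


def growth_ratios(timings):
    """Ratio of each timing to the previous one (input size doubles each step)."""
    return [later / earlier for earlier, later in zip(timings, timings[1:])]


before_ratios = growth_ratios(before)
after_ratios = growth_ratios(after)

for n, b, a in zip(sizes[1:], before_ratios, after_ratios):
    print(f"n={n}: before x{b:.2f}, after x{a:.2f}")
```

Ratios near 4 per doubling are consistent with quadratic growth in the number of entries, while ratios near 2 are consistent with linear growth.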
   
   TODO:
   * [ ] Overload the existing Cython function instead of adding a new one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
