kshitij12345 opened a new pull request, #13234:
URL: https://github.com/apache/arrow/pull/13234
Ref Code:
<details>
```
from io import BytesIO

import pyarrow as pa
import pyarrow.parquet as pq
from contexttimer import Timer  # non-standard lib (can be installed with pip)


def create_example_file_meta_data():
    data = {
        "str": pa.array(["a", "b", "c", "d"], type=pa.string()),
        "uint8": pa.array([1, 2, 3, 4], type=pa.uint8()),
        "int32": pa.array([0, -2147483638, 2147483637, 1], type=pa.int32()),
        "bool": pa.array([True, True, False, False], type=pa.bool_()),
    }
    table = pa.table(data)
    metadata_collector = []
    pq.write_table(table, BytesIO(), metadata_collector=metadata_collector)
    return table.schema, metadata_collector[0]


schema, meta = create_example_file_meta_data()
print("Created Example File")

metadata_collector = [meta] * 500
with Timer(prefix='1'):
    pq.write_metadata(schema, BytesIO(),
                      metadata_collector=metadata_collector)

metadata_collector = [meta] * 1000
with Timer(prefix='2'):
    pq.write_metadata(schema, BytesIO(),
                      metadata_collector=metadata_collector)

metadata_collector = [meta] * 2000
with Timer(prefix='3'):
    pq.write_metadata(schema, BytesIO(),
                      metadata_collector=metadata_collector)

metadata_collector = [meta] * 4000
with Timer(prefix='4'):
    pq.write_metadata(schema, BytesIO(),
                      metadata_collector=metadata_collector)
```
</details>
Before
```
Created Example File
1 took 0.615 seconds
2 took 2.446 seconds
3 took 9.813 seconds
4 took 40.237 seconds
```
After
```
Created Example File
1 took 0.009 seconds
2 took 0.018 seconds
3 took 0.036 seconds
4 took 0.072 seconds
```
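A quick sanity check on the timings above, as a rough sketch: the collector size doubles at each step (500, 1000, 2000, 4000), so the ratio between consecutive timings hints at how the cost scales. A ratio near 4 is consistent with quadratic growth and a ratio near 2 with linear growth. The numbers are copied from the output above; the helper name `doubling_ratios` is only illustrative and not part of this PR.

```python
# Timings (seconds) copied from the Before/After output in this PR description.
sizes = [500, 1000, 2000, 4000]
before = [0.615, 2.446, 9.813, 40.237]
after = [0.009, 0.018, 0.036, 0.072]


def doubling_ratios(timings):
    """Ratio of each timing to the previous one (input size doubles per step)."""
    return [round(curr / prev, 2) for prev, curr in zip(timings, timings[1:])]


before_ratios = doubling_ratios(before)  # values near 4 suggest quadratic cost
after_ratios = doubling_ratios(after)    # values near 2 suggest linear cost

# Speedup of After relative to Before at each collector size.
speedups = [round(b / a) for b, a in zip(before, after)]

print("before ratios:", before_ratios)
print("after ratios: ", after_ratios)
for size, speedup in zip(sizes, speedups):
    print(f"{size} entries: ~{speedup}x faster")
```

Four data points are not a proof of complexity, but the pattern matches what one would expect if the old path re-did work proportional to the accumulated metadata on every append.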
TODO:
* [ ] Overload the existing Cython function rather than adding a new one.