adlerpriit commented on issue #39912:
URL: https://github.com/apache/arrow/issues/39912#issuecomment-2909927864
Similar issue, but in Python with pyarrow 20.0.0. It works fine with ~30M fewer rows.
```
>>> mydf.shape
(338440930, 3)
>>> mydf.dtypes
sample     category
peptide    category
N             Int32
dtype: object
>>> mydf.to_parquet('pep_all_pandas.parquet', index=False, row_group_size=8192*8192)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/core/frame.py", line 3113, in to_parquet
    return to_parquet(
           ^^^^^^^^^^^
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 480, in to_parquet
    impl.write(
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 228, in write
    self.api.parquet.write_table(
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1909, in write_table
    writer.write_table(table, row_group_size=row_group_size)
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1115, in write_table
    self.writer.write_table(table, row_group_size=row_group_size)
  File "pyarrow/_parquet.pyx", line 2226, in pyarrow._parquet.ParquetWriter.write_table
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Negative buffer resize: -2008352576
```
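
In case it helps anyone trying to trigger this locally, here is a rough synthetic reproducer with the same shape and dtypes. It is a sketch, not my real data: the category cardinalities, string lengths, and the suggestion that a smaller `row_group_size` avoids the error are assumptions on my part, so it may or may not hit the exact same overflow.

```
# Hypothetical reproducer (synthetic data, not the original frame).
# Same shape/dtypes as above; needs enough RAM to hold ~338M rows.
import numpy as np
import pandas as pd

n_rows = 338_440_930
rng = np.random.default_rng(0)

mydf = pd.DataFrame(
    {
        # Category cardinalities and label lengths are guesses; longer strings
        # and more categories make each decoded row group larger.
        "sample": pd.Categorical.from_codes(
            rng.integers(0, 1_000, size=n_rows, dtype=np.int32),
            categories=[f"sample_{i:04d}" for i in range(1_000)],
        ),
        "peptide": pd.Categorical.from_codes(
            rng.integers(0, 100_000, size=n_rows, dtype=np.int32),
            categories=[f"PEPTIDESEQ_{i:06d}" for i in range(100_000)],
        ),
        "N": pd.array(
            rng.integers(0, 1_000, size=n_rows, dtype=np.int32), dtype="Int32"
        ),
    }
)

# 8192*8192 = 67,108,864 rows per row group is what fails for me;
# a smaller value (e.g. 1024*1024) may avoid the overflow, but that is untested here.
mydf.to_parquet("pep_all_pandas.parquet", index=False, row_group_size=8192 * 8192)
```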