adlerpriit commented on issue #39912: URL: https://github.com/apache/arrow/issues/39912#issuecomment-2909927864
Similar issue, but in Python with pyarrow 20.0.0. It works fine with ~30M fewer rows.
```
>>> mydf.shape
(338440930, 3)
>>> mydf.dtypes
sample     category
peptide    category
N             Int32
dtype: object
>>> mydf.to_parquet('pep_all_pandas.parquet', index=False, row_group_size=8192*8192)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/core/frame.py", line 3113, in to_parquet
    return to_parquet(
           ^^^^^^^^^^^
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 480, in to_parquet
    impl.write(
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 228, in write
    self.api.parquet.write_table(
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1909, in write_table
    writer.write_table(table, row_group_size=row_group_size)
  File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1115, in write_table
    self.writer.write_table(table, row_group_size=row_group_size)
  File "pyarrow/_parquet.pyx", line 2226, in pyarrow._parquet.ParquetWriter.write_table
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Negative buffer resize: -2008352576
```
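For what it's worth, the negative size reported here looks like a 32-bit overflow once a single row group's buffers grow past ~2 GiB, so shrinking the row groups may sidestep it. Below is a minimal workaround sketch, not a confirmed fix: `mydf` is the DataFrame from the repro above, and the `1024 * 1024` rows-per-group value is an arbitrary guess at a size that keeps each column chunk well under 2 GiB.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Convert the pandas DataFrame from the repro above;
# preserve_index=False mirrors the original to_parquet(index=False) call.
table = pa.Table.from_pandas(mydf, preserve_index=False)

# Write smaller row groups than the original 8192*8192 (~67M rows) so no
# single column chunk has to buffer more than ~2 GiB at once.
pq.write_table(table, 'pep_all_pandas.parquet', row_group_size=1024 * 1024)
```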