adlerpriit commented on issue #39912:
URL: https://github.com/apache/arrow/issues/39912#issuecomment-2909927864

   I am seeing a similar issue in Python with pyarrow 20.0.0. The same write works fine with 30M fewer rows.
   
   ```
   >>> mydf.shape
   (338440930, 3)
   >>> mydf.dtypes
   sample     category
   peptide    category
   N             Int32
   dtype: object
   >>> mydf.to_parquet('pep_all_pandas.parquet', index=False, row_group_size=8192*8192)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/util/_decorators.py", line 333, in wrapper
       return func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
     File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/core/frame.py", line 3113, in to_parquet
       return to_parquet(
              ^^^^^^^^^^^
     File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 480, in to_parquet
       impl.write(
     File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 228, in write
       self.api.parquet.write_table(
     File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1909, in write_table
       writer.write_table(table, row_group_size=row_group_size)
     File "/opt/protobios/gmm-pipeline/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1115, in write_table
       self.writer.write_table(table, row_group_size=row_group_size)
     File "pyarrow/_parquet.pyx", line 2226, in pyarrow._parquet.ParquetWriter.write_table
     File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Negative buffer resize: -2008352576
   ```
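
For scale, the numbers in the session above can be checked with a few lines of plain Python. This is only a sketch: it assumes the negative size in the error message is a byte count that wrapped around a signed 32-bit integer, which the traceback itself does not confirm. The helper name `unwrap_int32` is made up for illustration and is not part of pyarrow or pandas.

```python
import math

INT32_MODULUS = 2**32


def unwrap_int32(value: int) -> int:
    """Return the unsigned value a negative signed 32-bit integer would have wrapped from.

    Hypothetical helper for illustration only; not a pyarrow or pandas function.
    """
    return value + INT32_MODULUS if value < 0 else value


reported = -2008352576            # size from the ArrowInvalid message
requested = unwrap_int32(reported)

total_rows = 338_440_930          # first element of mydf.shape
row_group_size = 8192 * 8192      # value passed to to_parquet
n_row_groups = math.ceil(total_rows / row_group_size)

print(requested)                  # 2286614720
print(requested > 2**31 - 1)      # True: larger than the int32 maximum
print(row_group_size)             # 67108864
print(n_row_groups)               # 6
```

If the wraparound assumption holds, the failing resize asked for roughly 2.29 GB, which is just past the signed 32-bit limit of 2147483647 bytes.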


