douglas-raillard-arm opened a new issue, #41863:
URL: https://github.com/apache/arrow/issues/41863

   ### Describe the enhancement requested
   
   `pyarrow.dataset.write_dataset(compression='lz4_raw')` currently fails with:
   
   ```
   Traceback (most recent call last):
     File "/work/projects/lisa/testpyarrow.py", line 3, in <module>
       _reencode_parquet('sched_switch.lz4.parquet', 'updated.parquet', compression='lz4_raw')#, row_group_size=128*1024*1024, compression='LZ4')
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "x.py", line 1, in my_write_parquet
       options = pyarrow.dataset.ParquetFileFormat().make_write_options(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "pyarrow/_dataset_parquet.pyx", line 206, in pyarrow._dataset_parquet.ParquetFileFormat.make_write_options
     File "pyarrow/_dataset_parquet.pyx", line 594, in pyarrow._dataset_parquet.ParquetFileWriteOptions.update
     File "pyarrow/_dataset_parquet.pyx", line 599, in pyarrow._dataset_parquet.ParquetFileWriteOptions._set_properties
     File "pyarrow/_parquet.pyx", line 1855, in pyarrow._parquet._create_writer_properties
     File "pyarrow/_parquet.pyx", line 1369, in pyarrow._parquet.check_compression_name
   pyarrow.lib.ArrowException: Unsupported compression: lz4_raw
   ``` 
   
   And indeed, `lz4_raw` is not mentioned anywhere in `python/pyarrow/_parquet.pyx`.
   
   Would it be possible to add support for the LZ4_RAW codec when writing Parquet files, particularly through the dataset API?
   
   ### Component(s)
   
   Python


