nwoolmer opened a new pull request, #49135:
URL: https://github.com/apache/arrow/pull/49135

   Closes https://github.com/apache/arrow/issues/41863
   
   ### Rationale for this change
   
   Other tools in the Parquet ecosystem distinguish between `LZ4` and 
`LZ4_RAW`, matching the specification: 
https://parquet.apache.org/docs/file-format/data-pages/compression/
   
   `LZ4` (framing) is deprecated in the specification. PyArrow does not support 
it; instead, to keep the user-facing API simple, it treats `LZ4` as an alias 
for the `LZ4_RAW` codec. 
   
   However, PyArrow does not accept `LZ4_RAW` itself as a valid name for the 
`LZ4_RAW` codec, and raises:
   
   ```
   ArrowException: Unsupported compression: lz4_raw
   ```
   
   This adds friction and is confusing for users who are aware of the 
difference between the two codecs.
   
   ### What changes are included in this PR?
   
   - Adding `LZ4_RAW` to the list of accepted codec names.
   - Extending the `LZ4->LZ4_RAW` mapping so that `LZ4_RAW` also resolves to `LZ4_RAW`.
   - Adding a test for the new name.
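
The intended behaviour can be illustrated with a minimal sketch. This is not the PyArrow implementation: `ACCEPTED_CODECS`, `ALIASES`, and `resolve_codec` are hypothetical names chosen for illustration, and the set of codec names is an assumption. It only shows the idea of validating a name case-insensitively and then resolving both `LZ4` and `LZ4_RAW` to the same codec.

```python
# Hypothetical sketch of codec-name resolution; these names are NOT PyArrow internals.

# Assumed list of accepted names, with LZ4_RAW added alongside the existing LZ4.
ACCEPTED_CODECS = {"NONE", "SNAPPY", "GZIP", "BROTLI", "ZSTD", "LZ4", "LZ4_RAW"}

# LZ4 is kept as an alias for LZ4_RAW; every other name resolves to itself.
ALIASES = {"LZ4": "LZ4_RAW"}


def resolve_codec(name: str) -> str:
    """Validate a user-supplied codec name and return the codec it resolves to."""
    upper = name.upper()
    if upper not in ACCEPTED_CODECS:
        raise ValueError(f"Unsupported compression: {name}")
    return ALIASES.get(upper, upper)
```

Under these assumptions, `resolve_codec("lz4")` and `resolve_codec("lz4_raw")` both return `"LZ4_RAW"`, while an unknown name still raises an error, so the change is purely additive.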
   
   ### Are these changes tested?
   
   Yes.
   
   ### Are there any user-facing changes?
   
   Yes, an additive change to the accepted codec names.
   

