[GitHub] [arrow] joosthooz opened a new pull request, #14400: Added compression support to pyarrow.dataset.ipc format.

GitBox Thu, 13 Oct 2022 04:56:28 -0700


joosthooz opened a new pull request, #14400:
URL: https://github.com/apache/arrow/pull/14400


   The normal pyarrow feather writer supports setting a small set of properties 
when writing: compression, compression_level and chunk_size. These are stored 
in a struct, `ipc::feather::WriteProperties`. In this PR I've used this same 
struct to add the compression options to the dataset ipc writer.
   Note that this is different from the struct `ipc::IpcWriteOptions` and also 
from `dataset::IpcFileWriteOptions`. When creating the writer, 
`WriteProperties` is used to overwrite the default options in 
`ipc::IpcWriteOptions` for compression, the same way as happens in 
`ipc::feather::WriteTable`.
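
   The override step can be pictured with a small Python sketch. The two dataclasses and the helper below are hypothetical, simplified stand-ins for the C++ structs, not the actual Arrow definitions; they only illustrate how user-supplied compression settings replace the defaults while the other options stay untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class WritePropertiesSketch:
    """Hypothetical stand-in for the small set of user-facing properties."""
    compression: Optional[str] = None
    compression_level: Optional[int] = None
    chunk_size: int = 65536


@dataclass(frozen=True)
class IpcOptionsSketch:
    """Hypothetical stand-in for the lower-level IPC options."""
    codec: Optional[str] = None
    codec_level: Optional[int] = None
    use_threads: bool = True


def apply_write_properties(defaults, props):
    """Return a copy of `defaults` with only the compression fields overridden."""
    if props.compression is None:
        return defaults  # nothing requested, keep the defaults as they are
    return replace(
        defaults,
        codec=props.compression,
        codec_level=props.compression_level,
    )


options = apply_write_properties(
    IpcOptionsSketch(),
    WritePropertiesSketch(compression="zstd", compression_level=3),
)
```

   Only the compression fields change; every other default in the lower-level options is carried over as is.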
   
   The alternative is to mimic the way it works for CSV and expose 
`ipc::IpcWriteOptions` in python. But then the dataset ipc writer would have 
more functionality than the normal feather writer. Also, that struct contains 
a number of properties that I don't think make sense to use from python. 
Lastly, the compression codec in that struct needs to be initialized by 
calling some C++ code, which would have to run whenever the compression is set.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
