zeroshade commented on pull request #9862: URL: https://github.com/apache/arrow/pull/9862#issuecomment-813062545
Rebased as per suggestion to hopefully get a successful run there. As for the requested changes: 1) Updated the `MaxCompressLen` function for ZSTD to explicitly handle signed vs unsigned to ensure no issues there. 2) Is there a test case that exists in the integration archery tests which has the metadata say to use compression but passes uncompressed buffers? The code currently correctly handles 0 length and nil buffers, but I didn't see anything in the spec or config (or the C++ implementation) for having non-zero length buffers which are uncompressed being sent along side compressed ones. I guess I'm missing something? As far as the decoupling of compression/decompression and IO calls: For Decompression: The current code structure only decompresses as it loads the records but doesn't provide any decoupling between reading the record in with the IO calls and the loading of the flatbuffer data that I could put the decompression into. Currently it's just going to decompress in the same place that it would be loading the raw flatbuffer data into an `array.Record`. For Compression: The compression happens during the payload encoding. So while the current code does not provide any hook that could be used to decouple encoding the payload and the IO to send it (the encoding happens right before it writes the record) if that decoupling were to be added, it is easy to do it as the compression is part of the payload encoding, not part of the IO to write the record out. The one optimization I can see here is that I could implement parallelization for the compression to attempt to compress the body buffers in parallel rather than serially like I'm doing right now. Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org