taiyang-li opened a new pull request, #49183:
URL: https://github.com/apache/arrow/pull/49183

   ## Summary
   
   Implement a streaming Snappy compressor and decompressor for Arrow C++ using the official Snappy framing format, including per-chunk masked CRC-32C verification, and enable the existing streaming tests for Snappy.
   
   ## Details
   
   - Add a small `crc32c_masked` helper in `arrow::util` to compute the masked 
CRC-32C checksum as defined by the Snappy framing specification.
   - Extend the C++ util build to compile `crc32c.cc` and link it into the main 
util library.
   - Reimplement the Snappy codec streaming layer in `compression_snappy.cc`:
     - Keep the one-shot `Codec::Compress`/`Codec::Decompress` path on raw Snappy bitstreams (`snappy::RawCompress`/`snappy::RawUncompress`).
     - Implement `SnappyFramedCompressor`, which emits the official stream identifier chunk and splits the uncompressed stream into 64 KiB chunks, each wrapped as a framed chunk with a per-chunk masked CRC-32C checksum.
     - Implement `SnappyFramedDecompressor` as a stateful parser for Snappy 
framed streams that validates the stream identifier, handles 
compressed/uncompressed/skippable chunks, verifies the masked CRC-32C of the 
uncompressed payload, and supports incremental output via the `Decompress` API.
   - Wire `Codec::MakeCompressor` / `Codec::MakeDecompressor` for 
`Compression::SNAPPY` to the new framed implementations.
   - Generalize the streaming compression/decompression tests in 
`compression_test.cc` so that they:
     - Validate streaming compressor output with the streaming decompressor instead of the one-shot codec, because for some codecs (now including Snappy) the streaming and one-shot formats differ.
     - Generate inputs for `CheckStreamingDecompressor` using the streaming 
compressor rather than one-shot compression.
     - Remove the Snappy-specific skips in `StreamingCompressor`, 
`StreamingDecompressor`, `StreamingRoundtrip`, `StreamingDecompressorReuse`, 
and `StreamingMultiFlush`, so streaming tests now cover Snappy as well as the 
existing codecs.
   
   ## Testing
   
   A local CMake/Ninja build with `ARROW_WITH_SNAPPY=ON` and `ARROW_BUILD_TESTS=ON` could not be completed in this sandbox because the environment lacks a configured C/C++ toolchain and Ninja, so the changes have not been built or tested locally. They are limited to the C++ util layer and its unit tests and should be validated by running the standard C++ test suite (in particular `util-compression-test`) in a fully provisioned Arrow development environment.
   
   
   Change-Id: I97c877d81959c13578c6f251cb6c8a8141297d6a
   
   


-- 
This is an automated message from the Apache Git Service.