taiyang-li opened a new pull request, #49183:
URL: https://github.com/apache/arrow/pull/49183
## Summary
Implement streaming Snappy compressor/decompressor for Arrow C++ using the
official Snappy framing format, including per-chunk masked CRC-32C
verification, and enable the existing streaming tests for Snappy.
## Details
- Add a small `crc32c_masked` helper in `arrow::util` to compute the masked
  CRC-32C checksum as defined by the Snappy framing specification.
- Extend the C++ util build to compile `crc32c.cc` and link it into the main
  util library.
- Reimplement the Snappy codec streaming layer in `compression_snappy.cc`:
  - Keep one-shot `Codec::Compress/Decompress` based on raw Snappy
    bitstreams (RawCompress/RawUncompress).
  - Implement `SnappyFramedCompressor`, which emits the official stream
    identifier chunk and splits the uncompressed stream into 64 KiB chunks,
    each wrapped as a framed chunk with a per-chunk masked CRC-32C checksum.
  - Implement `SnappyFramedDecompressor` as a stateful parser for Snappy
    framed streams that validates the stream identifier, handles
    compressed/uncompressed/skippable chunks, verifies the masked CRC-32C of
    the uncompressed payload, and supports incremental output via the
    `Decompress` API.
  - Wire `Codec::MakeCompressor` / `Codec::MakeDecompressor` for
    `Compression::SNAPPY` to the new framed implementations.
- Generalize the streaming compression/decompression tests in
  `compression_test.cc` so that they:
  - Validate streaming compressor output using the streaming decompressor
    instead of the one-shot codec, which accommodates codecs whose streaming
    and one-shot formats differ.
  - Generate inputs for `CheckStreamingDecompressor` using the streaming
    compressor rather than one-shot compression.
- Remove the Snappy-specific skips in `StreamingCompressor`,
  `StreamingDecompressor`, `StreamingRoundtrip`, `StreamingDecompressorReuse`,
  and `StreamingMultiFlush`, so the streaming tests now cover Snappy as well as
  the existing codecs.
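
For reviewers unfamiliar with the framing format, the masked checksum and the chunk layout described above can be sketched as follows. This is a minimal, dependency-free illustration and not the code in this PR: the function names (`Crc32c`, `MaskCrc32c`, `StreamIdentifier`, `FrameUncompressedChunk`) are hypothetical, the CRC is computed bit by bit rather than with tables or hardware instructions, and only the uncompressed chunk type (`0x01`) is shown so that the example does not need the Snappy library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
// Slow but self-contained; names here are illustrative only.
inline uint32_t Crc32c(const uint8_t* data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial when the dropped bit was 1.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Masking step of the Snappy framing format:
// rotate right by 15 bits, then add a constant (mod 2^32).
inline uint32_t MaskCrc32c(uint32_t crc) {
  return ((crc >> 15) | (crc << 17)) + 0xA282EAD8u;
}

// Stream identifier chunk: type 0xFF, 3-byte little-endian length 6, "sNaPpY".
inline std::vector<uint8_t> StreamIdentifier() {
  return {0xFF, 0x06, 0x00, 0x00, 's', 'N', 'a', 'P', 'p', 'Y'};
}

// Frames one uncompressed-data chunk (type 0x01):
//   [type:1][length:3 LE][masked CRC-32C of payload:4 LE][payload]
// The length field counts the 4 checksum bytes plus the payload.
inline std::vector<uint8_t> FrameUncompressedChunk(const uint8_t* data,
                                                   size_t len) {
  assert(len <= 65536);  // at most 64 KiB of uncompressed data per chunk
  const uint32_t body_len = static_cast<uint32_t>(len) + 4;
  const uint32_t masked = MaskCrc32c(Crc32c(data, len));
  std::vector<uint8_t> out;
  out.reserve(4 + body_len);
  out.push_back(0x01);
  for (int shift = 0; shift < 24; shift += 8) {
    out.push_back(static_cast<uint8_t>(body_len >> shift));
  }
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<uint8_t>(masked >> shift));
  }
  out.insert(out.end(), data, data + len);
  return out;
}
```

A compressed chunk (type `0x00`) has the same layout, except that the payload is the raw Snappy-compressed bytes while the checksum still covers the uncompressed data.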
## Testing
A local CMake/Ninja build with `ARROW_WITH_SNAPPY=ON` and `ARROW_BUILD_TESTS=ON`
could not be completed in this sandbox because the environment lacks a
configured C/C++ toolchain and Ninja. The changes are limited to the C++ util
layer and its unit tests; they should be validated by running the standard C++
test suite (in particular `util-compression-test`) in a fully provisioned Arrow
development environment.
Change-Id: I97c877d81959c13578c6f251cb6c8a8141297d6a
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]