wombatu-kun opened a new pull request, #16348:
URL: https://github.com/apache/iceberg/pull/16348

   ## Summary
   
   Implements the LZ4 codec for Puffin, replacing the long-standing TODOs in 
PuffinFormat.compress / PuffinFormat.decompress that pointed at 
airlift/aircompressor#142.
   
   ## Motivation
   
   Puffin declares `lz4` as a valid codec (used unconditionally for footer 
compression via Puffin.write(...).compressFooter()), but the implementation 
threw UnsupportedOperationException("Unsupported codec: LZ4"). The referenced 
aircompressor PR #142 was never merged into the version Iceberg ships 
(io.airlift:aircompressor:2.0.3), which provides only raw LZ4 and Hadoop 
streams, not the standard LZ4 *frame* format the Puffin spec requires. As a 
result, footer compression was unusable and lz4 blob compression was unreachable.
   
   ## Implementation
   
   LZ4 frame support is provided by net.jpountz.lz4 (shipped as 
at.yawk.lz4:lz4-java, already pinned in this repo via a CVE resolutionStrategy 
substitution). It is promoted from a transitive-only dependency to a direct 
implementation dependency of iceberg-core.
   
   - compress: LZ4FrameOutputStream with BLOCKSIZE.SIZE_4MB, the known content 
length, and FLG.Bits.CONTENT_SIZE + FLG.Bits.BLOCK_INDEPENDENCE.
   - decompress: LZ4FrameInputStream drained via Guava ByteStreams.
   
   This conforms to the Puffin spec: "Single LZ4 compression frame, with 
content size present". Content size is encoded in the frame descriptor. 
BLOCK_INDEPENDENCE is required by lz4-java (it only supports independent 
blocks) and is orthogonal to the spec, which constrains only the single-frame 
layout and the presence of the content size; it is also the reference lz4 CLI 
default. aircompressor is retained for ZSTD.
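   
   For reviewers who want to see where these flags land in the bytes, here is 
a minimal, stdlib-only sketch of reading an LZ4 frame descriptor as laid out in 
the LZ4 frame format specification. It is not part of this PR: the class name 
Lz4FrameHeader and its fields are hypothetical, and the sketch does not verify 
the header checksum byte (that would need xxHash32).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Iceberg or lz4-java.
final class Lz4FrameHeader {
  // LZ4 frame magic number, stored little-endian on disk (bytes 04 22 4D 18).
  static final int MAGIC = 0x184D2204;

  final boolean blockIndependence;   // FLG bit 5
  final boolean contentSizePresent;  // FLG bit 3
  final int blockMaxSizeId;          // BD bits 6-4: 4=64KB, 5=256KB, 6=1MB, 7=4MB
  final long contentSize;            // -1 when the frame does not carry a content size

  private Lz4FrameHeader(
      boolean blockIndependence, boolean contentSizePresent, int blockMaxSizeId, long contentSize) {
    this.blockIndependence = blockIndependence;
    this.contentSizePresent = contentSizePresent;
    this.blockMaxSizeId = blockMaxSizeId;
    this.contentSize = contentSize;
  }

  static Lz4FrameHeader parse(byte[] frame) {
    // Smallest possible header: magic (4) + FLG (1) + BD (1) + header checksum (1).
    if (frame.length < 7) {
      throw new IllegalArgumentException("Too short to be an LZ4 frame");
    }
    ByteBuffer buf = ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN);
    if (buf.getInt() != MAGIC) {
      throw new IllegalArgumentException("Missing LZ4 frame magic number");
    }
    int flg = buf.get() & 0xFF;
    int bd = buf.get() & 0xFF;
    // The two high bits of FLG hold the format version, which must be 01.
    if ((flg >>> 6) != 1) {
      throw new IllegalArgumentException("Unsupported LZ4 frame version");
    }
    boolean independent = (flg & 0x20) != 0;
    boolean hasContentSize = (flg & 0x08) != 0;
    long size = -1L;
    if (hasContentSize) {
      // Content size is an 8-byte little-endian value right after the BD byte.
      if (buf.remaining() < 8) {
        throw new IllegalArgumentException("Truncated content size field");
      }
      size = buf.getLong();
    }
    // The header checksum byte (and optional dictionary ID) are not inspected here.
    return new Lz4FrameHeader(independent, hasContentSize, (bd >>> 4) & 0x07, size);
  }
}
```

   With the options listed above (content size, block independence, 4 MB 
blocks, no checksums), the FLG byte would be expected to read 0x68 and the BD 
byte 0x70, so a check like this could back a fixture test that asserts the 
"content size present" requirement directly.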
   
   ## Tests
   
   - TestPuffinWriter.testEmptyFooterCompressed converted from a negative test 
(asserting the UnsupportedOperationException) to a positive round-trip + 
byte-fixture test.
   - Added testWriteMetricDataCompressedLz4 / testReadMetricDataCompressedLz4 
and testValidateLz4FooterSizeValue, mirroring the existing ZSTD coverage, 
against two new committed fixtures (empty-puffin-compressed-footer.bin, 
sample-metric-data-compressed-lz4.bin).
   - Added codec-level round-trip + empty-input tests in TestPuffinFormat, 
parameterized over NONE / LZ4 / ZSTD.
   
   Verified locally: :iceberg-core:build -x integrationTest green; 
checkRuntimeDeps green for the spark-4.1 / flink-2.1 / kafka-connect bundles.
   
   ## Runtime deps & LICENSE
   
   Making lz4-java a direct dependency of iceberg-core propagates it onto the 
runtime classpath of every shaded runtime bundle that ships iceberg-core. 
Accordingly:
   
   - runtime-deps.txt baselines updated for the affected bundles (spark 
v3.4/v3.5/v4.0/v4.1, flink v1.20/v2.0/v2.1, kafka-connect-runtime). Only the 
single new at.yawk.lz4:lz4-java line was added; unrelated patch-level baseline 
drift was intentionally left out.
   - Bundle LICENSE files updated with a "This product bundles lz4-java" 
stanza, mirroring the existing Airlift Aircompressor precedent. lz4-java ships 
no NOTICE file, so NOTICE was not modified.
   
   Open item for maintainers: please sanity-check the LICENSE attribution 
wording / project URL for the at.yawk.lz4 fork against ASF policy — this is the 
documented manual step in runtime-deps.gradle.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

