Apologies for the very delayed reply. Does unframed LZ4 provide a checksum of the content before compression?
I don't believe so, we would have need to add basic minimal metadata like checksum/uncompressed length. I think this is still fairly simple compared to implementing the block format. On Sat, Aug 31, 2024 at 12:11 PM Piotr Findeisen <piotr.findei...@gmail.com> wrote: > Hi Micah, > > Good point. > Does unframed LZ4 provide a checksum of the content before compression? > > Best > Piotr > > > On Fri, 30 Aug 2024 at 23:34, Micah Kornfield <emkornfi...@gmail.com> > wrote: > >> The Iceberg implementation was supposed to be based on aircompressor pure >>> Java implementation https://github.com/airlift/aircompressor/pull/142. >>> AFAICT, aircompressor started to favor (or be more OK with) native >>> implementations (because of Project Panama), so adding LZ4 framed >>> compression might be simpler these days. >> >> >> Since this work was never completed, I'd personally be in favor of >> deprecating LZ4 framed and using LZ4 withing framing which already has high >> quality native java implementation. >> >> Cheers, >> Micah >> >> On Tue, Aug 27, 2024 at 5:44 AM Piotr Findeisen < >> piotr.findei...@gmail.com> wrote: >> >>> Hi Gabor >>> >>> Thanks for creating this discussion thread. This is indeed a good topic >>> to discuss. >>> >>> The idea was to have lightweight compression for the footer for cass >>> when Puffin files are bigger. >>> It is true that the implementation didn't follow the spec yet. >>> If we remove this from the Puffin spec, we will probably want to add it >>> later. >>> >>> The Iceberg implementation was supposed to be based on >>> aircompressor pure Java implementation >>> https://github.com/airlift/aircompressor/pull/142. >>> AFAICT, aircompressor started to favor (or be more OK with) native >>> implementations (because of Project Panama), so adding LZ4 framed >>> compression might be simpler these days. >>> >>> I would prefer to spend the effort on completing the compression. >>> >>> Best >>> Piotr >>> >>> >>> >>> >>> On Tue, 27 Aug 2024 at 14:29, Gabor Kaszab >>> <gaborkas...@cloudera.com.invalid> wrote: >>> >>>> Hi Iceberg Community, >>>> >>>> I saw in the Puffin spec <https://iceberg.apache.org/puffin-spec> that >>>> the footer of the Puffin file or the blobs themselves could be compressed >>>> by LZ4. I checked the code >>>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/puffin/PuffinFormat.java#L110> >>>> however, and for me it seems that currently LZ4 is not supported. >>>> My first question is do I miss anything here? >>>> The second, is if we in fact don't support LZ4, can I remove it from >>>> the spec to avoid confusions? (I believe this requires a vote in a separate >>>> thread) >>>> >>>> Thanks, >>>> Gabor >>>> >>>>