Going with the Registrar/ServiceLoader route would allow for alternative providers for the same compression algorithms so if they don't like one they can always contribute a different one.
On Wed, Dec 4, 2019 at 8:22 AM Ismaël Mejía <[email protected]> wrote: > (1) seems not to be the issue because it is Apache licensed. > (2) and (3) are the big issues, because it requires a provided huge uber > jar that essentially leaks Hadoop classes into core SDK [1] so it is > definitely concerning. > > We discussed at some point during the PR that added ZStandard support > about creating some sort of Registrar for compression algorithms [2] but we > decided to not go ahead because we could achieve that for the zstd case via > the optional dependencies of commons-compress. Maybe it is time to > reconsider if such mechanism is worth. For example for users that may not > care about having the hadoop leakage to be able to use LZO. > > Refs. > [1] https://mvnrepository.com/artifact/io.airlift/aircompressor/0.16 > [2] https://issues.apache.org/jira/browse/BEAM-6422 > > > > > On Tue, Dec 3, 2019 at 7:01 PM Robert Bradshaw <[email protected]> > wrote: > >> Is there a way to wrap this up as an optional dependency with multiple >> possible providers, if there's no good library satisfying all of the >> conditions (in particular (1))? >> >> On Tue, Dec 3, 2019 at 9:47 AM Luke Cwik <[email protected]> wrote: >> > >> > I was hoping that someone in the community would provide some >> alternatives since there are quite a few implementations. >> > >> > On Tue, Dec 3, 2019 at 8:20 AM Amogh Tiwari <[email protected]> wrote: >> >> >> >> Hi Luke, >> >> >> >> I agree with your thoughts and observations. But, >> airlift:aircompressor is the only implementation of LZO in pure java. That >> straight away solves #5. >> >> The other implementations that I found either have licensing issues >> (since LZO natively uses GNU GPL licence) or are implemented using .c, .h >> and jni (which again make them dependent on the OS). Please refer these: >> twitter/hadoop-lzo and shevek/lzo-java. >> >> These were the main reasons why we based this on airlift:aircompressor. >> >> >> >> Thanks and Regards, >> >> Amogh >> >> >> >> >> >> >> >> On Tue, Dec 3, 2019 at 2:59 AM Luke Cwik <[email protected]> wrote: >> >>> >> >>> I took a look. My biggest concern is finding a good LZO >> implementation. Looking for one that preferably has: >> >>> 1) Apache license >> >>> 2) Has zero transitive dependencies >> >>> 3) Is small >> >>> 4) Is performant >> >>> 5) Is native java or supports execution on the three main OSs >> (Windows, Linux, Mac) >> >>> >> >>> In your PR you suggested using io.airlift:aircompressor:0.16 which >> doesn't meet item #2 and its transitive dependency fails #3. >> >>> >> >>> On Mon, Dec 2, 2019 at 12:16 PM Amogh Tiwari <[email protected]> >> wrote: >> >>>> >> >>>> Hi, >> >>>> I have filed a PR for an extension that will enable Apache Beam to >> work with LZO/LZOP compression. Please refer. >> >>>> I would love it if someone can take this up and review it. >> >>>> Please feel free to share your thoughts/suggestions. >> >>>> Regards, >> >>>> Amogh >> >
