Hi,
I'm looking for advice on a "to bundle or not to bundle" question for a
PR I'm working on which enables the reading and writing of all of the
compression codecs standardised for Parquet. That amounts to adding
support for LZO, LZ4, Brotli and Zstandard.
Apart from some minor code changes in Drill itself, users will
obviously also need an implementation of each codec, and we don't
currently bundle all of those. Where native codec libraries are
involved, I guess platform specifics would become a consideration,
but let's gloss over that for now.
In the case of LZO I believe that a GPL license applies and I don't
think it can ever be bundled (but we can still enable it and provide
instructions for users to add it to their installations themselves). In
the case of Brotli there is an Apache-licensed implementation that we
can bundle if we don't mind adding a 750KB JAR file.
So my question is: should I bundle all of the codecs that I can, making
things work out of the box but adding to the size of the distributable?
Or should I put in documentation and error messages that instruct users
to get the codecs themselves instead?
Thanks
James