Yes, it's a simple matter to install the dependency... it just might not be installed by default. I'd certainly recommend downstream vendors/packagers add it as a required or suggested dependency to their RPMs/DEBs/etc., though.
The snappy package on RHEL/CentOS provides libsnappy. The org.xerial.snappy:snappy-java dependency provides JNI support, but it looks like Hadoop doesn't use that and instead uses its own JNI stuff. Neither seems to provide a non-native implementation, as far as I can tell. So, I guess I was wrong about that. You definitely need the native library installed for it to work at all.

On Sat, Aug 13, 2016 at 11:42 PM Josh Elser <[email protected]> wrote:

> That's a fair point. I'm off in nebulous vendor land and tend to be removed
> from pure Apache Hadoop artifacts. I feel like there's a snappy package (at
> least on centos) which is enough, but understanding this would be good.
>
> Is there a nonnative snappy impl?
>
> On Aug 13, 2016 11:19 PM, "Christopher" <[email protected]> wrote:
>
> > Native libraries for snappy are also not typically installed by default on
> > Linux distros. Even if the hadoop native libraries are installed, the user
> > is likely going to end up using the Java implementation by default, I
> > *think*, unless they take additional actions.
> >
> > On Sat, Aug 13, 2016 at 11:18 PM Adam Fuchs <[email protected]> wrote:
> >
> > > In my experience gz gets roughly 1.5x to 2x better compression than snappy.
> > > Snappy is definitely not a pareto improvement (although we tend to use
> > > snappy by default). Since it's not always better I think you would need a
> > > more solid argument to change the default.
> > >
> > > Adam
> > >
> > > On Aug 13, 2016 8:06 PM, "Josh Elser" <[email protected]> wrote:
> > >
> > > > Same motivation of using it as for making it the default. I am not aware
> > > > of any downside to it. It's become pretty standard across all installations
> > > > I've worked with for years.
> > > >
> > > > Asking because I am no oracle on the matter. I could just be ignorant of
> > > > some issue, but, given my current understanding, there is no downside for
> > > > the average case.
> > > >
> > > > Christopher wrote:
> > > >
> > > >> Sorry. I wasn't clear. I understand the motivation for using it... I'm
> > > >> asking about the motivation for making it the default.
> > > >>
> > > >> Since both are available, I'm not sure the default matters *that* much, but
> > > >> it could be an unexpected change for those preferring GZ.
> > > >>
> > > >> Also, are there any risks regarding library availability of snappy? GZ is
> > > >> pretty ubiquitous.
> > > >>
> > > >> On Sat, Aug 13, 2016 at 10:59 PM Josh Elser<[email protected]> wrote:
> > > >>
> > > >> Uhh, besides what I already mentioned? (close in compressed size but
> > > >>> "much" faster)
> > > >>>
> > > >>> Christopher wrote:
> > > >>>
> > > >>>> What's the motivation for changing it?
> > > >>>>
> > > >>>> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser<[email protected]> wrote:
> > > >>>>
> > > >>>> Any reason we don't want to do this? Last rule-of-thumb I heard was that
> > > >>>>> snappy is often close enough in compression to GZ but quite a bit faster
> > > >>>>> (I don't remember exactly how much).
> > > >>>>>
> > > >>>>> - Josh
