Ok, understood. Such a change would certainly require mention in release notes, user manual, etc.

Christopher wrote:
Yes, it's a simple matter to install the dependency... it just might not be
installed by default. I'd certainly recommend downstream vendors/packagers
add it as a required or suggested dependency to their RPMs/DEBs/etc.,
though.

The snappy package on RHEL/CentOS provides libsnappy. The
org.xerial.snappy:snappy-java dependency provides JNI support, but it looks
like Hadoop doesn't use that and instead uses its own JNI stuff. Neither
seems to provide a non-native implementation, as far as I can tell. So, I
guess I was wrong about that. You definitely need the native library
installed for it to work at all.

On Sat, Aug 13, 2016 at 11:42 PM Josh Elser <[email protected]> wrote:

That's a fair point. I'm off in nebulous vendor land and tend to be removed
from pure Apache Hadoop artifacts. I feel like there's a snappy package (at
least on CentOS) which is enough, but understanding this would be good.

Is there a non-native snappy impl?

On Aug 13, 2016 11:19 PM, "Christopher" <[email protected]> wrote:

Native libraries for snappy are also not typically installed by default on
Linux distros. Even if the Hadoop native libraries are installed, the user
is likely going to end up using the Java implementation by default, I
*think*, unless they take additional actions.

On Sat, Aug 13, 2016 at 11:18 PM Adam Fuchs <[email protected]> wrote:

In my experience gz gets roughly 1.5x to 2x better compression than snappy.
Snappy is definitely not a Pareto improvement (although we tend to use
snappy by default). Since it's not always better, I think you would need a
more solid argument to change the default.

Adam
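One way to build the "more solid argument" Adam asks for is to measure both compressed size and compression time on representative data, since the trade-off discussed here is ratio versus speed. The following is a minimal Python sketch under stated assumptions: the names `measure`, `sample_data`, `compare`, and `CODECS` are hypothetical, the synthetic records are invented stand-ins for real table data, and the snappy entry assumes the third-party python-snappy package (`snappy.compress`), which is simply skipped if it is not installed. It times the Python bindings, not Hadoop's native codecs, so any numbers it prints are indicative only.

```python
import gzip
import os
import time
from typing import Callable, Dict, Tuple


def measure(compress: Callable[[bytes], bytes], data: bytes) -> Tuple[float, float]:
    """Return (compression ratio, seconds) for one codec on one input."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    # Ratio > 1 means the output is smaller than the input.
    return len(data) / len(out), elapsed


def sample_data(n_rows: int = 20000) -> bytes:
    # Hypothetical key/value-like records: a repetitive prefix plus a few
    # random bytes per row, standing in for real table data.
    rows = [
        b"row_%08d cf:cq %s\n" % (i, os.urandom(4).hex().encode())
        for i in range(n_rows)
    ]
    return b"".join(rows)


# gz is always available from the standard library.
CODECS: Dict[str, Callable[[bytes], bytes]] = {
    "gz": lambda d: gzip.compress(d, compresslevel=6),
}

# Assumption: python-snappy is installed; if not, only gz is measured.
try:
    import snappy  # third-party, optional

    CODECS["snappy"] = snappy.compress
except (ImportError, AttributeError):
    pass


def compare(data: bytes) -> Dict[str, Tuple[float, float]]:
    """Measure every available codec on the same input."""
    return {name: measure(fn, data) for name, fn in CODECS.items()}


if __name__ == "__main__":
    data = sample_data()
    results = compare(data)
    for name, (ratio, secs) in results.items():
        print(f"{name}: {ratio:.2f}x smaller, {secs * 1000:.1f} ms")
    if "gz" in results and "snappy" in results:
        # A value near 1.5 to 2 would match the rough figure quoted above.
        print(f"gz ratio / snappy ratio: {results['gz'][0] / results['snappy'][0]:.2f}")
```

Keeping the codecs in a dictionary lets the same harness be pointed at real data (for example, an exported sample of a table) without changing the measurement code.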

On Aug 13, 2016 8:06 PM, "Josh Elser" <[email protected]> wrote:

Same motivation for using it as for making it the default. I am not aware of
any downside to it. It's become pretty standard across all installations
I've worked with for years.

Asking because I am no oracle on the matter. I could just be ignorant of
some issue, but, given my current understanding, there is no downside for
the average case.

Christopher wrote:

Sorry. I wasn't clear. I understand the motivation for using it... I'm
asking about the motivation for making it the default.

Since both are available, I'm not sure the default matters *that* much, but
it could be an unexpected change for those preferring GZ.

Also, are there any risks regarding library availability of snappy? GZ is
pretty ubiquitous.

On Sat, Aug 13, 2016 at 10:59 PM Josh Elser <[email protected]> wrote:

Uhh, besides what I already mentioned? (Close in compressed size but
"much" faster.)

Christopher wrote:

What's the motivation for changing it?

On Sat, Aug 13, 2016 at 10:47 PM Josh Elser <[email protected]> wrote:

Any reason we don't want to do this? Last rule of thumb I heard was that
snappy is often close enough in compression to GZ but quite a bit faster
(I don't remember exactly how much).

- Josh