[
https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052955#comment-13052955
]
Todd Lipcon commented on HADOOP-7206:
-------------------------------------
Sorry, I stopped paying attention to this for a while... I have some concerns
about the way this ended up:
We're now pulling in a jar which autoexpands its .so dependency into /tmp and
then loads native libraries that way. That's (a) messy, (b) potentially
insecure without workarounds to change /tmp to some other dir, and (c)
inconsistent with how native libraries work. These are the same arguments
Alejandro made above.
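For readers unfamiliar with the pattern, here is a rough sketch of what "expand the .so into a temp dir and load it" typically looks like. It is not the actual code of the jar in question; the class and method names (TmpExtractLoader, extractToTemp, loadBundled) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of the extract-then-load pattern; not taken from any real library.
final class TmpExtractLoader {

    /** Copies the given stream into a fresh temp file and returns the file's path. */
    static Path extractToTemp(InputStream in, String suffix) throws IOException {
        // createTempFile places the file under java.io.tmpdir (usually /tmp on Linux)
        Path tmp = Files.createTempFile("native-", suffix);
        tmp.toFile().deleteOnExit();
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Extracts a native library bundled as a classpath resource and loads it by absolute path. */
    static void loadBundled(String resourceName) throws IOException {
        try (InputStream in = TmpExtractLoader.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("resource not found: " + resourceName);
            }
            Path lib = extractToTemp(in, ".so");
            // System.load takes an absolute file path, bypassing java.library.path entirely
            System.load(lib.toAbsolutePath().toString());
        }
    }
}
```

In this sketch the only way to move the extraction directory is to override the java.io.tmpdir system property, which is the kind of workaround mentioned in (b).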
This maven artifact that we now depend on is something that isn't easy to
rebuild, and it's not even clear how it gets built. For example, which glibc
version is it linked against? Which OSX version is the included dylib built on?
That seems a little scary as a dependency.
It seems the motivation to switch from the hadoop-snappy approach to the
java-snappy approach was that the former approach depended on having snappy.so
available at runtime, which isn't always the case. I would propose the
following:
- at build time, you can choose to (a) disable snappy, (b) enable snappy and
dynamically link our JNI shims against snappy.so, or (c) enable snappy and
statically link snappy into our JNI shims
- those who don't care about snappy choose (a)
- those who care about snappy and plan to deploy on systems where libsnappy.so
is deployed system-wide (e.g. Fedora or the most recent Ubuntu) can choose (b) to
pick up the snappy lib off the system
- those who care about snappy and plan to deploy elsewhere choose (c), and just
make sure that snappy is available at compile time
Then the hadoopsnappy.so can be included in lib/native just like our other
native dependencies without the unjar-to-tmp hackery.
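On the Java side, loading a library shipped in lib/native could look roughly like the sketch below. It assumes lib/native is on java.library.path and that the library resolves under the name "hadoopsnappy"; both the class name NativeSnappyProbe and that library name are placeholders, not existing Hadoop code.

```java
// Hypothetical sketch: probe for a native library on java.library.path
// and fall back gracefully when it is not installed.
final class NativeSnappyProbe {

    /** Tries to load a native library by name; returns false instead of throwing when it is absent. */
    static boolean tryLoad(String libName) {
        try {
            // loadLibrary maps the name to a platform file name (e.g. lib<name>.so on Linux)
            // and searches the directories listed in java.library.path
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "hadoopsnappy" is an assumed name used only for this example
        boolean available = tryLoad("hadoopsnappy");
        System.out.println(available
                ? "native snappy support loaded"
                : "native snappy support not available; codec disabled");
    }
}
```

With this approach nothing is written to /tmp at runtime: users who picked option (a) simply see the library as unavailable.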
Does this idea address everyone's goals?
> Integrate Snappy compression
> ----------------------------
>
> Key: HADOOP-7206
> URL: https://issues.apache.org/jira/browse/HADOOP-7206
> Project: Hadoop Common
> Issue Type: New Feature
> Affects Versions: 0.21.0
> Reporter: Eli Collins
> Assignee: T Jake Luciani
> Fix For: 0.23.0
>
> Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch,
> v2-HADOOP-7206-snappy-codec-using-snappy-java.txt,
> v3-HADOOP-7206-snappy-codec-using-snappy-java.txt,
> v4-HADOOP-7206-snappy-codec-using-snappy-java.txt,
> v5-HADOOP-7206-snappy-codec-using-snappy-java.txt
>
>
> Google released Zippy as an open source (APLv2) project called Snappy
> (http://code.google.com/p/snappy). This tracks integrating it into Hadoop.
> {quote}
> Snappy is a compression/decompression library. It does not aim for maximum
> compression, or compatibility with any other compression library; instead, it
> aims for very high speeds and reasonable compression. For instance, compared
> to the fastest mode of zlib, Snappy is an order of magnitude faster for most
> inputs, but the resulting compressed files are anywhere from 20% to 100%
> bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy
> compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec
> or more.
> {quote}