[jira] [Commented] (HADOOP-7206) Integrate Snappy compression

Todd Lipcon (JIRA) Tue, 21 Jun 2011 17:05:12 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052955#comment-13052955
 ]


Todd Lipcon commented on HADOOP-7206:
-------------------------------------

Sorry, I stopped paying attention to this for a while... I have some concerns 
about the way this ended up:

We're now pulling in a jar which autoexpands its .so dependency into /tmp and 
then loads native libraries that way. That's (a) messy, (b) potentially 
insecure without workarounds to change /tmp to some other dir, and (c) 
inconsistent with how native libraries work. These are the same arguments 
Alejandro made above.
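
To make the concern concrete, here is a rough sketch of the general "unpack the bundled library into the temp dir, then load it" pattern. It is not the actual snappy-java code; the class name, method name and file prefix are made up for illustration, and the bytes are a stand-in for a real .so that a jar would supply as a classpath resource.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpExtractSketch {
    /**
     * Copies a bundled native library to a fresh file under java.io.tmpdir
     * (usually /tmp on Linux) and returns the path of the extracted copy.
     */
    static Path extractToTemp(InputStream bundledLib, String prefix, String suffix) {
        try {
            // createTempFile places the file in the default temp directory.
            Path target = Files.createTempFile(prefix, suffix);
            Files.copy(bundledLib, target, StandardCopyOption.REPLACE_EXISTING);
            target.toFile().deleteOnExit();
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real jar would hand over the .so via getResourceAsStream.
        InputStream fake = new ByteArrayInputStream(new byte[] {1, 2, 3});
        Path extracted = extractToTemp(fake, "snappy-sketch-", ".so");
        System.out.println("extracted to " + extracted);
        // A real loader would now call System.load(extracted.toString()),
        // which is the step that makes the contents of the temp dir matter.
    }
}
```

The library that ends up loaded is whatever sits at that temp path at load time, which is why the choice of directory has security implications.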

This maven artifact that we now depend on is something that isn't easy to 
rebuild, and it's not even clear how it gets built. For example, which glibc 
version is it linked against? Which OSX version is the included dylib built on? 
Seems a little scary as a dependency.

It seems the motivation to switch from the hadoop-snappy approach to the 
java-snappy approach was that the former approach depended on having snappy.so 
available at runtime, which isn't always the case. I would propose the 
following:
- at build time, you can choose to (a) disable snappy, (b) enable snappy and 
dynamically link our JNI shims against snappy.so, or (c) enable snappy and 
statically link our JNI shims against a static build of snappy
- those who don't care about snappy choose (a)
- those who care about snappy and plan to deploy on systems where libsnappy.so 
is installed system-wide (e.g. Fedora or recent Ubuntu releases) can choose (b) 
to pick up the snappy lib off the system
- those who care about snappy and plan to deploy elsewhere choose (c), and just 
make sure that snappy is available at compile time

Then the hadoopsnappy.so can be included in lib/native just like our other 
native dependencies without the unjar-to-tmp hackery.
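
As a minimal sketch of what the Java side could look like under this proposal: the shim is resolved by name from java.library.path (which would include lib/native) and a missing library is reported rather than treated as fatal. The class and method names are hypothetical, and the library name "hadoopsnappy" is only assumed from the file name mentioned above.

```java
public class NativeSnappyLoader {
    // Assumed name: System.loadLibrary maps it to libhadoopsnappy.so on Linux.
    static final String LIB_NAME = "hadoopsnappy";

    /** Returns true if the named library was found on java.library.path and loaded. */
    static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // Library absent (e.g. built with snappy disabled): report, don't crash.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean loaded = tryLoad(LIB_NAME);
        System.out.println("snappy native codec available: " + loaded);
    }
}
```

With this shape nothing is unpacked at runtime; whether the shim links snappy dynamically or statically is decided entirely at build time and is invisible to the Java code.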

Does this idea address everyone's goals?

> Integrate Snappy compression
> ----------------------------
>
>                 Key: HADOOP-7206
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7206
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.21.0
>            Reporter: Eli Collins
>            Assignee: T Jake Luciani
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch, 
> v2-HADOOP-7206-snappy-codec-using-snappy-java.txt, 
> v3-HADOOP-7206-snappy-codec-using-snappy-java.txt, 
> v4-HADOOP-7206-snappy-codec-using-snappy-java.txt, 
> v5-HADOOP-7206-snappy-codec-using-snappy-java.txt
>
>
> Google released Zippy as an open source (APLv2) project called Snappy 
> (http://code.google.com/p/snappy). This tracks integrating it into Hadoop.
> {quote}
> Snappy is a compression/decompression library. It does not aim for maximum 
> compression, or compatibility with any other compression library; instead, it 
> aims for very high speeds and reasonable compression. For instance, compared 
> to the fastest mode of zlib, Snappy is an order of magnitude faster for most 
> inputs, but the resulting compressed files are anywhere from 20% to 100% 
> bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy 
> compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec 
> or more.
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
