[
https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053590#comment-13053590
]
Scott Carey commented on HADOOP-7206:
-------------------------------------
bq. However, it has a serious drawback; the native code is not built on the
target OS, only on the same architecture. Because of this the build is not
easily reproducible, as there is no knowledge of the OS used to build it.
Sure it is reproducible. Snappy is used as a prebuilt artifact, not built from
source. The build is reproducible because it _always_ uses the same artifact
and therefore always produces the same output. Is it a requirement to
recompile all Java jars for a build to be reproducible?
hadoop-snappy has another drawback/benefit pair. Users may already have
snappy-java on their classpath for their own use (for example via Avro, Hive,
HBase, or user code).
Drawback: the library can't be shared, which bloats the number of classes and
jars.
Benefit: the library won't have a version conflict.
Unknown (to me): does a snappy-java binding conflict with a custom Hadoop one
if both are loaded in the same JVM / classloader?
I think the check for a system-available libsnappy.so, made before loading the
one in the jar, should go into the snappy-java project. Users can then
optionally compile their own library and make it available to Hadoop, or use
the one in the jar, and Hadoop has less code and build infrastructure to
maintain as a result.
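To make the proposed load order concrete, here is a minimal sketch of
"system library first, bundled library second". It is not snappy-java's actual
loader: the class name SnappyNativeLoader, the Source enum, and the resource
path passed in by the caller are all hypothetical, and the only real APIs it
relies on are System.loadLibrary, System.load, and Class.getResourceAsStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper illustrating the proposed load order; not part of
// snappy-java or Hadoop.
final class SnappyNativeLoader {

    // Reports which copy of the native library ended up being loaded.
    enum Source { SYSTEM, BUNDLED }

    private SnappyNativeLoader() {}

    /**
     * Tries a system-installed library first (resolved through
     * java.library.path), then falls back to a copy bundled in the jar.
     *
     * @param libName         short library name, e.g. "snappy" for libsnappy.so
     * @param bundledResource classpath location of the bundled copy
     *                        (an assumed layout, chosen by the caller)
     */
    static Source load(String libName, String bundledResource) throws IOException {
        try {
            // Step 1: prefer a library the user compiled or installed.
            System.loadLibrary(libName);
            return Source.SYSTEM;
        } catch (UnsatisfiedLinkError e) {
            // No usable system library; fall through to the bundled copy.
        }

        // Step 2: extract the bundled library to a temp file and load it.
        try (InputStream in = SnappyNativeLoader.class.getResourceAsStream(bundledResource)) {
            if (in == null) {
                throw new IOException("No bundled native library at " + bundledResource);
            }
            Path tmp = Files.createTempFile("lib" + libName, ".so");
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString());
            return Source.BUNDLED;
        }
    }
}
```

A caller would invoke something like SnappyNativeLoader.load("snappy",
"/native/libsnappy.so") once at startup, where the resource path is again only
an assumed layout. Keeping this logic in one place in the binding is what
would let Hadoop avoid carrying its own native build for Snappy.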
> Integrate Snappy compression
> ----------------------------
>
> Key: HADOOP-7206
> URL: https://issues.apache.org/jira/browse/HADOOP-7206
> Project: Hadoop Common
> Issue Type: New Feature
> Affects Versions: 0.21.0
> Reporter: Eli Collins
> Assignee: Alejandro Abdelnur
> Fix For: 0.23.0
>
> Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch,
> v2-HADOOP-7206-snappy-codec-using-snappy-java.txt,
> v3-HADOOP-7206-snappy-codec-using-snappy-java.txt,
> v4-HADOOP-7206-snappy-codec-using-snappy-java.txt,
> v5-HADOOP-7206-snappy-codec-using-snappy-java.txt
>
>
> Google released Zippy as an open source (APLv2) project called Snappy
> (http://code.google.com/p/snappy). This tracks integrating it into Hadoop.
> {quote}
> Snappy is a compression/decompression library. It does not aim for maximum
> compression, or compatibility with any other compression library; instead, it
> aims for very high speeds and reasonable compression. For instance, compared
> to the fastest mode of zlib, Snappy is an order of magnitude faster for most
> inputs, but the resulting compressed files are anywhere from 20% to 100%
> bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy
> compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec
> or more.
> {quote}