[
https://issues.apache.org/jira/browse/HADOOP-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830582#comment-15830582
]
Ilya Ganelin commented on HADOOP-12990:
---------------------------------------
[~jzhuge] My proposed approach is to create a new file based loosely on
hadoop/io/compress/Lz4Codec.java reproducing the byte structure analagous to
your 4/3 hack. Does that seem reasonable?
If my goal is ultimately to use this in something like Spark, if the version of
Hadoop we're using is patched with the appropriate class, where would I add
additional logic to switch between the two codecs (Lz4Codec vs. Lz4FrameCodec)?
> lz4 incompatibility between OS and Hadoop
> -----------------------------------------
>
> Key: HADOOP-12990
> URL: https://issues.apache.org/jira/browse/HADOOP-12990
> Project: Hadoop Common
> Issue Type: Bug
> Components: io, native
> Affects Versions: 2.6.0
> Reporter: John Zhuge
> Priority: Minor
>
> {{hdfs dfs -text}} hit exception when trying to view the compression file
> created by Linux lz4 tool.
> The Hadoop version has HADOOP-11184 "update lz4 to r123", thus it is using
> LZ4 library in release r123.
> Linux lz4 version:
> {code}
> $ /tmp/lz4 -h 2>&1 | head -1
> *** LZ4 Compression CLI 64-bits r123, by Yann Collet (Apr 1 2016) ***
> {code}
> Test steps:
> {code}
> $ cat 10rows.txt
> 001|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 002|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 003|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 004|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 005|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 006|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 007|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 008|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 009|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 010|c1|c2|c3|c4|c5|c6|c7|c8|c9
> $ /tmp/lz4 10rows.txt 10rows.txt.r123.lz4
> Compressed 310 bytes into 105 bytes ==> 33.87%
> $ hdfs dfs -put 10rows.txt.r123.lz4 /tmp
> $ hdfs dfs -text /tmp/10rows.txt.r123.lz4
> 16/04/01 08:19:07 INFO compress.CodecPool: Got brand-new decompressor [.lz4]
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
> at
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
> at
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
> at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:106)
> at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
> at
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
> at
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]