[https://issues.apache.org/jira/browse/HADOOP-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032112#comment-17032112]
Redriver edited comment on HADOOP-12990 at 2/7/20 5:21 AM:
-----------------------------------------------------------
I think Spark writes its event logs in Hadoop's LZ4 format, but I cannot decompress those *.lz4 files. Do you know how to decompress them?
The background: I want to analyze Spark event logs offline, so I downloaded them from the Spark history server; they are *.lz4 files.
The Linux lz4 tool cannot decompress them. I tried hdfs, got the following errors, and found this issue by googling the error message.
$ hdfs dfs -text file:///home/xxx/application.lz4
2020-02-07 13:08:35,664 INFO compress.CodecPool: Got brand-new decompressor [.lz4]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
......
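A plausible reading of the OutOfMemoryError above: Hadoop's BlockDecompressorStream begins each block by reading a 4-byte big-endian uncompressed-length field, so when it is pointed at a file in a different framing it interprets the foreign magic bytes as an enormous length and tries to allocate a buffer of that size. A minimal sketch of that interpretation (the frame layouts here are assumptions from the formats' published descriptions, not from this ticket):

```python
import struct

# Hadoop's BlockDecompressorStream reads a 4-byte big-endian length first.
def hadoop_first_length(first4: bytes) -> int:
    return struct.unpack(">I", first4)[0]

# lz4-java's block stream starts with the ASCII magic "LZ4Block",
# so the first 4 bytes are "LZ4B".
lz4_java = hadoop_first_length(b"LZ4B")

# The lz4 CLI frame format starts with magic 0x184D2204, stored little-endian.
lz4_cli = hadoop_first_length(bytes([0x04, 0x22, 0x4D, 0x18]))

# Both "lengths" are absurdly large, which would drive the decompressor
# to allocate a huge buffer and hit Java heap space limits.
print(hex(lz4_java), hex(lz4_cli))
```

Under this reading, the OOM is not heap exhaustion by real data, just the magic bytes of an incompatible container being misread as a block size.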
$ head -n 1 application.lz4 | od -t x1
0000000 4c 5a 34 42 6c 6f 63 6b 25 2e 21 00 00 00 80 00
0000020 00 b0 b4 7c 01 f2 12 7b 22 45 76 65 6e 74 22 3a
0000040 22 53 70 61 72 6b 4c 69 73 74 65 6e 65 72 4c 6f
0000060 67 53 74 61 72 74 22 2c 18 00 ff 04 20 56 65 72
0000100 73 69 6f 6e 22 3a 22 32 2e 33 2e 30 22 7d 0a
0000117
$ head -n 1 application.lz4
LZ4Block%.!???|?{"Event":"SparkListenerLogStart",? Version":"2.3.0"}
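For what it's worth, the "LZ4Block" magic in the dump above looks like the framing written by lz4-java's LZ4BlockOutputStream (which Spark uses for event-log compression), not Hadoop's Lz4Codec framing and not the lz4 CLI frame format, which would explain why neither tool can read the file. A sketch that parses the header bytes shown above, assuming the lz4-java block layout (8-byte magic, 1-byte token, then little-endian compressed length, decompressed length, and checksum):

```python
import struct

# First 21 bytes from the od dump above (assumed lz4-java LZ4Block header).
header = bytes.fromhex("4c5a34426c6f636b" "25" "2e210000" "00800000" "b0b47c01")

magic = header[0:8]                                      # b"LZ4Block"
token = header[8]                                        # method | level
compressed_len = struct.unpack("<i", header[9:13])[0]    # little-endian
decompressed_len = struct.unpack("<i", header[13:17])[0]
checksum = struct.unpack("<i", header[17:21])[0]

method = token & 0xF0   # 0x10 = raw copy, 0x20 = LZ4-compressed
level = token & 0x0F    # block size = 1 << (level + 10)

print(magic, hex(method), compressed_len, decompressed_len, 1 << (level + 10))
```

The fields are self-consistent: an LZ4-compressed block of 8494 bytes that decompresses to 32768 bytes, matching the 32 KiB block size encoded in the token, which supports the lz4-java interpretation.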
> lz4 incompatibility between OS and Hadoop
> -----------------------------------------
>
> Key: HADOOP-12990
> URL: https://issues.apache.org/jira/browse/HADOOP-12990
> Project: Hadoop Common
> Issue Type: Bug
> Components: io, native
> Affects Versions: 2.6.0
> Reporter: John Zhuge
> Priority: Minor
>
> {{hdfs dfs -text}} hits an exception when trying to view a compressed file
> created by the Linux lz4 tool.
> The Hadoop build includes HADOOP-11184 ("update lz4 to r123"), so it uses the
> LZ4 library at release r123.
> Linux lz4 version:
> {code}
> $ /tmp/lz4 -h 2>&1 | head -1
> *** LZ4 Compression CLI 64-bits r123, by Yann Collet (Apr 1 2016) ***
> {code}
> Test steps:
> {code}
> $ cat 10rows.txt
> 001|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 002|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 003|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 004|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 005|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 006|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 007|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 008|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 009|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 010|c1|c2|c3|c4|c5|c6|c7|c8|c9
> $ /tmp/lz4 10rows.txt 10rows.txt.r123.lz4
> Compressed 310 bytes into 105 bytes ==> 33.87%
> $ hdfs dfs -put 10rows.txt.r123.lz4 /tmp
> $ hdfs dfs -text /tmp/10rows.txt.r123.lz4
> 16/04/01 08:19:07 INFO compress.CodecPool: Got brand-new decompressor [.lz4]
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
> at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
> at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
> at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:106)
> at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
> at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)