[
https://issues.apache.org/jira/browse/HADOOP-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875374#comment-17875374
]
ASF GitHub Bot commented on HADOOP-19255:
-----------------------------------------
guptashailesh92 opened a new pull request, #7009:
URL: https://github.com/apache/hadoop/pull/7009
<!--
Thanks for sending a pull request!
1. If this is your first time, please read our contributor guidelines:
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
2. Make sure your PR title starts with JIRA issue id, e.g.,
'HADOOP-17799. Your PR title ...'.
-->
### Description of PR
Fixes LZO decompression, which was broken by a
[change](https://github.com/apache/hadoop/pull/5912/files#diff-268b9968a4db21ac6eeb7bcaef10e4db744d00ba53989fc7251bb3e8d9eac7dfR904)
in Hadoop 3.4.0.
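If the regression is indeed the larger default buffer size (see the analysis in the linked JIRA), one possible interim workaround — a sketch only, not part of this PR — is to restore the previous 64 KB value via configuration, e.g. in `core-site.xml`:

```xml
<!-- Hypothetical workaround: restore the pre-3.4.0 buffer size
     (64 KB = 65536 bytes) that hadoop-lzo previously defaulted to. -->
<property>
  <name>io.compression.codec.lzo.buffersize</name>
  <value>65536</value>
</property>
```

The same override could be passed per invocation via the generic `-D` option, e.g. `hadoop fs -Dio.compression.codec.lzo.buffersize=65536 -text <file>`.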
### How was this patch tested?
`hadoop fs -text file:///home/hadoop/part-ak.lzo`
### For code changes:
- [ ] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> LZO files cannot be decompressed
> --------------------------------
>
> Key: HADOOP-19255
> URL: https://issues.apache.org/jira/browse/HADOOP-19255
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 3.4.0
> Reporter: Shailesh Gupta
> Priority: Critical
>
> The following command fails with the below exception:
> hadoop fs -text [file:///home/hadoop/part-ak.lzo]
> {code:java}
> 2024-08-21 05:05:07,418 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> 2024-08-21 05:05:08,706 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 049362b7cf53ff5f739d6b1532457f2c6cd495e8]
> 2024-08-21 05:07:01,542 INFO compress.CodecPool: Got brand-new decompressor [.lzo]
> 2024-08-21 05:07:14,558 WARN lzo.LzopInputStream: Incorrect LZO file format: file did not end with four trailing zeroes.
> java.io.IOException: Corrupted uncompressed block
>     at com.hadoop.compression.lzo.LzopInputStream.verifyChecksums(LzopInputStream.java:219)
>     at com.hadoop.compression.lzo.LzopInputStream.close(LzopInputStream.java:342)
>     at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:102)
>     at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:95)
>     at org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:383)
>     at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:346)
>     at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:319)
>     at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:301)
>     at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:285)
>     at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:121)
>     at org.apache.hadoop.fs.shell.Command.run(Command.java:192)
>     at org.apache.hadoop.fs.FsShell.run(FsShell.java:327)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
>     at org.apache.hadoop.fs.FsShell.main(FsShell.java:390)
> Exception in thread "main" java.lang.InternalError: lzo1x_decompress_safe returned: -5
>     at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
>     at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:315)
>     at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:122)
>     at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:252)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:110)
>     at java.base/java.io.InputStream.read(InputStream.java:218)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:95)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:132)
>     at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:100)
>     at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:95)
>     at org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:383)
>     at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:346)
>     at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:319)
>     at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:301)
>     at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:285)
>     at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:121)
>     at org.apache.hadoop.fs.shell.Command.run(Command.java:192)
>     at org.apache.hadoop.fs.FsShell.run(FsShell.java:327)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
>     at org.apache.hadoop.fs.FsShell.main(FsShell.java:390) {code}
> From my analysis, I pinpointed the
> [change|https://github.com/apache/hadoop/pull/5912/files#diff-268b9968a4db21ac6eeb7bcaef10e4db744d00ba53989fc7251bb3e8d9eac7dfR904]
> that raised the default of _io.compression.codec.lzo.buffersize_ from 64KB to 256KB.
> Previously, the default value was picked up from
> [here|https://github.com/twitter/hadoop-lzo/blob/master/src/main/java/com/hadoop/compression/lzo/LzoCodec.java#L51].
> Let me know if my analysis looks correct. What would be the proper approach to
> fixing it?
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)