[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894008#comment-16894008 ]

Hadoop QA commented on HDFS-14099:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m  
6s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 
28s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 48s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch 
generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
27s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 96m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-441/2/artifact/out/Dockerfile
 |
| GITHUB PR | https://github.com/apache/hadoop/pull/441 |
| JIRA Issue | HDFS-14099 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 059dd185f2bb 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / c7c7a88 |
| Default Java | 1.8.0_212 |
| checkstyle | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-441/2/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-441/2/testReport/ |
| Max. process+thread count | 1381 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-441/2/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |


This message was automatically generated.



> Unknown frame descriptor when decompressing multiple frames in 
> ZStandardDecompressor
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-14099
>                 URL: https://issues.apache.org/jira/browse/HDFS-14099
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.3
>         Environment: Hadoop Version: hadoop-3.0.3
> Java Version: 1.8.0_144
>            Reporter: xuzq
>            Priority: Major
>
> We need to use the ZSTD compression algorithm in Hadoop, so I wrote a simple
> demo like this for testing.
> {code:java}
> // copy the input through the zstd compression stream (cmpOut); after
> // 65536 * 8 bytes, finish the current frame and start a new one via resetState()
> while ((size = fsDataInputStream.read(bufferV2)) > 0 ) {
>   countSize += size;
>   if (countSize == 65536 * 8) {
>     if(!isFinished) {
>       // finish a frame in zstd
>       cmpOut.finish();
>       isFinished = true;
>     }
>     fsDataOutputStream.flush();
>     fsDataOutputStream.hflush();
>   }
>   if(isFinished) {
>     LOG.info("Will resetState. N=" + n);
>     // reset the stream and write again
>     cmpOut.resetState();
>     isFinished = false;
>   }
>   cmpOut.write(bufferV2, 0, size);
>   bufferV2 = new byte[5 * 1024 * 1024];
>   n++;
> }
> {code}
>  
> Then I used "*hadoop fs -text*" to read this file, and it failed. The error is
> shown below.
> {code:java}
> Exception in thread "main" java.lang.InternalError: Unknown frame descriptor
> at org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.inflateBytesDirect(Native Method)
> at org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.decompress(ZStandardDecompressor.java:181)
> at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:111)
> at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127)
> at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
> at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269)
> at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
> {code}
>  
> So I looked at the code, including the JNI, and found this bug.
> The *ZSTD_initDStream(stream)* method may be called twice for the same *Frame*.
> The first call is in *ZStandardDecompressor.c*:
> {code:java}
> if (size == 0) {
>     (*env)->SetBooleanField(env, this, ZStandardDecompressor_finished, JNI_TRUE);
>     size_t result = dlsym_ZSTD_initDStream(stream);
>     if (dlsym_ZSTD_isError(result)) {
>         THROW(env, "java/lang/InternalError", dlsym_ZSTD_getErrorName(result));
>         return (jint) 0;
>     }
> }
> {code}
> This call is correct, but *finished* is never set back to false, even when
> there is still data (a new frame) in *CompressedBuffer* or *UserBuffer* that
> needs to be decompressed.
> The second call is made from *org.apache.hadoop.io.compress.DecompressorStream*
> via *decompressor.reset()*, because *finished* stays true after a *Frame* has
> been decompressed:
> {code:java}
> if (decompressor.finished()) {
>   // First see if there was any leftover buffered input from previous
>   // stream; if not, attempt to refill buffer.  If refill -> EOF, we're
>   // all done; else reset, fix up input buffer, and get ready for next
>   // concatenated substream/"member".
>   int nRemaining = decompressor.getRemaining();
>   if (nRemaining == 0) {
>     int m = getCompressedData();
>     if (m == -1) {
>       // apparently the previous end-of-stream was also end-of-file:
>       // return success, as if we had never called getCompressedData()
>       eof = true;
>       return -1;
>     }
>     decompressor.reset();
>     decompressor.setInput(buffer, 0, m);
>     lastBytesSent = m;
>   } else {
>     // looks like it's a concatenated stream:  reset low-level zlib (or
>     // other engine) and buffers, then "resend" remaining input data
>     decompressor.reset();
>     int leftoverOffset = lastBytesSent - nRemaining;
>     assert (leftoverOffset >= 0);
>     // this recopies userBuf -> direct buffer if using native libraries:
>     decompressor.setInput(buffer, leftoverOffset, nRemaining);
>     // NOTE:  this is the one place we do NOT want to save the number
>     // of bytes sent (nRemaining here) into lastBytesSent:  since we
>     // are resending what we've already sent before, offset is nonzero
>     // in general (only way it could be zero is if it already equals
>     // nRemaining), which would then screw up the offset calculation
>     // _next_ time around.  IOW, getRemaining() is in terms of the
>     // original, zero-offset bufferload, so lastBytesSent must be as
>     // well.  Cheesy ASCII art:
>     //
>     //          <------------ m, lastBytesSent ----------->
>     //          +===============================================+
>     // buffer:  |1111111111|22222222222222222|333333333333|     |
>     //          +===============================================+
>     //     #1:  <-- off -->|<-------- nRemaining --------->
>     //     #2:  <----------- off ----------->|<-- nRem. -->
>     //     #3:  (final substream:  nRemaining == 0; eof = true)
>     //
>     // If lastBytesSent is anything other than m, as shown, then "off"
>     // will be calculated incorrectly.
>   }
> }{code}
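> 
> For reference, here is a minimal self-contained sketch of the same reproduction
> (illustrative only: the class and variable names are made up, it assumes the
> native zstd library is loaded, and it uses in-memory streams instead of HDFS).
> Something along these lines could also serve as the unit test the QA report
> above asks for.
> {code:java}
> import java.io.ByteArrayInputStream;
> import java.io.ByteArrayOutputStream;
> import java.nio.charset.StandardCharsets;
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.io.compress.CompressionInputStream;
> import org.apache.hadoop.io.compress.CompressionOutputStream;
> import org.apache.hadoop.io.compress.ZStandardCodec;
> 
> public class MultiFrameZstdRepro {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     ZStandardCodec codec = new ZStandardCodec();
>     codec.setConf(conf);
> 
>     // Write two zstd frames back to back, the same way the demo above does
>     // with finish() + resetState().
>     ByteArrayOutputStream compressed = new ByteArrayOutputStream();
>     CompressionOutputStream out = codec.createOutputStream(compressed);
>     out.write("first frame".getBytes(StandardCharsets.UTF_8));
>     out.finish();      // close the first zstd frame
>     out.resetState();  // start a second frame on the same stream
>     out.write("second frame".getBytes(StandardCharsets.UTF_8));
>     out.close();
> 
>     // Read the concatenated frames back through the codec; with the bug
>     // described here, this read loop fails with
>     // "java.lang.InternalError: Unknown frame descriptor".
>     CompressionInputStream in =
>         codec.createInputStream(new ByteArrayInputStream(compressed.toByteArray()));
>     ByteArrayOutputStream decompressed = new ByteArrayOutputStream();
>     byte[] buf = new byte[4096];
>     int n;
>     while ((n = in.read(buf)) > 0) {
>       decompressed.write(buf, 0, n);
>     }
>     in.close();
> 
>     // Expected output once fixed: "first framesecond frame"
>     System.out.println(new String(decompressed.toByteArray(), StandardCharsets.UTF_8));
>   }
> }
> {code}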



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
