[ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746346#comment-15746346 ]
Jason Lowe commented on HADOOP-13578:
-------------------------------------

The thing I'm worried about is that when we call ZSTD_compressStream we are passing descriptors for both the input buffer and the output buffer. When we call ZSTD_endStream we are only passing the descriptor for the output buffer. Therefore I don't know how ZSTD_endStream is supposed to finish consuming any input that ZSTD_compressStream didn't get to if it doesn't have access to that input buffer descriptor.

Looking at the zstd code you'll see that when it does call ZSTD_compressStream inside ZSTD_endStream, it's calling it with srcSize == 0. That means there is no more source to consume. So if the last call of the JNI code to ZSTD_compressStream did not fully consume the input buffer's data (i.e.: input pos is not moved to the end of the data) then it looks like calling ZSTD_endStream will simply flush out what input data did make it and then end the frame. That matches what the documentation for ZSTD_endStream says.

So I still think we need to make sure we do not call ZSTD_endStream if input.pos is not at the end of the input buffer after we call ZSTD_compressStream, or we risk losing the last chunk of data if the zstd library for some reason cannot fully consume the input buffer when we try to finish.

> Add Codec for ZStandard Compression
> -----------------------------------
>
>                 Key: HADOOP-13578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13578
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch,
> HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch,
> HADOOP-13578.v5.patch, HADOOP-13578.v6.patch
>
>
> ZStandard: https://github.com/facebook/zstd has been used in production for 6
> months by facebook now. v1.0 was recently released. Create a codec for this
> library.
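
To illustrate the guard proposed in the comment above (do not end the frame until the input position has reached the end of the input buffer), here is a minimal, self-contained sketch. It does not call zstd. The types in_buf and out_buf and the functions toy_compress_stream, toy_end_stream, and finish_frame are hypothetical stand-ins that only mimic the shape of the streaming calls discussed: a compress step that may consume just part of the input, and an end step that sees only the output buffer. The toy stream copies bytes rather than compressing them, and it is not the patch's JNI code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins shaped like an input/output buffer descriptor pair.
 * These are NOT the zstd types; they exist only for this sketch. */
typedef struct { const void *src; size_t size; size_t pos; } in_buf;
typedef struct { void *dst; size_t size; size_t pos; } out_buf;

/* Toy stream that accepts at most max_step input bytes per call,
 * to mimic a compressor that does not always consume all of its input. */
typedef struct { size_t max_step; int ended; } toy_stream;

/* Mimics a compress-stream call: copies some input to the output
 * and advances both positions. It may leave input unconsumed. */
static size_t toy_compress_stream(toy_stream *s, out_buf *out, in_buf *in) {
    size_t n = in->size - in->pos;
    size_t room = out->size - out->pos;
    if (n > s->max_step) n = s->max_step;
    if (n > room) n = room;
    memcpy((char *)out->dst + out->pos, (const char *)in->src + in->pos, n);
    in->pos += n;
    out->pos += n;
    return 0;
}

/* Mimics an end-stream call: it receives only the output descriptor,
 * so it has no way to pick up input that was left unconsumed. */
static size_t toy_end_stream(toy_stream *s, out_buf *out) {
    (void)out;
    s->ended = 1;
    return 0;
}

/* Ends the frame only after every input byte has been consumed.
 * Returns 0 on success, or -1 if no progress could be made (for example
 * because the output buffer is full and must be drained by the caller). */
static int finish_frame(toy_stream *s, out_buf *out, in_buf *in) {
    while (in->pos < in->size) {
        size_t before = in->pos;
        toy_compress_stream(s, out, in);
        if (in->pos == before) {
            return -1; /* stalled: do not end the frame with input pending */
        }
    }
    assert(in->pos == in->size); /* the invariant the comment asks for */
    toy_end_stream(s, out);
    return 0;
}
```

In this sketch a caller that gets -1 back would drain the output buffer and retry, rather than ending the frame and dropping the remaining input.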