[ 
https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552135#comment-15552135
 ] 

Jason Lowe commented on HADOOP-13578:
-------------------------------------

I don't believe the block stuff has anything to do with HDFS blocks.  Rather it 
describes compression occurring in chunks (blocks) of data at a time.  Without 
the small header at the beginning of each block, it becomes difficult in a 
general way to know how much data is in the next compressed block when 
decompressing it.  Using the Block codec streams doesn't inherently make the 
data splittable since one can't easily locate the codec block boundaries at an 
arbitrary split in the data stream (e.g., an HDFS block boundary).  IMHO if we 
want to chunk the data for splitting then we can just use a SequenceFile 
configured for block compression with this codec.
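
To illustrate the header point, here is a minimal sketch of block framing.  It 
is not Hadoop's actual Block stream layout: the two 4-byte lengths, the function 
names, and the use of Python's zlib as a stand-in for the real codec are all 
assumptions made for illustration.

```python
import struct
import zlib

# Hypothetical header: uncompressed length and compressed length, big-endian.
HEADER = struct.Struct(">II")


def compress_blocks(data: bytes, block_size: int) -> bytes:
    """Compress data one block at a time, prefixing each block with a header."""
    out = bytearray()
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        compressed = zlib.compress(chunk)
        out += HEADER.pack(len(chunk), len(compressed))
        out += compressed
    return bytes(out)


def decompress_blocks(framed: bytes) -> bytes:
    """Walk the headers to find out how many bytes each compressed block has."""
    out = bytearray()
    pos = 0
    while pos < len(framed):
        raw_len, comp_len = HEADER.unpack_from(framed, pos)
        pos += HEADER.size
        chunk = zlib.decompress(framed[pos:pos + comp_len])
        if len(chunk) != raw_len:
            raise ValueError("block length does not match its header")
        out += chunk
        pos += comp_len
    return bytes(out)
```

The reader can only find the next block by walking the headers from the start, 
so nothing here helps locate a block boundary from an arbitrary offset.  The 
extra headers also mean a plain zlib decompressor rejects the framed output.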

Using the Block streams is a big drawback since it makes the format 
incompatible with the compression standard.  This already causes problems with 
LZ4, see HADOOP-12990.  Rather than compressing in blocks that we have to put 
extra headers on to decode, we can use the zstd streaming APIs to stream the 
data through the compressor and decompressor.  That lets us keep the file 
format compatible and avoids error scenarios where the codec is configured to 
use a buffer size that is too small to decompress one of the codec blocks.  
With the streaming API we are decoupling our buffer size from the size of the 
data to compress/decompress.
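
A rough sketch of the streaming idea follows.  It uses Python's zlib streaming 
objects as a stand-in, not the zstd streaming API itself, and the function names 
and buffer sizes are assumptions for illustration only.

```python
import zlib


def stream_compress(data: bytes, buf_size: int) -> bytes:
    """Feed the compressor buf_size bytes at a time; no per-block headers."""
    comp = zlib.compressobj()
    out = bytearray()
    for start in range(0, len(data), buf_size):
        out += comp.compress(data[start:start + buf_size])
    out += comp.flush()
    return bytes(out)


def stream_decompress(stream: bytes, buf_size: int) -> bytes:
    """Read and emit at most buf_size bytes per decompress call."""
    decomp = zlib.decompressobj()
    out = bytearray()
    for start in range(0, len(stream), buf_size):
        pending = stream[start:start + buf_size]
        while pending:
            # max_length caps the output of this call; input that was not
            # consumed because of the cap is kept in unconsumed_tail.
            out += decomp.decompress(pending, buf_size)
            pending = decomp.unconsumed_tail
    out += decomp.flush()
    return bytes(out)
```

The output of stream_compress is an ordinary zlib stream, so a one-shot 
zlib.decompress can read it, and the reader's buffer size does not have to match 
anything the writer used.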

> Add Codec for ZStandard Compression
> -----------------------------------
>
>                 Key: HADOOP-13578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13578
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch
>
>
> ZStandard: https://github.com/facebook/zstd has been used in production for 6 
> months by facebook now.  v1.0 was recently released.  Create a codec for this 
> library.  



