[
https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552135#comment-15552135
]
Jason Lowe commented on HADOOP-13578:
-------------------------------------
I don't believe the block stuff has anything to do with HDFS blocks. Rather it
describes compression occurring in chunks (blocks) of data at a time. Without
the small header at the beginning of each block, it is difficult in general
to know how much data is in the next compressed block when
decompressing it. Using the Block codec streams doesn't inherently make the
data splittable since one can't easily locate the codec block boundaries at an
arbitrary split in the data stream (i.e., HDFS block boundaries). IMHO if we
want to chunk the data for splitting then we can just use a SequenceFile
configured for block compression with this codec.
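
To make the framing point concrete, here is a minimal sketch of
length-prefixed block compression. It is an illustration only: it uses
Python's zlib as a stand-in codec, and the 4-byte big-endian length header
and the names write_blocks/read_blocks are assumptions made for this
example, not the actual layout of Hadoop's Block codec streams.

```python
import struct
import zlib


def write_blocks(data: bytes, block_size: int) -> bytes:
    """Compress data in fixed-size chunks, each prefixed by a length header."""
    out = bytearray()
    for start in range(0, len(data), block_size):
        compressed = zlib.compress(data[start:start + block_size])
        # Hypothetical header: 4-byte big-endian compressed length.
        out += struct.pack(">I", len(compressed))
        out += compressed
    return bytes(out)


def read_blocks(framed: bytes) -> bytes:
    """Decode the framed stream by reading each header, then that many bytes."""
    out = bytearray()
    pos = 0
    while pos < len(framed):
        (length,) = struct.unpack_from(">I", framed, pos)
        pos += 4
        out += zlib.decompress(framed[pos:pos + length])
        pos += length
    return bytes(out)


payload = b"hadoop zstd codec " * 1000
framed = write_blocks(payload, block_size=4096)
restored = read_blocks(framed)
```

The reader only works when it starts at a header. Dropped into the middle
of the stream, it has no marker to search for, which is why this kind of
framing does not make the data splittable by itself. The extra headers
also mean a standard decoder for the underlying format cannot read the
result.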
Using the Block streams is a big drawback since it makes the format
incompatible with the compression standard. This already causes problems with
LZ4, see HADOOP-12990. Rather than compressing in blocks that we have to put
extra headers on to decode, we can use the zstd streaming APIs to stream the
data through the compressor and decompressor. That lets us keep the file
format compatible and avoids error scenarios where the codec is configured to
use a buffer size that is too small to decompress one of the codec blocks.
With the streaming API we are decoupling our buffer size from the size of the
data to compress/decompress.
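
A rough sketch of the streaming idea, again with Python's zlib standing in
for zstd (the zstd C library exposes the same pattern through
ZSTD_compressStream and ZSTD_decompressStream). The helper names and the
buffer sizes below are assumptions chosen for illustration; the point is
only that the working buffer can be any size and the output is still one
standard stream.

```python
import zlib


def stream_compress(data: bytes, buffer_size: int) -> bytes:
    """Feed the compressor buffer_size bytes at a time; no extra framing."""
    comp = zlib.compressobj()
    out = bytearray()
    for start in range(0, len(data), buffer_size):
        out += comp.compress(data[start:start + buffer_size])
    out += comp.flush()
    return bytes(out)


def stream_decompress(blob: bytes, buffer_size: int) -> bytes:
    """Decompress with bounded input chunks and bounded output per call."""
    decomp = zlib.decompressobj()
    out = bytearray()
    for start in range(0, len(blob), buffer_size):
        chunk = blob[start:start + buffer_size]
        while chunk:
            # Each call returns at most buffer_size bytes of output; input
            # that was not consumed yet is kept in unconsumed_tail.
            out += decomp.decompress(chunk, buffer_size)
            chunk = decomp.unconsumed_tail
    out += decomp.flush()
    return bytes(out)


original = b"streaming keeps the format standard " * 2000
blob = stream_compress(original, buffer_size=64)
roundtrip = stream_decompress(blob, buffer_size=7)
```

Because no headers are added, a one-shot standard decoder
(zlib.decompress(blob) here) reads the same bytes, and the decompressor's
buffer size never has to match whatever size the writer used.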
> Add Codec for ZStandard Compression
> -----------------------------------
>
> Key: HADOOP-13578
> URL: https://issues.apache.org/jira/browse/HADOOP-13578
> Project: Hadoop Common
> Issue Type: New Feature
> Reporter: churro morales
> Assignee: churro morales
> Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch
>
>
> ZStandard: https://github.com/facebook/zstd has been used in production for 6
> months by facebook now. v1.0 was recently released. Create a codec for this
> library.