[ 
https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866181#comment-13866181
 ] 

Colin Patrick McCabe commented on HDFS-5722:
--------------------------------------------

[~tlipcon], [~atm], [~hairong], how do you feel about removing support for 
on-disk FSImage compression?

It seems to me that we should just add an option for doing HTTP compression, 
but keep the old option for on-disk compression.  It concerns me that someone 
with a small disk might upgrade to a new version of Hadoop and then be unable 
to save his (much larger) fsimage on a small partition once compression support 
has been removed.  I also think that for really large FSImages, loading a 
compressed version could be faster, if the decompression were offloaded to a 
worker thread like Todd suggested in HDFS-1435.
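
To make the distinction concrete, here is a minimal sketch of what "HTTP 
compression only" could look like: the image bytes on disk stay exactly as 
written, and gzip is applied only to the bytes in flight. The class and method 
names (WireCompressionSketch, gzipForWire, gunzipFromWire) are hypothetical 
and are not existing Hadoop APIs. The sketch also uses in-memory byte arrays 
in place of a real HTTP servlet and a real fsimage file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class WireCompressionSketch {

    // Hypothetical sender side: gzip the image bytes only for the transfer.
    // The on-disk representation is not touched.
    public static byte[] gzipForWire(byte[] onDiskImage) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(wire)) {
            gz.write(onDiskImage);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return wire.toByteArray();
    }

    // Hypothetical receiver side: undo the transfer encoding before the
    // image is written to local storage.
    public static byte[] gunzipFromWire(byte[] wireBytes) {
        ByteArrayOutputStream image = new ByteArrayOutputStream();
        try (InputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(wireBytes))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                image.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return image.toByteArray();
    }

    public static void main(String[] args) {
        // Zero-filled stand-in for an fsimage; real images compress less well.
        byte[] image = new byte[1 << 16];
        byte[] wire = gzipForWire(image);
        byte[] received = gunzipFromWire(wire);
        System.out.println("on disk: " + image.length
            + " bytes, on wire: " + wire.length + " bytes");
        System.out.println("round trip intact: "
            + Arrays.equals(image, received));
    }
}
```

Note that in this arrangement the receiver stores the full uncompressed image, 
which is exactly the small-partition concern above if the on-disk option goes 
away.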

The FSImage is always read sequentially.  If we implement optional sections, 
that won't change this fact.  So I just don't see a reason for messing with 
this.  But maybe there's something I have overlooked.

Thoughts?

> Implement compression in the HTTP server of SNN / SBN instead of FSImage
> ------------------------------------------------------------------------
>
>                 Key: HDFS-5722
>                 URL: https://issues.apache.org/jira/browse/HDFS-5722
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Haohui Mai
>
> The current FSImage format supports compression: a field in the 
> header specifies the compression codec used to compress the data in the 
> image. The main motivation was to reduce the number of bytes 
> transferred between SNN / SBN / NN.
> The main disadvantage, however, is that it requires the client to access the 
> FSImage in strictly sequential order. This might not fit well with the new 
> design of FSImage. For example, serializing the data in protobuf allows the 
> client to quickly skip data that it does not understand. The compression 
> built into the format, however, complicates the calculation of offsets and 
> lengths. Recovering from a corrupted, compressed FSImage is also non-trivial, 
> as off-the-shelf tools like bzip2recover are inapplicable.
> This jira proposes to move the compression from the format of the FSImage to 
> the transport layer, namely, the HTTP server of SNN / SBN. This design 
> simplifies the format of FSImage, opens up the opportunity to quickly 
> navigate through the FSImage, and eases the process of recovery. It also 
> retains the benefits of reducing the number of bytes to be transferred across 
> the wire, since compression is applied at the transport layer.
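
For reference, the "skip data that it does not understand" behavior in the 
quoted description can be sketched roughly as below. This is only an 
illustration under assumptions: the layout (a 4-byte section id, a 4-byte 
length, then the payload) and the names SectionSkipSketch, writeSections and 
readSection are hypothetical, and this is not the actual FSImage or protobuf 
encoding. The recorded lengths are only directly usable as skip distances when 
the stored bytes are uncompressed; if the whole stream were compressed, 
skipping would still mean decompressing everything before the wanted section.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SectionSkipSketch {

    // Hypothetical layout, repeated per section:
    //   [int id][int length][length bytes of payload]
    public static byte[] writeSections(int[] ids, byte[][] payloads) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            for (int i = 0; i < ids.length; i++) {
                data.writeInt(ids[i]);
                data.writeInt(payloads[i].length);
                data.write(payloads[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Returns the payload of the first section with the wanted id, or null
    // if there is none. Sections with other ids are skipped by length
    // without being parsed.
    public static byte[] readSection(byte[] file, int wantedId) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(file))) {
            while (in.available() > 0) {
                int id = in.readInt();
                int length = in.readInt();
                if (id == wantedId) {
                    byte[] payload = new byte[length];
                    in.readFully(payload);
                    return payload;
                }
                // Unknown or unwanted section: jump over its payload.
                if (in.skipBytes(length) != length) {
                    throw new EOFException("truncated section " + id);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] file = writeSections(
            new int[] {1, 2},
            new byte[][] {{10, 11, 12}, {20, 21}});
        byte[] second = readSection(file, 2);
        System.out.println("section 2 has " + second.length + " bytes");
    }
}
```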



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
