[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866181#comment-13866181 ]
Colin Patrick McCabe commented on HDFS-5722:
--------------------------------------------
[~tlipcon], [~atm], [~hairong], how do you feel about removing support for
on-disk FSImage compression?
It seems to me that we should just add an option for HTTP compression, but
keep the existing option for on-disk compression. It concerns me that someone
with a small disk might upgrade to a new version of Hadoop and then be unable
to save their (much larger, uncompressed) fsimage on a small partition once
compression support has been removed. I also think that for really large
FSImages, loading a compressed version could be faster, since fewer bytes
would be read from disk, if the decompression were offloaded to a worker
thread as Todd suggested in HDFS-1435.
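A minimal sketch of the worker-thread idea, assuming gzip from `java.util.zip` stands in for Hadoop's compression codecs (it is not the HDFS-1435 design, and `BackgroundDecompress` / `decompressInBackground` are hypothetical names). One pool thread inflates the image into a pipe while the loading thread only parses the decompressed bytes; a decompression failure is reported to the reader at EOF rather than silently truncating the image.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BackgroundDecompress {
    /**
     * Hypothetical helper: returns a stream of decompressed bytes. A worker thread
     * from 'pool' runs the gzip decoder and pushes its output through a pipe, so
     * the calling thread only has to parse the already-decompressed image.
     */
    public static InputStream decompressInBackground(InputStream compressed,
                                                     ExecutorService pool) throws IOException {
        PipedInputStream sink = new PipedInputStream(1 << 16); // 64 KiB pipe buffer
        PipedOutputStream source = new PipedOutputStream(sink);
        Future<?> worker = pool.submit(() -> {
            // 'source' is opened first so it is always closed (signalling EOF to
            // the reader), even if the gzip header turns out to be invalid.
            try (OutputStream out = source;
                 InputStream in = new GZIPInputStream(compressed)) {
                in.transferTo(out);
            }
            return null;
        });
        // At EOF, surface any failure from the worker instead of returning a
        // silently truncated image.
        return new FilterInputStream(sink) {
            @Override
            public int read() throws IOException {
                int b = super.read();
                if (b < 0) checkWorker();
                return b;
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
                int n = super.read(buf, off, len);
                if (n < 0) checkWorker();
                return n;
            }

            private void checkWorker() throws IOException {
                try {
                    worker.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException("interrupted waiting for decompressor");
                } catch (ExecutionException e) {
                    throw new IOException("background decompression failed", e.getCause());
                }
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] image = "inode section payload\n".repeat(5000).getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(image);
        }
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (InputStream in = decompressInBackground(
                new ByteArrayInputStream(compressed.toByteArray()), pool)) {
            byte[] loaded = in.readAllBytes();
            System.out.println(loaded.length + " bytes loaded, match=" + Arrays.equals(image, loaded));
        } finally {
            pool.shutdown();
        }
    }
}
```

The pipe keeps the read path strictly sequential, which matches how the FSImage is loaded; the only change is that inflating and parsing overlap on two threads.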
The FSImage is always read sequentially. If we implement optional sections,
that won't change this fact. So I just don't see a reason for messing with
this. But maybe there's something I have overlooked.
Thoughts?
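The point that optional sections do not require random access can be illustrated with a length-prefixed layout: a reader that meets a section it does not understand simply reads past it, and this works unchanged through a gzip stream. The `[tag][length][payload]` framing, the `OptionalSections` class, and gzip standing in for Hadoop's codecs are all assumptions for illustration, not the real FSImage layout. Note that skipping through the compressed stream still inflates the skipped bytes, and a reader cannot jump straight to a section by byte offset; that is the cost the quoted description raises.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class OptionalSections {
    // Hypothetical framing, not the real FSImage layout:
    // [int tag][int length][length bytes of payload], repeated until EOF.
    public static void writeSection(DataOutputStream out, int tag, byte[] payload) throws IOException {
        out.writeInt(tag);
        out.writeInt(payload.length);
        out.write(payload);
    }

    /** Reads sections strictly in order, keeping known tags and skipping the rest. */
    public static Map<Integer, byte[]> readKnown(InputStream raw, Set<Integer> known) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        Map<Integer, byte[]> sections = new LinkedHashMap<>();
        while (true) {
            int first = in.read();
            if (first < 0) return sections; // clean end of image, between sections
            // Assemble the big-endian tag; readUnsignedByte throws EOFException on truncation.
            int tag = (first << 24) | (in.readUnsignedByte() << 16)
                    | (in.readUnsignedByte() << 8) | in.readUnsignedByte();
            int length = in.readInt();
            if (known.contains(tag)) {
                byte[] payload = new byte[length];
                in.readFully(payload);
                sections.put(tag, payload);
            } else {
                skipFully(in, length); // a forward read only: no seeking needed
            }
        }
    }

    // skipBytes may skip fewer bytes than requested, so loop until done.
    private static void skipFully(DataInputStream in, int n) throws IOException {
        while (n > 0) {
            int skipped = in.skipBytes(n);
            if (skipped <= 0) throw new EOFException("truncated section");
            n -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buf))) {
            writeSection(out, 1, "inodes".getBytes(StandardCharsets.UTF_8));
            writeSection(out, 99, new byte[1 << 20]); // a section this reader does not understand
            writeSection(out, 2, "snapshots".getBytes(StandardCharsets.UTF_8));
        }
        Map<Integer, byte[]> sections = readKnown(
                new GZIPInputStream(new ByteArrayInputStream(buf.toByteArray())), Set.of(1, 2));
        System.out.println(sections.keySet()); // prints [1, 2]
    }
}
```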
> Implement compression in the HTTP server of SNN / SBN instead of FSImage
> ------------------------------------------------------------------------
>
> Key: HDFS-5722
> URL: https://issues.apache.org/jira/browse/HDFS-5722
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
>
> The current FSImage format supports compression: a field in the header
> specifies the codec used to compress the data in the image. The main
> motivation was to reduce the number of bytes transferred between the
> SNN / SBN / NN.
> The main disadvantage, however, is that it requires the client to access the
> FSImage in strictly sequential order. This might not fit well with the new
> design of the FSImage. For example, serializing the data in protobuf allows
> the client to quickly skip data that it does not understand. Compression
> built into the format, however, complicates the calculation of offsets and
> lengths. Recovering from a corrupted, compressed FSImage is also non-trivial,
> as off-the-shelf tools like bzip2recover are inapplicable.
> This jira proposes to move the compression from the FSImage format to the
> transport layer, namely, the HTTP server of the SNN / SBN. This design
> simplifies the FSImage format, makes it possible to navigate the FSImage
> quickly, and eases recovery. It also retains the benefit of reducing the
> number of bytes transferred across the wire, since compression is applied at
> the transport layer.
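The transport-level alternative can be sketched with the JDK's built-in `com.sun.net.httpserver` rather than Hadoop's actual image-transfer servlet: the image stays uncompressed on disk and is gzip-encoded only on the wire, and only when the client sends `Accept-Encoding: gzip`, using standard HTTP `Content-Encoding` negotiation. The `/image` path and the `TransportCompression`, `start`, and `fetch` names are hypothetical, and the `Accept-Encoding` parsing is deliberately simplified (q-values are ignored).

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TransportCompression {
    /** Serves 'image' (kept uncompressed) at /image, gzip-encoding it only on the wire. */
    public static HttpServer start(byte[] image) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/image", exchange -> serve(exchange, image));
        server.start();
        return server;
    }

    private static void serve(HttpExchange exchange, byte[] image) throws IOException {
        try {
            exchange.getResponseHeaders().set("Content-Type", "application/octet-stream");
            if (acceptsGzip(exchange.getRequestHeaders().getFirst("Accept-Encoding"))) {
                exchange.getResponseHeaders().set("Content-Encoding", "gzip");
                exchange.sendResponseHeaders(200, 0); // 0 = chunked: compressed size unknown up front
                try (OutputStream out = new GZIPOutputStream(exchange.getResponseBody())) {
                    out.write(image);
                }
            } else {
                exchange.sendResponseHeaders(200, image.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(image);
                }
            }
        } finally {
            exchange.close();
        }
    }

    // Simplified Accept-Encoding check: looks for a "gzip" token, ignores q-values.
    public static boolean acceptsGzip(String header) {
        if (header == null) return false;
        for (String part : header.split(",")) {
            if (part.split(";", 2)[0].trim().equalsIgnoreCase("gzip")) return true;
        }
        return false;
    }

    /** Downloads the image, asking for gzip if wanted and decoding it if the server complied. */
    public static byte[] fetch(int port, boolean wantGzip) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://127.0.0.1:" + port + "/image").toURL().openConnection();
        if (wantGzip) conn.setRequestProperty("Accept-Encoding", "gzip");
        InputStream raw = conn.getInputStream();
        boolean gzipped = "gzip".equalsIgnoreCase(conn.getContentEncoding());
        try (InputStream body = gzipped ? new GZIPInputStream(raw) : raw) {
            return body.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] image = "fsimage bytes ".repeat(10_000).getBytes(StandardCharsets.UTF_8);
        HttpServer server = start(image);
        try {
            int port = server.getAddress().getPort();
            System.out.println(fetch(port, true).length + " " + fetch(port, false).length);
        } finally {
            server.stop(0);
        }
    }
}
```

Because compression is negotiated per request, a client that does not ask for gzip still receives the plain bytes, while the stored image remains uncompressed and therefore navigable by offset for tools and recovery.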
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)