[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866181#comment-13866181 ]
Colin Patrick McCabe commented on HDFS-5722:
--------------------------------------------

[~tlipcon], [~atm], [~hairong], how do you feel about removing support for on-disk FSImage compression?

It seems to me that we should just add an option for doing HTTP compression, but keep the old option for on-disk compression. It concerns me that someone with a small disk might upgrade to a new version of Hadoop and then be unable to save their (now much larger) FSImage on a small partition once compression support has been removed.

I also think that for really large FSImages, loading a compressed version could be faster, if the decompression were offloaded to a worker thread like Todd suggested in HDFS-1435.

The FSImage is always read sequentially. If we implement optional sections, that won't change this fact. So I just don't see a reason for messing with this. But maybe there's something I have overlooked. Thoughts?

> Implement compression in the HTTP server of SNN / SBN instead of FSImage
> ------------------------------------------------------------------------
>
>                 Key: HDFS-5722
>                 URL: https://issues.apache.org/jira/browse/HDFS-5722
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Haohui Mai
>
> The current FSImage format supports compression: a field in the header
> specifies the compression codec used to compress the data in the image.
> The main motivation was to reduce the number of bytes to be transferred
> between SNN / SBN / NN.
> The main disadvantage, however, is that it requires the client to access the
> FSImage in strictly sequential order. This might not fit well with the new
> design of FSImage. For example, serializing the data in protobuf allows the
> client to quickly skip data that it does not understand. The compression
> built into the format, however, complicates the calculation of offsets and
> lengths. Recovering from a corrupted, compressed FSImage is also non-trivial,
> as off-the-shelf tools like bzip2recover are inapplicable.
> This jira proposes to move the compression from the format of the FSImage to
> the transport layer, namely, the HTTP server of SNN / SBN. This design
> simplifies the format of FSImage, opens up the opportunity to quickly
> navigate through the FSImage, and eases the process of recovery. It also
> retains the benefit of reducing the number of bytes transferred across
> the wire, since compression is still applied at the transport layer.
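
To make the proposal in the description concrete, here is a minimal, language-neutral sketch in Python. It is not Hadoop code: the section names, the byte layout, and the helper functions are all hypothetical, and gzip stands in for whatever codec the HTTP server would actually use. It only illustrates the idea that the image is compressed for the wire while the stored copy stays uncompressed, so a reader can seek straight to a section by offset and length.

```python
import gzip
import io
from typing import BinaryIO


def compress_for_transport(image_bytes: bytes) -> bytes:
    """Compress the image only for the wire (hypothetical helper)."""
    return gzip.compress(image_bytes)


def receive_from_transport(wire_bytes: bytes) -> bytes:
    """Undo the transport compression on the receiving side."""
    return gzip.decompress(wire_bytes)


def read_section(image: BinaryIO, offset: int, length: int) -> bytes:
    """Jump directly to a section of an uncompressed image."""
    image.seek(offset)
    return image.read(length)


# Hypothetical image made of three back-to-back sections.
sections = [b"HEADER", b"INODES" * 1000, b"TRAILER"]
image_bytes = b"".join(sections)

# Sender side: compress only while transferring.
wire = compress_for_transport(image_bytes)

# Receiver side: store the uncompressed bytes.
received = receive_from_transport(wire)

# Because the stored image is uncompressed, offsets are plain byte
# positions and the reader can skip the sections it does not need.
trailer_offset = len(sections[0]) + len(sections[1])
trailer = read_section(io.BytesIO(received), trailer_offset, len(sections[2]))
```

Note that this sketch also shows the cost raised in the comment above: the copy kept on disk is the full uncompressed size, which is the concern for small partitions.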