[jira] [Commented] (HDFS-16147) load fsimage with parallelization and compression

Stephen O'Donnell (Jira) Wed, 04 Aug 2021 10:28:08 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393356#comment-17393356
 ]


Stephen O'Donnell commented on HDFS-16147:
------------------------------------------

On further review, most of what I wrote above is wrong!

When saving the image, there is a single output stream, but each section is 
written into it as its own separate compressed stream, e.g.:

{code}
OVERALL_STREAM
    COMPRESSED_INODE_SECTION
    COMPRESSED_DIR_SECTION
    ...
{code}

You can see this in the commitSection() method, where the section's compressed 
stream is finished.

So when a section is loaded (not in parallel), the loader jumps to the start 
of that compressed section and reads it in full.
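
To make that layout concrete, here is a minimal Java sketch. It is not the 
Hadoop code: it uses java.util.zip gzip streams in place of the configured 
image codec, and the class name, method names, and in-memory byte array are 
hypothetical stand-ins. It only shows the general idea of finishing one 
compressed stream per section inside a single outer stream, then reading a 
section back by jumping to its recorded offset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; none of these names come from Hadoop.
class SectionedImageSketch {

    // Appends one section to the shared output as its own complete gzip
    // stream and returns the offset at which that section starts.
    static long writeSection(ByteArrayOutputStream image, String payload)
            throws IOException {
        long start = image.size();
        GZIPOutputStream gz = new GZIPOutputStream(image);
        gz.write(payload.getBytes(StandardCharsets.UTF_8));
        // finish() ends this compressed stream but leaves the outer
        // stream open, so the next section can be appended after it.
        gz.finish();
        return start;
    }

    // Jumps to the start of one compressed section and reads it in full.
    // The length bound keeps the reader from running into the next section.
    static String readSection(byte[] image, long offset, long length)
            throws IOException {
        ByteArrayInputStream in =
            new ByteArrayInputStream(image, (int) offset, (int) length);
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream image = new ByteArrayOutputStream();
        long inodeStart = writeSection(image, "inode data");
        long dirStart = writeSection(image, "directory data");
        byte[] bytes = image.toByteArray();

        // Sections can be read in any order, because each one is a
        // self-contained compressed stream.
        System.out.println(readSection(bytes, dirStart, bytes.length - dirStart));
        System.out.println(readSection(bytes, inodeStart, dirStart - inodeStart));
    }
}
```

In this sketch a read can only begin at an offset where a compressed stream 
starts, which is why the section boundaries are recorded at write time.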

This means it is still unclear how a compressed image saved with sub-sections 
can be loaded without parallel loading. Perhaps a compressed stream can read 
embedded compressed streams within itself. I am not sure, but I would like to 
understand how this is working.

> load fsimage with parallelization and compression
> -------------------------------------------------
>
>                 Key: HDFS-16147
>                 URL: https://issues.apache.org/jira/browse/HDFS-16147
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: liuyongpan
>            Priority: Minor
>         Attachments: HDFS-16147.001.patch, HDFS-16147.002.patch, 
> subsection.svg
>
>




