[jira] [Commented] (HDFS-16147) load fsimage with parallelization and compression

Stephen O'Donnell (Jira) Wed, 04 Aug 2021 11:31:07 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393386#comment-17393386
 ]


Stephen O'Donnell commented on HDFS-16147:
------------------------------------------

With your patch in place I think the output file looks like this:

{code}
OVERALL_STREAM
  INODE_SECTION (only in index, not in the data stream)
    COMPRESSED_INODE_SUB_SECTION
    COMPRESSED_INODE_SUB_SECTION
    COMPRESSED_INODE_SUB_SECTION
    ...
    EMPTY_COMPRESSED_INODE_SUB_SECTION
  ...
  DIR_SECTION  (only in index, not in the data stream)
    COMPRESSED_DIR_SUB_SECTION
    COMPRESSED_DIR_SUB_SECTION
    ...
    EMPTY_COMPRESSED_DIR_SUB_SECTION
  ...
{code}

The empty sub-sections appear because, at the end of the INODE section, you 
call commitSectionAndSubSection(), which closes the current sub-section and 
opens a new compressed stream. You then immediately close the section, which 
closes that new, still empty, stream and opens another one. I don't think it 
does any harm, but it would be better to avoid it if we can fix it without 
making the code too complex.
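
To illustrate why an empty sub-section still shows up in the file, here is a 
minimal sketch using only the JDK's GZIP classes. It is not the Hadoop codec 
path and not code from the patch; the class and method names are hypothetical. 
It opens a compressed stream and closes it without writing anything, which 
still emits a header and trailer but decompresses to zero bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only: JDK gzip stands in for the image codec.
class EmptySubSectionSketch {

  /** Opens a compressed stream and closes it without writing any data. */
  public static byte[] emptyCompressedStream() throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
      // Nothing is written, mirroring a sub-section that is opened
      // and then closed straight away.
    }
    return sink.toByteArray();
  }

  /** Decompresses the given bytes and returns how many bytes came out. */
  public static int decompressedLength(byte[] compressed) throws IOException {
    int total = 0;
    try (InputStream in =
        new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buf = new byte[1024];
      int n;
      while ((n = in.read(buf)) != -1) {
        total += n;
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] empty = emptyCompressedStream();
    System.out.println("bytes written to the file: " + empty.length);
    System.out.println("bytes after decompression: "
        + decompressedLength(empty));
  }
}
```

If the real codec behaves the same way, the empty sub-sections cost a few 
bytes each and yield no records when read back, which would match the "no harm" 
reading above.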

I think this works because, when reading in parallel, the loader reads each 
compressed sub-section on its own. This is fine.

When you read it serially (i.e. with parallel loading turned off, or when 
using OIV), it will try to read all the compressed INODE sub-sections through 
a single stream. I think this amounts to a series of compressed streams 
concatenated together, so the decompressor must handle concatenated streams 
and return the output as if it were a single compressed stream. We can 
probably test this out somehow to be sure.
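
One way to check the concatenated-stream assumption in isolation is sketched 
below. It uses the JDK's GZIPInputStream, which accepts concatenated gzip 
members, as a stand-in; the codec configured for the fsimage may behave 
differently and would need the same test run against it. The class name, 
method names and payload strings are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only: JDK gzip stands in for the image codec.
class ConcatenatedStreamsSketch {

  /** Writes each payload as its own compressed stream, back to back. */
  public static byte[] buildImage(String... subSections) throws IOException {
    ByteArrayOutputStream image = new ByteArrayOutputStream();
    for (String payload : subSections) {
      GZIPOutputStream gz = new GZIPOutputStream(image);
      gz.write(payload.getBytes(StandardCharsets.UTF_8));
      // finish() ends this compressed stream but leaves the sink open,
      // so the next sub-section is appended directly after it.
      gz.finish();
    }
    return image.toByteArray();
  }

  /** Reads the whole image through a single decompressing stream. */
  public static String readSerially(byte[] image) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InputStream in =
        new GZIPInputStream(new ByteArrayInputStream(image))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] image = buildImage("inode-1;", "inode-2;", "inode-3;");
    // prints "inode-1;inode-2;inode-3;"
    System.out.println(readSerially(image));
  }
}
```

A unit test along these lines, written against the actual codec classes used 
by the image loader, would confirm whether the serial path can rely on this 
behaviour.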

In TestFSImage.testNoParallelSectionsWithCompressionEnabled(..), could you 
remove or rename that test and make it load an image with parallel enabled, 
rather than checking that it does not work, as the current test does?

Provided my understanding is correct, this patch looks mostly good apart from 
the two things above, but I will give it a more detailed review in the next day 
or two.

Have you tested this patch on a large image with millions of inodes?

> load fsimage with parallelization and compression
> -------------------------------------------------
>
>                 Key: HDFS-16147
>                 URL: https://issues.apache.org/jira/browse/HDFS-16147
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namanode
>    Affects Versions: 3.3.0
>            Reporter: liuyongpan
>            Priority: Minor
>         Attachments: HDFS-16147.001.patch, HDFS-16147.002.patch, 
> subsection.svg
>
>




