[
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393386#comment-17393386
]
Stephen O'Donnell commented on HDFS-16147:
------------------------------------------
With your patch in place, I think the output file looks like this:
{code}
OVERALL_STREAM
INODE_SECTION (only in index, not in the data stream)
COMPRESSED_INODE_SUB_SECTION
COMPRESSED_INODE_SUB_SECTION
COMPRESSED_INODE_SUB_SECTION
...
EMPTY_COMPRESSED_INODE_SUB_SECTION
...
DIR_SECTION (only in index, not in the data stream)
COMPRESSED_DIR_SUB_SECTION
COMPRESSED_DIR_SUB_SECTION
...
EMPTY_COMPRESSED_DIR_SUB_SECTION
...
{code}
The reason for the empty ones is that at the end of the INODE section you call
commitSectionAndSubSection(), which closes the sub-section and opens a new
compressed stream. You then immediately close the section, which closes that
new (still empty) stream and opens another one. I don't think this does any
harm, but it would be better to avoid it if we can do so without making the
code too complex.
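One way to avoid the empty sub-sections might be to open each compressed stream
lazily, only when the first bytes of a sub-section are written. The sketch below
shows the idea with plain java.util.zip gzip streams; LazySubSectionWriter and
its methods are hypothetical and are not the Saver or
commitSectionAndSubSection() code from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical writer, not the HDFS Saver: a compressed stream is opened only
// when a sub-section receives data, so committing the sub-section and then the
// section back to back never leaves an empty compressed stream in the file.
final class LazySubSectionWriter {
    private final ByteArrayOutputStream file = new ByteArrayOutputStream();
    private final List<long[]> subSections = new ArrayList<>(); // {offset, length}
    private GZIPOutputStream current; // null while no sub-section is open
    private long currentStart;

    void write(byte[] data) throws IOException {
        if (current == null) {
            currentStart = file.size();            // gzip header starts here
            current = new GZIPOutputStream(file);  // opened lazily, on first write
        }
        current.write(data);
    }

    void commitSubSection() throws IOException {
        if (current == null) {
            return; // nothing written since the last commit: record nothing
        }
        current.finish(); // write remaining compressed data and trailer, keep file open
        subSections.add(new long[] {currentStart, file.size() - currentStart});
        current = null;   // the next write() opens a fresh stream
    }

    void commitSection() throws IOException {
        commitSubSection(); // close the last sub-section; do not open a new stream
    }

    List<long[]> subSections() { return subSections; }

    byte[] toByteArray() { return file.toByteArray(); }
}
```

The cost is one null check per write; whether this shape fits the existing
Saver code would need checking against the patch itself.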
I think the reason this works is that when you read in parallel, each
compressed sub-section is read on its own. This is fine.
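To illustrate why the parallel path is fine: each sub-section is an independent
compressed stream, so a loader only needs its offset and length from the index
to decompress it on a separate thread. This is a minimal sketch, assuming gzip
from java.util.zip in place of the configured image codec and a hypothetical
{offset, length} list in place of the real FSImage section index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: sub-sections are separate gzip streams located by an index.
final class ParallelSubSectionReader {
    // Compresses each chunk as its own gzip stream; records {offset, length} in index.
    static byte[] writeSubSections(List<String> chunks, List<int[]> index) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (String chunk : chunks) {
            int start = file.size();
            GZIPOutputStream gz = new GZIPOutputStream(file);
            gz.write(chunk.getBytes(StandardCharsets.UTF_8));
            gz.finish(); // end this stream without closing the shared buffer
            index.add(new int[] {start, file.size() - start});
        }
        return file.toByteArray();
    }

    // Decompresses one sub-section using only its slice of the file.
    static String readSubSection(byte[] file, int[] entry) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(file, entry[0], entry[1]))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads every sub-section on a thread pool; results come back in index order.
    static List<String> readParallel(byte[] file, List<int[]> index, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int[] entry : index) {
                futures.add(pool.submit(() -> readSubSection(file, entry)));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : futures) {
                out.add(f.get());
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```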
When you read it serially (i.e. turn parallel loading off and load an image, or
use OIV), it will try to read all the compressed INODE sub-sections through a
single stream. I think this looks like a series of compressed streams
concatenated together, so the decompressor must handle concatenated streams and
return the output as if it were a single compressed stream. We can probably
test this somehow to be sure.
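A quick check of the concatenated-stream behaviour, at least for the JDK gzip
implementation: java.util.zip.GZIPInputStream reads concatenated gzip members
as one stream, including an empty member like the EMPTY_COMPRESSED_*_SUB_SECTION
entries in the layout. This is only a sketch; the codec set via
dfs.image.compression.codec may behave differently, so the real test should go
through that codec. ConcatenatedGzipCheck is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ConcatenatedGzipCheck {
    // Compresses one "sub-section" as a complete, standalone gzip member.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Concatenates the members and reads them back through ONE decompressor,
    // like the serial loader reading every sub-section from a single stream.
    static String readSerially(byte[]... members) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (byte[] m : members) {
            file.write(m);
        }
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(file.toByteArray()))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String out = readSerially(gzip("sub-1;"), gzip("sub-2;"), gzip(""), gzip("sub-3;"));
        System.out.println(out); // prints "sub-1;sub-2;sub-3;"
    }
}
```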
In TestFSImage.testNoParallelSectionsWithCompressionEnabled(..), could you
remove or rename that test and make it load an image with parallel loading
enabled, rather than checking that it does not work, as the current test does?
Provided my understanding is correct, this patch looks mostly good apart from
the two things above, but I will give it a more detailed review in the next day
or two.
Have you tested this patch on a large image with millions of inodes?
> load fsimage with parallelization and compression
> -------------------------------------------------
>
> Key: HDFS-16147
> URL: https://issues.apache.org/jira/browse/HDFS-16147
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namanode
> Affects Versions: 3.3.0
> Reporter: liuyongpan
> Priority: Minor
> Attachments: HDFS-16147.001.patch, HDFS-16147.002.patch,
> subsection.svg
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)