[
https://issues.apache.org/jira/browse/HDFS-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606888#comment-16606888
]
Adam Antal edited comment on HDFS-13890 at 9/7/18 11:34 AM:
------------------------------------------------------------
High-level overview of the issue broken into subtasks:
In order to include the snapshots one has to
* find all the snapshottable directories
* copy the latest folder structure of the snapshottable directories - (not
sure if we also need all the metadata of all the inodes)
* add inodes named ".snapshot" to the processor's internal structure to link
the inodes from the snapshots, taking care when reconstructing the internal
folder structure (inodes may be used multiple times in different snapshots
with different properties)
* replay the diff elements in SnapshotDiffSection
** using the copy of the latest folder structure, we iteratively apply the
snapshot diffs in reverse: each snapshot diff stores the difference between
its snapshot and the next one, so if e.g. a deleted INode is recorded in the
diff, we have to add it back into the directory structure when replaying in
reverse order.
** the INodeReferenceSection also has to be loaded, since that is where
possible file renames can be tracked. This must be done in parallel if we want
to print out the items while processing
** the replay has to be done in order, but different snapshottable directories
have no elements in common, hence there's room for multithreading
** we have to be careful about memory consumption, because these snapshots can
have many inodes - we only save the current snapshot, the previous ones should
be written out, and deleted (or the snapshots should be modified with the diffs
in place - so no cloning)
* once we have the folder structure and inodes of the snapshots, they have to
be written out - but as I mentioned earlier, this can be done while calculating
the snapshots
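The reverse replay described in the list above can be illustrated with a minimal
sketch. It is only an illustration under simplifying assumptions, not the OIV
tool's code: the SnapshotDiff class, its created/deleted fields and the
replay_in_reverse function are hypothetical names, paths stand in for inodes,
and renames (INodeReferenceSection) are ignored. One working copy of the latest
structure is modified in place, and each snapshot is yielded as soon as it is
reconstructed, so it can be written out before the previous one is computed.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class SnapshotDiff:
    """Hypothetical, simplified diff between a snapshot and the next state."""
    snapshot_name: str
    # Paths created after this snapshot was taken (absent from the snapshot).
    created: Set[str] = field(default_factory=set)
    # Paths deleted after this snapshot was taken (present in the snapshot).
    deleted: Set[str] = field(default_factory=set)


def replay_in_reverse(
    root: str, current: Set[str], diffs: List[SnapshotDiff]
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (snapshot name, paths under root/.snapshot/name), newest first.

    `diffs` is ordered from the oldest to the newest snapshot.
    """
    state = set(current)  # single copy of the latest folder structure
    for diff in reversed(diffs):
        # Undo the changes made after this snapshot, in place (no cloning).
        state -= diff.created
        state |= diff.deleted
        prefix = f"{root}/.snapshot/{diff.snapshot_name}"
        yield diff.snapshot_name, [prefix + path for path in sorted(state)]


if __name__ == "__main__":
    # s1 contained a.txt and b.txt; afterwards b.txt was deleted and c.txt
    # was created; s2 was taken; afterwards d.txt was created.
    latest = {"/a.txt", "/c.txt", "/d.txt"}
    history = [
        SnapshotDiff("s1", created={"/c.txt"}, deleted={"/b.txt"}),
        SnapshotDiff("s2", created={"/d.txt"}),
    ]
    for name, paths in replay_in_reverse("/data", latest, history):
        print(name, paths)
```

Because the function is a generator, only the snapshot currently being
reconstructed is held in memory; one such replay per snapshottable directory
could run in its own thread, since the directories share no elements.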
Source:
[doc|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html]
+ code.
Still unsure at some points how this can be achieved effectively. Will probably
upload a patch for the first few items, and discuss the next steps afterwards.
> Allow Delimited PB OIV tool to print out snapshots
> --------------------------------------------------
>
> Key: HDFS-13890
> URL: https://issues.apache.org/jira/browse/HDFS-13890
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Adam Antal
> Assignee: Adam Antal
> Priority: Minor
>
> HDFS-9721 added the possibility to process PB-based FSImages containing
> snapshots by simply ignoring them.
> Although the XML tool can provide information about the snapshots, the user
> may find it helpful if this is also shown in the Delimited output format.