[ 
https://issues.apache.org/jira/browse/HDFS-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606888#comment-16606888
 ] 

Adam Antal edited comment on HDFS-13890 at 9/7/18 11:34 AM:
------------------------------------------------------------

High-level overview of the issue broken into subtasks:

In order to include the snapshots, one has to:
 * find all the snapshottable directories
 * copy the latest folder structure of the snapshottable directories (not sure 
if we also need all the metadata of all the inodes)
 * add inodes named ".snapshot" to the processor's internal structure to link 
the inodes from the snapshots, and be careful when reconstructing the internal 
folder structure (an inode may be used multiple times in different snapshots 
with different properties)
 * replay the diff elements in SnapshotDiffSection
 ** using the copy of the latest folder structure, we iteratively apply the 
snapshot diffs in reverse: because each snapshot diff stores the difference 
between its snapshot and the next one, if e.g. a deleted INode is recorded in 
a diff, we have to add it back into the directory structure when replaying in 
reverse order.
 ** the INodeReferenceSection also has to be loaded, since possible file 
renames can be tracked there. This must be done in parallel if we want to print 
out the items while processing
 ** this has to be done in order, but different snapshottable directories have 
no elements in common, hence there's room for multithreading
 ** we have to be careful about memory consumption, because these snapshots can 
have many inodes: we only keep the current snapshot in memory, and the previous 
ones should be written out and then deleted (or the snapshots should be 
modified with the diffs in place, so no cloning)
 * once we have the folder structure and inodes of the snapshots, they have to 
be written out, but as I mentioned earlier, this can be done while calculating 
the snapshots
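
The reverse replay in the list above could be sketched roughly as follows. 
This is only an illustration in Python, not the OIV tool's Java code: 
`DirDiff` and `replay_in_reverse` are hypothetical names, the model tracks 
only the child names of a single snapshottable directory, and it ignores inode 
metadata and the INodeReferenceSection entirely. It assumes each diff lists 
the children created and deleted after its snapshot was taken.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Set


@dataclass
class DirDiff:
    """Hypothetical, simplified stand-in for one SnapshotDiffSection entry.

    Records what changed in the directory between this snapshot and the next
    state (the next snapshot, or the live tree for the newest snapshot).
    """
    snapshot_name: str
    created: Set[str] = field(default_factory=set)  # added after the snapshot
    deleted: Set[str] = field(default_factory=set)  # removed after the snapshot


def replay_in_reverse(root: str,
                      live_children: Iterable[str],
                      diffs_oldest_first: List[DirDiff]) -> Iterator[str]:
    """Yield the paths of every snapshot, newest snapshot first.

    Only one working copy of the children is kept and modified in place, so
    earlier results do not have to stay in memory once they are emitted.
    """
    state = set(live_children)  # copy of the latest folder structure
    for diff in reversed(diffs_oldest_first):
        state -= diff.created   # did not exist yet when the snapshot was taken
        state |= diff.deleted   # still existed when the snapshot was taken
        for name in sorted(state):
            yield f"{root}/.snapshot/{diff.snapshot_name}/{name}"
```

For example, with a live directory `/data` containing `a` and `c`, a diff for 
`s1` with `deleted={"b"}` and a diff for `s2` with `created={"c"}`, the 
generator yields `/data/.snapshot/s2/a` first, then `/data/.snapshot/s1/a` and 
`/data/.snapshot/s1/b`. Because it is a generator, each line can be written 
out while the next snapshot is still being calculated, and separate 
snapshottable directories could be handled by separate workers.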

Source: 
[doc|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html]
 + code.

I am still unsure at some points how this can be achieved effectively. I will 
probably upload a patch for the first few items and discuss the next steps 
afterwards.


> Allow Delimited PB OIV tool to print out snapshots
> --------------------------------------------------
>
>                 Key: HDFS-13890
>                 URL: https://issues.apache.org/jira/browse/HDFS-13890
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Adam Antal
>            Assignee: Adam Antal
>            Priority: Minor
>
> HDFS-9721 added the possibility to process PB-based FSImages containing 
> snapshots by simply ignoring them. 
> Although the XML tool can provide information about the snapshots, users 
> may find it helpful if this information is also shown in the Delimited 
> output.


