[ 
https://issues.apache.org/jira/browse/HDFS-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HDFS-11402:
--------------------------------------
    Attachment: HDFS-11402.07.patch

Thanks for the review comments, [~andrew.wang]. Attached v07 patch addressing 
the following comments. Please take a look.

bq. Would also be interested to know how you chose 512 inodes/thread as it 
seems like a magic number, was this based on benchmarking?
This threshold avoids spinning up an unnecessary thread pool for a small 
number of open files. Yes, I tested with 1K, 5K, 10K, 15K and 50K open files 
and chose the number at which the thread pool started to show an improvement.
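
For illustration, a minimal sketch of how such a threshold can gate the 
fan-out (the constant name and structure below are placeholders, not the 
actual patch code):
{code:java}
// Illustrative sketch only -- shows how a per-thread threshold keeps
// small snapshots single-threaded.
public class ThresholdSketch {
  // Below ~512 inodes per worker, thread-pool overhead outweighs the win.
  static final int INODES_PER_WORKER = 512;

  static int workersFor(int openFileCount) {
    // Ceiling division: one worker per 512 inodes, minimum of one.
    return Math.max(1,
        (openFileCount + INODES_PER_WORKER - 1) / INODES_PER_WORKER);
  }

  public static void main(String[] args) {
    for (int n : new int[] {100, 1000, 5000, 50000}) {
      System.out.println(n + " open files -> " + workersFor(n) + " worker(s)");
    }
  }
}
{code}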

bq. the byte[][] is already available from getINodesAndPaths() along with 
INode[]. Wouldn't it be a lot slower to have one more helper method to 
construct the same byte[][]?
I was concerned about having to do another loop, since the former approach 
coalesced both into a single pass. But the added complexity isn't much; 
changed the code as suggested.
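
For reference, the extra loop amounts to something like this sketch (the 
{{INode}} below is a stand-in, not the real class):
{code:java}
// Sketch only -- INode here is a stand-in for
// org.apache.hadoop.hdfs.server.namenode.INode.
class INode {
  private final byte[] name;
  INode(byte[] name) { this.name = name; }
  byte[] getLocalNameBytes() { return name; }
}

class PathComponentsSketch {
  // The "one more helper method": derive the byte[][] path components from
  // an already-resolved INode[], instead of collecting both in one loop.
  static byte[][] getPaths(INode[] inodes) {
    byte[][] paths = new byte[inodes.length][];
    for (int i = 0; i < inodes.length; i++) {
      paths[i] = inodes[i].getLocalNameBytes();
    }
    return paths;
  }
}
{code}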

bq. isDescendant, if we're an IIP, can we simply look in our array of INodes 
for the specified ancestor? This method looks expensive right now.
Done.
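
The simplified check is along these lines (a sketch with stand-in types, not 
the actual {{INodesInPath}} code):
{code:java}
// Sketch only -- stand-in types, not the real INodesInPath.
class INode {}

class INodesInPathSketch {
  private final INode[] inodes; // root-to-leaf resolution of the path

  INodesInPathSketch(INode[] inodes) { this.inodes = inodes; }

  // O(path depth) scan of the already-resolved ancestor array, instead of
  // re-walking parent pointers up from the leaf for every check.
  boolean isDescendant(INode ancestor) {
    for (INode inode : inodes) {
      if (inode == ancestor) {
        return true;
      }
    }
    return false;
  }
}
{code}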

bq. hasReadLock checks hasWriteLock, so the additional hasWriteLock check is 
unnecessary:
Done.
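
For context, this follows the {{FSNamesystem}}-style lock checks; a 
simplified sketch of why the second check is redundant:
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified sketch: hasReadLock() already subsumes hasWriteLock(), so a
// "hasReadLock() || hasWriteLock()" caller can drop the second check.
public class LockCheckSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  boolean hasWriteLock() {
    return lock.isWriteLockedByCurrentThread();
  }

  boolean hasReadLock() {
    // True for read-lock holders AND write-lock holders.
    return lock.getReadHoldCount() > 0 || hasWriteLock();
  }
}
{code}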

bq. We could use a stride increment to simplify the work partitioning logic 
(and make work distribution more even):
The suggested approach doesn't collect a worker's partition of inodes in one 
pass and needs multiple iterations over the list. If something can be done 
better here, I will take it up in another JIRA.
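
To make the trade-off concrete, a sketch of the two partitioning schemes (all 
names here are made up for illustration):
{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative contrast of the two partitioning schemes discussed above.
public class PartitionSketch {
  // Contiguous chunks: a worker's whole slice is gathered in one pass,
  // but the last worker may get a short (uneven) slice.
  static List<Long> contiguousSlice(List<Long> ids, int worker, int chunkSize) {
    int from = Math.min(worker * chunkSize, ids.size());
    int to = Math.min(from + chunkSize, ids.size());
    return new ArrayList<>(ids.subList(from, to));
  }

  // Stride increment: load is spread evenly, but each worker must walk
  // the whole list, taking every numWorkers-th element.
  static List<Long> stridedSlice(List<Long> ids, int worker, int numWorkers) {
    List<Long> slice = new ArrayList<>();
    for (int i = worker; i < ids.size(); i += numWorkers) {
      slice.add(ids.get(i));
    }
    return slice;
  }
}
{code}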

bq. We aren't using workerIdx when getting the future results, so can simplify 
with foreach:
Done.
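
The collection loop now reduces to something like this sketch:
{code:java}
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Illustrative only: since workerIdx is unused when collecting results,
// the indexed loop over the futures reduces to a foreach.
public class CollectSketch {
  static long totalFiltered(List<Future<Integer>> futures)
      throws InterruptedException, ExecutionException {
    long total = 0;
    for (Future<Integer> f : futures) { // no index needed
      total += f.get();                 // blocks until the worker finishes
    }
    return total;
  }
}
{code}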

bq. Mind filing a JIRA to migrate LeaseManager over to SLF4J as well? We're 
adding a pretty gnarly new log that would be more readable with SLF4j.
Sure, I will update here after filing the JIRA.
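
For reference, the migration would let the new log use parameterized 
messages, roughly like this sketch (logger and method names illustrative):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch of the SLF4J migration: parameterized messages
// replace string concatenation and the explicit isDebugEnabled() guards
// needed with commons-logging.
public class LeaseManagerLogSketch {
  static final Logger LOG =
      LoggerFactory.getLogger(LeaseManagerLogSketch.class);

  void logOpenFiles(String snapshotRoot, int openFileCount) {
    // Arguments are formatted only if DEBUG is actually enabled.
    LOG.debug("Found {} open file(s) under snapshot root {}",
        openFileCount, snapshotRoot);
  }
}
{code}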

> HDFS Snapshots should capture point-in-time copies of OPEN files
> ----------------------------------------------------------------
>
>                 Key: HDFS-11402
>                 URL: https://issues.apache.org/jira/browse/HDFS-11402
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 2.6.0
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11402.01.patch, HDFS-11402.02.patch, 
> HDFS-11402.03.patch, HDFS-11402.04.patch, HDFS-11402.05.patch, 
> HDFS-11402.06.patch, HDFS-11402.07.patch
>
>
> *Problem:*
> 1. When files are being written and HDFS Snapshots are taken in parallel, 
> the snapshots do capture these files, but without their point-in-time file 
> lengths. That is, the open files are not frozen in HDFS Snapshots: they 
> grow/shrink in length, just like the original file, even after the snapshot 
> time.
> 2. At the time of file close, or any other metadata modification operation 
> on these files, HDFS reconciles the file length and records the modification 
> in the last taken snapshot. All previously taken snapshots continue to have 
> those open files with no modification recorded, so they all end up using the 
> final modification record from the last snapshot. Thus, after the file 
> close, the file lengths in all those snapshots end up the same.
> Assume File1 is opened for write and a total of 1MB is written to it. While 
> the writes are happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then at time,
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> *Proposal*
> 1. At the time of taking a snapshot, {{SnapshotManager#createSnapshot}} can 
> optionally request {{DirectorySnapshottableFeature#addSnapshot}} to freeze 
> open files. 
> 2. {{DirectorySnapshottableFeature#addSnapshot}} can consult 
> {{LeaseManager}} and get a list of {{INodesInPath}} for all open files under 
> the snapshot dir. 
> 3. After the snapshot creation, diff creation, and modification time 
> update, {{DirectorySnapshottableFeature#addSnapshot}} can invoke 
> {{INodeFile#recordModification}} for each of the open files. This way, the 
> snapshot just taken will have a {{FileDiff}} with {{fileSize}} captured for 
> each of the open files (see the sketch after this list). 
> 4. The above model follows the current snapshot and diff protocols and 
> doesn't introduce any new disk formats. So, I don't think we will need any 
> new FSImage Loader/Saver changes for snapshots.
> 5. One of the design goals of HDFS Snapshots was the ability to take any 
> number of snapshots in O(1) time. Though LeaseManager keeps all open files 
> with leases in an in-memory map, an iteration is still needed to prune down 
> to the open files under the snapshot dir and then run recordModification on 
> each of them. So, it will not be strictly O(1) with the above proposal. But 
> the increase is only marginal, as the new order will be 
> O(open_files_under_snap_dir). To avoid changing the HDFS Snapshot behavior 
> for open files and the time complexity by default, this improvement can be 
> made under a new config {{"dfs.namenode.snapshot.freeze.openfiles"}}, which 
> defaults to {{false}}.
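> A rough, self-contained sketch of proposal steps 1-3 (only the method names 
> {{createSnapshot}}, {{addSnapshot}} and {{recordModification}} come from the 
> text above; all types below are illustrative stand-ins):
> {code:java}
> import java.util.Collection;
>
> // Stand-in for an open INodeFile; recordModification is the real
> // method name, the rest is illustrative.
> interface OpenFile {
>   void recordModification(int latestSnapshotId);
> }
>
> // Stand-in for the LeaseManager lookup of open files under a path.
> interface LeaseTracker {
>   Collection<OpenFile> openFilesUnder(String snapshotRoot);
> }
>
> class SnapshotSketch {
>   private final LeaseTracker leases;
>   SnapshotSketch(LeaseTracker leases) { this.leases = leases; }
>
>   // Step 1: createSnapshot optionally asks addSnapshot to freeze open files.
>   int createSnapshot(String root, String name, boolean freezeOpenFiles) {
>     return addSnapshot(root, name, freezeOpenFiles);
>   }
>
>   // Steps 2-3: after snapshot/diff creation and modification time update,
>   // record a modification on each open file so the new snapshot carries a
>   // point-in-time FileDiff with the fileSize captured.
>   private int addSnapshot(String root, String name, boolean freezeOpenFiles) {
>     int snapshotId = newSnapshotId(root, name); // snapshot + diff created here
>     if (freezeOpenFiles) {
>       for (OpenFile f : leases.openFilesUnder(root)) {
>         f.recordModification(snapshotId);
>       }
>     }
>     return snapshotId;
>   }
>
>   private int newSnapshotId(String root, String name) {
>     return 0; // placeholder for the real snapshot creation logic
>   }
> }
> {code}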


