[ https://issues.apache.org/jira/browse/HDFS-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022596#comment-16022596 ]

Rakesh R commented on HDFS-11669:
---------------------------------

Thanks a lot [~surendrasingh] for working on this. [~umamaheswararao] and I had 
an offline discussion about this task and noticed a few important points while 
combining the set & satisfy storage policy APIs.
# Handle the {{throws IOException("as this file/dir was already called for 
satisfying")}} exception case. 
{{#setStoragePolicy(path,policy,scheduleBlockMove)}} will throw an exception if 
a block movement has already been scheduled and is still in progress for the 
path. This has to be documented well in the new API so that users can handle 
this case when calling it (see the first sketch after this list).
# Maintain atomicity while updating the editlog. The setStoragePolicy and 
satisfyStoragePolicy operations should be committed into the editlog together, 
avoiding any partial edit log state (see the second sketch after this list).
# Presently, the {{dfs#satisfyStoragePolicy()}} API is applicable only to the 
given path and its immediate children. Due to the complexity of holding the 
write lock for a long time, recursive scanning of sub-directories won't be 
considered for satisfying the policy. For example, assume we have the directory 
structure "/a/b/c/d/e/f/g/h" and the user has called the set & satisfy API on 
the root "/a". The cost of recursively iterating and updating the 
satisfyStoragePolicy xattr is high. On the other hand, setting the storage 
policy relies on inheritance: the policy is set on the given path only, and all 
its children inherit it from the parent without any extra locking cost. The 
hybrid {{scheduleBlockMoves}} API would therefore need to support recursive 
behavior in order to be consistent with the {{#setStoragePolicy()}} semantics, 
and I feel this brings complexity to the system. Earlier, at feature design 
time, we decided to postpone the recursive sub-directory behavior to a second 
phase. In that case, how about postponing this task to the next phase, once we 
have finished merging the existing code to trunk?
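
For point 1, here is a minimal caller-side sketch. The three-argument 
{{setStoragePolicy}} overload and the retry handling are assumptions based on 
this proposal, not the current API (the flag is left commented out so the 
snippet compiles against the existing two-argument method):
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SetAndSatisfyCaller {
  // Hypothetical combined call: set the policy and schedule block moves in
  // one step, as proposed in this JIRA.
  static void setAndSatisfy(DistributedFileSystem dfs, Path path, String policy) {
    try {
      dfs.setStoragePolicy(path, policy /*, scheduleBlockMove = true */);
    } catch (IOException e) {
      // Expected when a block movement is already scheduled and in progress
      // for this path; the caller should back off and retry, not fail hard.
      System.err.println("Satisfier already running on " + path + ": "
          + e.getMessage());
    }
  }
}
{code}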
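For point 2, a toy sketch of the atomicity we need; the journal, the state 
fields and the op name are stand-ins for illustration, not the real FSEditLog 
plumbing:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model: both in-memory changes are applied under one lock and journaled
// as ONE edit record, so replay can never observe the policy set without the
// satisfier xattr (or the other way around).
public class AtomicEditSketch {
  private final List<String> editLog = new ArrayList<>(); // stand-in journal
  private String storagePolicy;                           // stand-in inode state
  private boolean satisfierXattr;

  synchronized void setAndSatisfy(String path, String policy) {
    storagePolicy = policy; // change 1: policy xattr
    satisfierXattr = true;  // change 2: satisfier xattr
    // One combined record instead of two; a crash between two separate
    // logEdit() calls would otherwise leave a partial state behind.
    editLog.add("OP_SET_POLICY_AND_SATISFY " + path + " " + policy);
  }

  public static void main(String[] args) {
    AtomicEditSketch ns = new AtomicEditSketch();
    ns.setAndSatisfy("/a", "COLD");
    System.out.println(ns.editLog); // [OP_SET_POLICY_AND_SATISFY /a COLD]
  }
}
{code}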

One idea to reduce the locking time during recursive iteration is to acquire 
and release the lock for each sub-directory rather than holding the lock at 
the root until all the sub-directories are visited. For example, for directory 
"/a", acquire the lock, set the xattr on all its immediate children, then 
release the lock. Next, pick the sub-directory "/a/b", acquire the lock, set 
the xattr on all its immediate children, then release the lock, and continue 
in this fashion until the leaf node is visited. Also, to maintain atomicity we 
should keep the visited files in memory and add this list into the edit log 
entry only after all the sub-directories are visited, as sketched below.
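
A self-contained sketch of this lock-chunking idea; the {{Namespace}} 
interface and {{nsLock}} are stand-ins for the real FSDirectory/FSNamesystem 
structures, not actual HDFS types:
{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class ChunkedTraversalSketch {
  // Stand-in for the namesystem write lock.
  private static final ReentrantLock nsLock = new ReentrantLock();

  // Stand-in for the directory tree operations we need.
  interface Namespace {
    List<String> listChildren(String dir); // immediate children only
    boolean isDirectory(String path);
    void setSatisfyXattr(String path);     // mark one inode
  }

  static void satisfyRecursively(Namespace ns, String root) {
    List<String> visited = new ArrayList<>(); // kept in memory until walk ends
    Deque<String> pending = new ArrayDeque<>();
    pending.push(root);
    while (!pending.isEmpty()) {
      String dir = pending.pop();
      nsLock.lock(); // lock only for this directory's immediate children...
      try {
        for (String child : ns.listChildren(dir)) {
          ns.setSatisfyXattr(child);
          visited.add(child);
          if (ns.isDirectory(child)) {
            pending.push(child); // sub-dir handled in a later lock chunk
          }
        }
      } finally {
        nsLock.unlock(); // ...then release before moving to the next one
      }
    }
    // Only now, after every sub-directory is visited, journal the whole
    // batch as one edit so a partial walk never reaches the edit log.
    logSingleEdit(visited);
  }

  static void logSingleEdit(List<String> visited) {
    System.out.println("logging one edit for " + visited.size() + " inodes");
  }
}
{code}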


> [SPS]: Add option in "setStoragePolicy" command to satisfy the policy.
> ----------------------------------------------------------------------
>
>                 Key: HDFS-11669
>                 URL: https://issues.apache.org/jira/browse/HDFS-11669
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode, shell
>    Affects Versions: HDFS-10285
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>         Attachments: HDFS-11669-HDFS-10285.001.patch
>
>
> Add a new option {{-satisfypolicy}} to the {{setStoragePolicy}} command to 
> satisfy the storage policy.
> {noformat}
> hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy> 
> -satisfypolicy
> {noformat}


