[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208461#comment-16208461
 ] 

Uma Maheswara Rao G commented on HDFS-10285:
--------------------------------------------

Hi [~andrew.wang], sorry for the delayed response. In the meantime we were 
pulled into other work.

{quote}Maybe I misunderstood this API then, since it wasn't mentioned in the 
"Administrator notes" where it talks about the interaction with the Mover. 
Should this API instead be "isSpsEnabled"? The docs indicate right now that 
when the SPS is "activated" (enabled via configuration?), the Mover cannot be 
run, and also vice versa.
The docs also say If a Mover instance is already triggered and running, SPS 
will be deactivated while starting., does "starting" here mean enabling 
dynamically via configuration, or triggering an SPS operation?
{quote}
Yes, the current API just indicates whether SPS is running or not; it does not 
expose any additional information. It was mainly added so the Mover tool can 
detect whether the built-in SPS is already running. “Activated” means the SPS 
thread is up and running. “Starting” here means the moment SPS is dynamically 
enabled, or NN failover time.
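To make the mutual exclusion above concrete, here is a minimal, hypothetical sketch in plain Java. The class and method names (`SpsState`, `tryStartSps`, `tryStartMover`) are illustrative assumptions, not the actual HDFS API; the point is only the "whichever starts first wins" behavior described above.

```java
// Hypothetical, simplified model of the Mover/SPS mutual exclusion;
// names are illustrative, not the real HDFS classes or RPCs.
public class SpsState {
    private boolean spsRunning;    // "activated": the SPS thread is up
    private boolean moverRunning;  // an external Mover instance is active

    // Called when SPS is dynamically enabled or the NN becomes active.
    // If a Mover is already running, SPS stays deactivated.
    public synchronized boolean tryStartSps() {
        if (moverRunning) {
            return false;          // Mover wins; SPS is deactivated
        }
        spsRunning = true;
        return true;
    }

    // The Mover checks the running flag (via an isSpsRunning-style RPC)
    // before starting, so the two never move blocks concurrently.
    public synchronized boolean tryStartMover() {
        if (spsRunning) {
            return false;
        }
        moverRunning = true;
        return true;
    }

    public synchronized boolean isSpsRunning() {
        return spsRunning;
    }
}
```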

{quote}
setrep -w waits for the setrep to complete, it's pretty common to call it like 
this.
{quote}
Following our discussion, we do plan to add this status-reporting support; work 
is in progress. Please review HDFS-12310, if possible.
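As a rough sketch of what setrep -w style waiting would look like for policy satisfaction, the following plain-Java fragment polls a status source until a terminal state is reached. The state names and the `StatusSource` interface are assumptions for illustration, not the API committed in HDFS-12310.

```java
// Illustrative sketch of setrep -w style waiting applied to storage
// policy satisfaction; states and interfaces are assumed, not real.
public class SpsStatusPoller {
    public enum PathStatus { PENDING, IN_PROGRESS, SUCCESS, FAILURE }

    public interface StatusSource {
        PathStatus check(String path);   // stand-in for an NN RPC
    }

    // Polls until the path reaches a terminal state, like `setrep -w`,
    // giving up after maxPolls attempts.
    public static PathStatus waitForCompletion(StatusSource src, String path,
                                               int maxPolls) {
        PathStatus st = src.check(path);
        for (int i = 0;
             i < maxPolls
                 && (st == PathStatus.PENDING || st == PathStatus.IN_PROGRESS);
             i++) {
            st = src.check(path);
        }
        return st;
    }
}
```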


{quote}
For SPS work:
* The NN selects a C-DN and sends it a batch of work on the heartbeat.
* The C-DN calls replaceBlock on the blocks.
* The src and target DNs do the replaceBlock and inform the NN on their next 
heartbeat (IBR).
* The C-DN informs the NN that the batch is complete on its next heartbeat.
It's this last step that can add latency. Completion requires the IBRs of the 
src/target DNs, but also the status from the C-DN. This can add up to a 
heartbeat interval. It wouldn't be necessary if the NN tracked completion 
instead.

….

I read the code to better understand this flow. The C-DN calls replaceBlock on 
the src and target DNs of the work batches.
I'm still unconvinced that we save much by moving block-level completion 
tracking to the DN. PendingReconstructionBlocks + LowRedundancyBlocks works 
pretty well with block-level tracking, and that's even when a ton of work gets 
queued up due to a failure. For SPS, we can do better since we can throttle the 
directory scan speed and thus limit the number of outstanding work items. This 
would make any file-level vs. block-level overheads marginal.
{quote}
In any case, the IBR is necessary for the NN to know a block has moved; the 
transfer-block flow notifies the NN once the move is done.

That said, we have refactored the code so the Namenode tracks completion at the 
block level; it was a fairly straightforward change. The C-DN used to batch the 
blocks; now we no longer batch and instead track each block separately 
(HDFS-12570).
I hope this addresses all your design concerns.
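A minimal model of the NN-side, per-block tracking direction might look like the sketch below. It is an assumption-laden illustration (class and method names are made up, not HDFS-12570's code): each scheduled move is tracked individually and cleared when the target DN's IBR arrives, with no C-DN batch-completion round trip.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of NN-side per-block move tracking:
// no C-DN batches, each block move is cleared when the target DN's
// incremental block report (IBR) arrives.
public class BlockMoveTracker {
    // blockId -> expected target DN for the scheduled move
    private final Map<Long, String> pending = new HashMap<>();

    public void scheduleMove(long blockId, String targetDn) {
        pending.put(blockId, targetDn);
    }

    // Called when an IBR shows the block on its new storage; completion
    // needs no extra C-DN heartbeat round trip.
    public boolean onIbr(long blockId, String reportingDn) {
        String expected = pending.get(blockId);
        if (expected != null && expected.equals(reportingDn)) {
            pending.remove(blockId);
            return true;   // move complete
        }
        return false;
    }

    public int pendingCount() {
        return pending.size();
    }
}
```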

{quote}
Could you also comment on how SPS work is prioritized against block work from 
LowRedundancyBlocks? SPS actions are lower priority than maintaining durability.
{quote}
Right now they run as separate threads. In a future phase (phase 2), the SPS 
thread could actively monitor the LowRedundancy queues and react.
For now, SPS throttles itself to make sure no more than 1000 elements are held 
in memory. Also, following our previous discussion, SPS gives higher priority 
to LowRedundancy blocks when assigning tasks to DNs, taking xmits into 
consideration.
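The self-throttling mentioned above can be sketched as a capped work queue: the directory scanner pauses once the in-memory cap (1000 in the description above) is reached, which bounds outstanding work items. This is an illustrative sketch only; `SpsWorkQueue` and its methods are invented names, not SPS internals.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of SPS self-throttling: the directory scanner
// stops queuing work once an in-memory cap is reached, bounding the
// number of outstanding work items.
public class SpsWorkQueue {
    private final int cap;
    private final Deque<String> queue = new ArrayDeque<>();

    public SpsWorkQueue(int cap) {
        this.cap = cap;
    }

    // The scanner calls this; false tells it to pause the directory walk
    // until workers drain the queue below the cap.
    public synchronized boolean offer(String fileId) {
        if (queue.size() >= cap) {
            return false;
        }
        queue.addLast(fileId);
        return true;
    }

    public synchronized String poll() {
        return queue.pollFirst();
    }

    public synchronized int size() {
        return queue.size();
    }
}
```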

{quote}
One more question, block replication looks at the number of xmits used on the 
DN to throttle appropriately. This doesn't work well with the C-DN scheme since 
the C-DN is rarely the source or target DN, and the work is sent in batches. 
Could you comment on this?
{quote}
With HDFS-12570, we now give priority to replication/EC tasks first; the 
remaining xmits are used for SPS. If we want equal priority for SPS, the 
configuration parameter (dfs.storage.policy.satisfier.low.max-streams.preference) 
can be disabled. By default it is true, i.e. SPS tasks get lower priority than 
replication/EC.
We will post the latest updated design doc as well.
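The xmit-budget idea behind that preference key can be sketched as below. The arithmetic and names are illustrative assumptions, not the actual scheduler code: when the low-priority preference is true (the default), SPS only gets the xmit slots left over after replication/EC work; when false, SPS competes for the full budget.

```java
// Rough, hypothetical sketch of the xmit budget behind
// dfs.storage.policy.satisfier.low.max-streams.preference.
public class SpsXmitBudget {
    // maxStreams: the DN's total transfer-stream budget;
    // replicationXmits: xmits already consumed by replication/EC work.
    public static int spsSlots(int maxStreams, int replicationXmits,
                               boolean lowPriorityForSps) {
        if (!lowPriorityForSps) {
            return maxStreams;   // equal priority: SPS sees the full budget
        }
        // Default (true): SPS only uses whatever replication/EC left over.
        return Math.max(0, maxStreams - replicationXmits);
    }
}
```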

> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. A 
> policy can be set on a directory/file to specify the user's preference for 
> where the physical blocks should be stored. If the user sets the storage 
> policy before writing data, the blocks can take advantage of the policy 
> preference and are placed accordingly.
> If the user sets the storage policy after the file has been written and 
> closed, the blocks will already have been written with the default storage 
> policy (i.e. DISK). The user then has to run the ‘Mover tool’ explicitly, 
> specifying all such file names as a list. In some distributed-system 
> scenarios (e.g. HBase) it would be difficult to collect all the files and run 
> the tool, since different nodes write files independently and the files can 
> have different paths.
> Another scenario: when the user renames a file from a directory with one 
> effective storage policy (inherited from its parent directory) into a 
> directory with a different effective policy, the inherited storage policy is 
> not copied from the source; the file takes its policy from the destination 
> parent. This rename operation is just a metadata change in the Namenode; the 
> physical blocks still remain on the storage chosen by the source policy.
> So, tracking all such files by business logic across distributed nodes (e.g. 
> region servers) and running the Mover tool could be difficult for admins. The 
> proposal here is to provide an API in the Namenode itself to trigger storage 
> policy satisfaction. A daemon thread inside the Namenode would track such 
> calls and dispatch movement commands to the DNs.
> Will post the detailed design thoughts document soon.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
