[
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129525#comment-16129525
]
Uma Maheswara Rao G commented on HDFS-10285:
--------------------------------------------
Hi [~andrew.wang], thank you for helping us a lot with the reviews. Really great
points.
{quote}
This would be a user periodically asking for status. From what I know of async
API design, callbacks are preferred over polling since it solves the question
about how long the server needs to hold the status.
I'd be open to any proposal here, I just think the current "isSpsRunning" API
is insufficient. Did you end up filing a ticket to track this?
{quote}
From an async API design perspective, I agree that such systems would typically
have callback-registration APIs. However, we don't have that callback mechanism
designed in HDFS yet. In this particular case, the user is not actually waiting
on any processing; the call is just a trigger for the system to start some
built-in functionality. In fact, the isSpsRunning API was added only so that
users can make sure the built-in SPS is not running if they want to run the
Mover tool explicitly. I filed HDFS-12310 to discuss this further. I'm not sure
it is a good idea to encourage users to periodically poll the system for this
status. IMO, if movements are really failing (probably because some storages are
unavailable or have failed, etc.), administrator action is definitely required,
rather than the user component tracking the status and acting on it itself. So I
strongly believe that reporting failures as metrics will bring them to the
admin's attention. Since we don't want to enable automatic movement in the first
stage, there should be some trigger to start the movement. Some work related to
an async HDFS API is happening at HDFS-9924; perhaps we could take some design
ideas from there for a status API once that work is in?
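To make the intent concrete, here is a minimal usage sketch of the trigger/status
calls being discussed, assuming client-side methods named satisfyStoragePolicy and
isSpsRunning on DistributedFileSystem along the lines of this branch; the exact
names and signatures are illustrative, not a committed public API.
{code:java}
// Illustrative sketch only: satisfyStoragePolicy/isSpsRunning follow the
// discussion in this branch, not a finalized public API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SpsTriggerSketch {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    Path dir = new Path("/data/hbase/table1"); // hypothetical path

    // Pure trigger: the NN queues the satisfaction work and the client
    // returns immediately, much like delete/setReplication.
    dfs.satisfyStoragePolicy(dir);

    // isSpsRunning exists mainly for the Mover-tool case: skip the external
    // Mover when the built-in satisfier is already active.
    if (dfs.isSpsRunning()) {
      System.out.println("Built-in SPS is active; do not start the Mover tool.");
    }
  }
}
{code}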
Another argument is that we already have APIs with async behavior, for example
delete and setReplication. From the NN's perspective these may be synchronous
calls, but from the user's perspective a lot of the work still happens
asynchronously. When we delete a file, the NN does its cleanup and queues the
blocks for deletion; the actual block deletions happen asynchronously. Users
trust HDFS that the data will be cleaned up, and we don't have a status-reporting
API for that.
Similarly, if we change the replication factor, we change it in the NN and
replication is eventually triggered; I don't think users poll on whether
replication is done or not. Since replication is HDFS functionality, they just
rely on it. If replications are failing, then admin action is definitely required
to fix them, and admins usually depend on fsck or metrics. Let's discuss this
further on HDFS-12310?
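As a small illustration of that existing behavior, the calls below return as soon
as the NN records the metadata change, while the block-level work completes
asynchronously on the DNs (standard FileSystem APIs; the paths are hypothetical).
{code:java}
// Usage sketch with existing FileSystem APIs: both calls return once the NN
// records the metadata change; the block-level work happens asynchronously
// on the DNs and there is no per-request status API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AsyncBehaviourSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // NN cleanup happens now; actual block deletions are scheduled to DNs later.
    fs.delete(new Path("/data/old-logs"), true /* recursive */);

    // NN records the new replication factor; re-replication is triggered in the
    // background, and admins rely on fsck/metrics if it keeps failing.
    fs.setReplication(new Path("/data/important/part-00000"), (short) 5);
  }
}
{code}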
In the end, I am not saying we should not have status reporting; I feel that is a
good-to-have requirement. Do you have some use cases for how the application
system (e.g. HBase; [~anoopsamjohn] provided some use cases above for using SPS)
would react to status results?
{quote}
If I were to paraphrase, the NN is the ultimate arbiter, and the operations
being performed by C-DNs are idempotent, so duplicate work gets dropped safely.
I think this still makes it harder to reason about from a debugging POV,
particularly if we want to extend this to something like EC conversion that
might not be idempotent.
{quote}
We already do reconstruction work in EC in a way similar to the C-DN approach:
all blocks of a block group are reconstructed at one DN, and there too that node
is chosen loosely. Here we have just named it a C-DN and send more blocks as a
logical batch (in this case, all blocks associated with a file), whereas in the
EC case we send the blocks of one block group. Coming to idempotency, EC
reconstruction is already done in an idempotent way today. I feel we can
definitely handle those cases: the conversion of the whole file should complete
first, and only then can we convert contiguous blocks to striped mode at the NN.
Whichever DN finishes first gets its result recorded at the NN, and once the NN
has already converted the blocks it should not accept newly converted block
groups. But that should be a separate discussion anyway; I just wanted to point
out another use case, HDFS-12090, which I see wants to adopt this model for
moving work.
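Just to illustrate the kind of idempotent handling I have in mind at the NN, a
hypothetical sketch (class and method names are mine, not from the patch): the
first completion report for a file wins, and later duplicates are dropped safely.
{code:java}
// Hypothetical sketch (names are illustrative, not from the patch) of idempotent
// result handling at the NN: the first completed conversion/movement report for
// a file is accepted, duplicates from other DNs are no-ops.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConversionResultTracker {
  // file inode id -> already accepted a completed result?
  private final Map<Long, Boolean> accepted = new ConcurrentHashMap<>();

  /** Returns true only for the first report; later duplicates are dropped. */
  boolean acceptResult(long inodeId) {
    return accepted.putIfAbsent(inodeId, Boolean.TRUE) == null;
  }
}
{code}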
{quote}
I like the idea of offloading work in the abstract, but I don't know how much
work we really offload in this situation. The NN still needs to track
everything at the file level, which is the same order of magnitude as the block
level. The NN is still doing blockmanagement and processing IBRs for the block
movement. Distributing tracking work to the C-DNs adds latency and makes the
system more complicated.
{quote}
I don't see any extra latency involved, really. The work has to be sent to DNs
individually anyway. On top of that, we send the batch to one DN first; that DN
does its own part of the work and also asks the other DNs to transfer their
blocks. Handling this at the block level would still keep the requirement of
tracking at the file/directory level in order to remove the associated xattrs.
Block movement results come back from the DNs to the NN asynchronously in any
case. To put it simply: the NN still sends the blocks, but it groups all the
blocks related to a file into one batch. This way we have just removed
block-by-block tracking at the NN.
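A rough sketch of that batching idea, with names loosely following the SPS design
discussion (the fields are illustrative, not the exact classes in the patch): the
NN builds one command per file, containing every block to move, and hands it to a
single coordinator DN.
{code:java}
// Rough sketch of the batching idea: one command per file, containing every
// block to move, handed to a single coordinator DN. Class/field names loosely
// follow the SPS design discussion and are illustrative only.
import java.util.List;

class BlockStorageMovementCommand {
  final long trackId;                       // identifies the file being satisfied
  final String coordinatorDatanode;         // loosely chosen C-DN
  final List<BlockMovingInfo> blocksToMove; // all blocks of the file, one batch

  BlockStorageMovementCommand(long trackId, String coordinatorDatanode,
                              List<BlockMovingInfo> blocksToMove) {
    this.trackId = trackId;
    this.coordinatorDatanode = coordinatorDatanode;
    this.blocksToMove = blocksToMove;
  }
}

class BlockMovingInfo {
  final long blockId;
  final String sourceStorageType; // e.g. DISK
  final String targetStorageType; // e.g. ARCHIVE

  BlockMovingInfo(long blockId, String sourceStorageType, String targetStorageType) {
    this.blockId = blockId;
    this.sourceStorageType = sourceStorageType;
    this.targetStorageType = targetStorageType;
  }
}
{code}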
Overall, below are the key tasks we are working on:
1. Xattr optimization work: HDFS-12225 (patch available)
2. Recursive API support: HDFS-12291; this should cover NN-level throttling as
well.
Some of the other minor review comment fixes are at HDFS-12214, and we have filed
a follow-up JIRA, HDFS-12226, to track post-merge issues.
> Storage Policy Satisfier in Namenode
> ------------------------------------
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: HDFS-10285
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch,
> HDFS-10285-consolidated-merge-patch-01.patch,
> HDFS-SPS-TestReport-20170708.pdf,
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf,
> Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These
> policies can be set on a directory or file to specify the user's preference for
> where the physical blocks should be stored. When the user sets the storage
> policy before writing data, the blocks can take advantage of the storage policy
> preferences and the physical blocks are placed accordingly.
> If the user sets the storage policy after writing and completing the file, then
> the blocks will already have been written with the default storage policy
> (i.e. DISK). The user then has to run the 'Mover tool' explicitly, specifying
> all such file names as a list. In some distributed-system scenarios (e.g. HBase)
> it would be difficult to collect all the files and run the tool, as different
> nodes can write files separately and the files can have different paths.
> Another scenario: when the user renames a file from a directory with one
> effective storage policy (inherited from its parent directory) to a directory
> with another effective storage policy, the inherited storage policy is not
> copied from the source; the file takes its effective policy from the destination
> file/directory's parent. This rename operation is just a metadata change in the
> Namenode; the physical blocks still remain placed according to the source
> storage policy.
> So, tracking all such file names based on business logic across distributed
> nodes (e.g. region servers) and running the Mover tool could be difficult for
> admins.
> The proposal here is to provide an API in the Namenode itself to trigger storage
> policy satisfaction. A daemon thread inside the Namenode should track such calls
> and dispatch movement commands to the DNs.
> Will post the detailed design thoughts document soon.