[
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129525#comment-16129525
]
Uma Maheswara Rao G commented on HDFS-10285:
--------------------------------------------
Hi [~andrew.wang], thank you for helping us a lot with the reviews. Really great
points.
{quote}
This would be a user periodically asking for status. From what I know of async
API design, callbacks are preferred over polling since it solves the question
about how long the server needs to hold the status.
I'd be open to any proposal here, I just think the current "isSpsRunning" API
is insufficient. Did you end up filing a ticket to track this?
{quote}
From an async API design perspective, I agree that such systems would typically
have callback-registration APIs. However, we don't have that callback mechanism
designed in HDFS yet. In this particular case, the user is not actually waiting
on any processing; the call is just a trigger for the system to start some
built-in functionality. In fact, the isSpsRunning API was added only so that
users can make sure the built-in SPS is not running if they want to run the
Mover tool explicitly. I filed HDFS-12310 to discuss this further. I'm not sure
it is a good idea to encourage users to periodically poll the system for this
status. IMO, if movements are really failing (probably because some storages are
unavailable or have failed, etc.), administrator action is definitely required,
rather than the user component tracking the status and acting on it itself. So I
strongly believe that reporting failures as metrics will bring them to the
admin's attention. Since we don't want to enable automatic movement in the first
stage, there should be some trigger to start the movement. Some work related to
an async HDFS API is happening at HDFS-9924; perhaps we could take some design
ideas from there for a status API once that work is in?
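To make the intent concrete, here is a minimal usage sketch of the trigger/status
calls being discussed, assuming client-side methods named satisfyStoragePolicy and
isSpsRunning on DistributedFileSystem along the lines of this branch; the exact
names and signatures are illustrative, not a committed public API.
{code:java}
// Illustrative sketch only: satisfyStoragePolicy/isSpsRunning follow the
// discussion in this branch, not a finalized public API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SpsTriggerSketch {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    Path dir = new Path("/data/hbase/table1"); // hypothetical path

    // Pure trigger: the NN queues the satisfaction work and the client
    // returns immediately, much like delete/setReplication.
    dfs.satisfyStoragePolicy(dir);

    // isSpsRunning exists mainly for the Mover-tool case: skip the external
    // Mover when the built-in satisfier is already active.
    if (dfs.isSpsRunning()) {
      System.out.println("Built-in SPS is active; do not start the Mover tool.");
    }
  }
}
{code}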
Another argument is that we already have APIs with async behavior, for example
delete and setReplication. From the NN's perspective these may be synchronous
calls, but from the user's perspective a lot of the work still happens
asynchronously. When we delete a file, the NN does its cleanup and queues the
blocks for deletion; the actual block deletions happen asynchronously. Users
trust HDFS that the data will be cleaned up, and we don't have a status-reporting
API for that.
Similarly, if we change the replication factor, we change it in the NN and
replication is eventually triggered; I don't think users poll on whether
replication is done or not. Since replication is HDFS functionality, they just
rely on it. If replications are failing, then admin action is definitely required
to fix them, and admins usually depend on fsck or metrics. Let's discuss this
further on HDFS-12310?
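As a small illustration of that existing behavior, the calls below return as soon
as the NN records the metadata change, while the block-level work completes
asynchronously on the DNs (standard FileSystem APIs; the paths are hypothetical).
{code:java}
// Usage sketch with existing FileSystem APIs: both calls return once the NN
// records the metadata change; the block-level work happens asynchronously
// on the DNs and there is no per-request status API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AsyncBehaviourSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // NN cleanup happens now; actual block deletions are scheduled to DNs later.
    fs.delete(new Path("/data/old-logs"), true /* recursive */);

    // NN records the new replication factor; re-replication is triggered in the
    // background, and admins rely on fsck/metrics if it keeps failing.
    fs.setReplication(new Path("/data/important/part-00000"), (short) 5);
  }
}
{code}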
In the end, I am not saying we should not have status reporting; I feel that is a
good-to-have requirement. Do you have some use cases for how the application
system (e.g. HBase; [~anoopsamjohn] provided some use cases above for using SPS)
would react to status results?
{quote}
If I were to paraphrase, the NN is the ultimate arbiter, and the operations
being performed by C-DNs are idempotent, so duplicate work gets dropped safely.
I think this still makes it harder to reason about from a debugging POV,
particularly if we want to extend this to something like EC conversion that
might not be idempotent.
{quote}
We already do reconstruction work in EC in a way similar to the C-DN approach:
all blocks of a block group are reconstructed at one DN, and there too that node
is chosen loosely. Here we have just named it a C-DN and send more blocks as a
logical batch (in this case, all blocks associated with a file), whereas in the
EC case we send the blocks of one block group. Coming to idempotency, EC
reconstruction is already done in an idempotent way today. I feel we can
definitely handle those cases: the conversion of the whole file should complete
first, and only then can we convert contiguous blocks to striped mode at the NN.
Whichever DN finishes first gets its result recorded at the NN, and once the NN
has already converted the blocks it should not accept newly converted block
groups. But that should be a separate discussion anyway; I just wanted to point
out another use case, HDFS-12090, which I see wants to adopt this model for
moving work.
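Just to illustrate the kind of idempotent handling I have in mind at the NN, a
hypothetical sketch (class and method names are mine, not from the patch): the
first completion report for a file wins, and later duplicates are dropped safely.
{code:java}
// Hypothetical sketch (names are illustrative, not from the patch) of idempotent
// result handling at the NN: the first completed conversion/movement report for
// a file is accepted, duplicates from other DNs are no-ops.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConversionResultTracker {
  // file inode id -> already accepted a completed result?
  private final Map<Long, Boolean> accepted = new ConcurrentHashMap<>();

  /** Returns true only for the first report; later duplicates are dropped. */
  boolean acceptResult(long inodeId) {
    return accepted.putIfAbsent(inodeId, Boolean.TRUE) == null;
  }
}
{code}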
{quote}
I like the idea of offloading work in the abstract, but I don't know how much
work we really offload in this situation. The NN still needs to track
everything at the file level, which is the same order of magnitude as the block
level. The NN is still doing blockmanagement and processing IBRs for the block
movement. Distributing tracking work to the C-DNs adds latency and makes the
system more complicated.
{quote}
I don't see any extra latency involved, really. The work has to be sent to DNs
individually anyway. On top of that, we send the batch to one DN first; that DN
does its own part of the work and also asks the other DNs to transfer their
blocks. Handling this at the block level would still keep the requirement of
tracking at the file/directory level in order to remove the associated xattrs.
Block movement results come back from the DNs to the NN asynchronously in any
case. To put it simply: the NN still sends the blocks, but it groups all the
blocks related to a file into one batch. This way we have just removed
block-by-block tracking at the NN.
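A rough sketch of that batching idea, with names loosely following the SPS design
discussion (the fields are illustrative, not the exact classes in the patch): the
NN builds one command per file, containing every block to move, and hands it to a
single coordinator DN.
{code:java}
// Rough sketch of the batching idea: one command per file, containing every
// block to move, handed to a single coordinator DN. Class/field names loosely
// follow the SPS design discussion and are illustrative only.
import java.util.List;

class BlockStorageMovementCommand {
  final long trackId;                       // identifies the file being satisfied
  final String coordinatorDatanode;         // loosely chosen C-DN
  final List<BlockMovingInfo> blocksToMove; // all blocks of the file, one batch

  BlockStorageMovementCommand(long trackId, String coordinatorDatanode,
                              List<BlockMovingInfo> blocksToMove) {
    this.trackId = trackId;
    this.coordinatorDatanode = coordinatorDatanode;
    this.blocksToMove = blocksToMove;
  }
}

class BlockMovingInfo {
  final long blockId;
  final String sourceStorageType; // e.g. DISK
  final String targetStorageType; // e.g. ARCHIVE

  BlockMovingInfo(long blockId, String sourceStorageType, String targetStorageType) {
    this.blockId = blockId;
    this.sourceStorageType = sourceStorageType;
    this.targetStorageType = targetStorageType;
  }
}
{code}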
Overall, below are the key tasks we are working on:
1. Xattr optimization work: HDFS-12225 (patch available)
2. Recursive API support: HDFS-12291; this should cover NN-level throttling as
well.
Some of the other minor review comment fixes are at HDFS-12214, and we have filed
a follow-up JIRA, HDFS-12226, to track post-merge issues.
> Storage Policy Satisfier in Namenode
> ------------------------------------
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: HDFS-10285
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch,
> HDFS-10285-consolidated-merge-patch-01.patch,
> HDFS-SPS-TestReport-20170708.pdf,
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf,
> Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These
> policies can be set on a directory or file to specify the user's preference for
> where the physical blocks should be stored. When the user sets the storage
> policy before writing data, the blocks can take advantage of the storage policy
> preferences and the physical blocks are placed accordingly.
> If the user sets the storage policy after writing and completing the file, then
> the blocks will already have been written with the default storage policy
> (i.e. DISK). The user then has to run the 'Mover tool' explicitly, specifying
> all such file names as a list. In some distributed-system scenarios (e.g. HBase)
> it would be difficult to collect all the files and run the tool, as different
> nodes can write files separately and the files can have different paths.
> Another scenario: when the user renames a file from a directory with one
> effective storage policy (inherited from its parent directory) to a directory
> with another effective storage policy, the inherited storage policy is not
> copied from the source; the file takes its effective policy from the destination
> file/directory's parent. This rename operation is just a metadata change in the
> Namenode; the physical blocks still remain placed according to the source
> storage policy.
> So, tracking all such file names based on business logic across distributed
> nodes (e.g. region servers) and running the Mover tool could be difficult for
> admins.
> The proposal here is to provide an API in the Namenode itself to trigger storage
> policy satisfaction. A daemon thread inside the Namenode should track such calls
> and dispatch movement commands to the DNs.
> Will post the detailed design thoughts document soon.