[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275034#comment-16275034
 ] 

Andrew Wang commented on HDFS-10285:
------------------------------------

Hi Anu, thanks for the prompt responses.

bq. Yes, [ZK] would be the simplest approach to getting SPS HA.

Could you describe this plan in more detail? ZK doesn't solve the problems of 
HA by itself. We still need to think about idempotency. Does it require ZKFCs? 
I want to emphasize again the operational complexity that comes from adding 
more daemons and processes. It's a big knock on the ease of use of HDFS right 
now.
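
To make the idempotency concern concrete, here is a minimal sketch of the property I mean: if a newly elected SPS leader replays a request that the previous leader already issued, no duplicate move should be dispatched. The class, method names, and the use of a plain string for the target storage type are all hypothetical, not existing HDFS code, and the sketch ignores the harder question of how this state survives a failover.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of idempotent block-move tracking; not HDFS code. */
class IdempotentMoveTracker {
  enum State { PENDING, DONE }

  private static final class Entry {
    final String targetStorage; // assumed: storage type name such as "ARCHIVE"
    State state = State.PENDING;

    Entry(String targetStorage) {
      this.targetStorage = targetStorage;
    }
  }

  // Keyed by block ID; one tracked move per block.
  private final Map<Long, Entry> moves = new HashMap<>();

  /**
   * Registers a move request. Returns true only when a new move must be
   * dispatched; a replay of the same (blockId, targetStorage) pair returns
   * false whether the earlier move is still pending or already done.
   */
  synchronized boolean submit(long blockId, String targetStorage) {
    Entry existing = moves.get(blockId);
    if (existing != null && existing.targetStorage.equals(targetStorage)) {
      return false; // replayed request, nothing new to do
    }
    moves.put(blockId, new Entry(targetStorage));
    return true;
  }

  /** Records completion, ignoring reports for a superseded target. */
  synchronized void markDone(long blockId, String targetStorage) {
    Entry existing = moves.get(blockId);
    if (existing != null && existing.targetStorage.equals(targetStorage)) {
      existing.state = State.DONE;
    }
  }

  synchronized boolean isDone(long blockId) {
    Entry existing = moves.get(blockId);
    return existing != null && existing.state == State.DONE;
  }
}
```

The sketch only covers deduplication within one process. An external SPS would additionally need this map to be rebuilt or shared after a leader change, which is the part ZK does not give us for free.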

All of this adds significant complexity to deploying this feature. Adding 
another ZK dependency to HDFS is also undesirable from my POV. ZK is used 
instead of QJM for NN leader election for legacy reasons. It'd be better to 
drop the ZK dependency from HDFS entirely.

bq. Once the active knows it is the leader, it can read the state from NN and 
continue. The issues of continuity are exactly same whether it is inside NN or 
outside.

Does this involve rescanning a significant portion of the namespace? 
Synchronizing state over an RPC boundary (which can fail) is also more 
complicated than going in-memory. We've also already got mechanisms in place 
for safely synchronizing namespace and block state between NNs.

bq. As soon as a block is moved, the move call updates the status of the block 
move, that is NN is up to date with that info. Each time there is a call to SPS 
API, NN will keep track of it and the updates after move lets us filter the 
remaining blocks.

Is there an edit log update on every block move? That would be a lot of 
overhead, particularly since we don't persist block locations in HDFS right now.

bq. By that argument, Balancer should be the first tool that move into the 
Namenode and then DiskBalancer. Right now, SPS approach follows what we are 
doing in HDFS world, that is block moves are achieved thru an async mechanism. 
If you would like to provide a generic block mover mechanism in Namenode and 
then port balancer and diskBalancer, you are most welcome. I will be glad to 
move SPS to that framework when we have it.

The existing code being bad isn't a good reason to make it worse. I remember 
that the original motivation for the SPS was to reduce the deployment and 
operational complexity of running the balancer and mover. Making it a separate 
process again means we lose those benefits.

bq. There are a couple of concerns: <snip>

I don't agree with #1 for the reason stated above. The DiskBalancer is fine 
since it's local to one DN, but the Balancer and Mover circumventing global 
coordination is an anti-pattern IMO.

Regarding #2, in my previous comment, I provided a number of tasks that are 
performed by the SPS-in-NN. Could you point to which of these are offloaded 
from the NN by having the SPS as a separate service? Even a separate-service 
SPS still adds NN memory and CPU overhead. Also, as I said in my previous 
comment, marshalling and unmarshalling over an RPC interface is less efficient 
than scanning these NN data structures in-process.

#3, I don't follow how SSM or provided block storage benefit from SPS as a 
service vs. being part of the NN. If there are design docs for these 
interactions, I would appreciate some references.

bq. And most important, we are just accelerating an SPS future work item, it 
has been a booked plan to make SPS separate,

Where is this plan described and motivated? The design doc from last month 
talks about the SPS as a daemon thread in the NN.

It'd help to write up a more detailed design doc for review by the watchers on 
this JIRA. Making it a new service sounds like a big effort on top of what has 
already been worked on.

> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policy. These 
> policies can be set on a directory or file to specify the user's preference 
> for where to store the physical blocks. When the user sets the storage policy 
> before writing data, the blocks can take advantage of the storage policy 
> preferences and the physical blocks are stored accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> the blocks will already have been written with the default storage policy 
> (that is, DISK). The user has to run the ‘Mover tool’ explicitly, specifying 
> all such file names as a list. In some distributed system scenarios (e.g. 
> HBase) it would be difficult to collect all the files and run the tool, as 
> different nodes can write files separately and files can have different paths.
> Another scenario is when the user renames files from a location with one 
> effective storage policy (inherited from the parent directory) to a directory 
> with another effective storage policy. The rename does not copy the inherited 
> storage policy from the source, so the storage policy of the destination 
> file/dir parent takes effect. This rename operation is just a metadata change 
> in the Namenode. The physical blocks still remain placed according to the 
> source storage policy.
> So, tracking all such file names based on business logic and running the 
> Mover tool could be difficult for admins across distributed nodes (e.g. 
> region servers). 
> The proposal here is to provide an API in the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and send them to the DNs as movement commands. 
> Will post the detailed design thoughts document soon. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
