[
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517930#comment-14517930
]
Tsz Wo Nicholas Sze commented on HDFS-8204:
-------------------------------------------
> Actually, It does support. ...
You are right. Thanks for point it out.
> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Priority: Minor
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch,
> HDFS-8204.003.patch
>
>
> Balancer moves blocks between Datanode(Ver. <2.6 ).
> Balancer moves blocks between StorageGroups ( introduced by HDFS-6584) , in
> the new version(Ver. >=2.6) .
> function
> {code}
> class DBlock extends Locations<StorageGroup>
> DBlock.isLocatedOn(StorageGroup loc)
> {code}
> -is flawed, may causes 2 replicas ends in same node after running balance.-
> For example:
> We have 2 nodes. Each node has two storages.
> We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
> We have a block with ONE_SSD storage policy.
> The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
> Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer.
> Otherwise DN1 has 2 replicas.
> --------------
> UPDATE(Thanks [~szetszwo] for pointing it out):
> {color:red}
> This bug will *NOT* causes 2 replicas end in same node after running balance,
> thanks to Datanode rejecting it.
> {color}
> We see a lot of ERROR when running test.
> {code}
> 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) -
> host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation
> src: /127.0.0.1:52532 dst: /127.0.0.1:59537
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block
> BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in
> state FINALIZED and thus cannot be created.
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
> at java.lang.Thread.run(Thread.java:722)
> {code}
> The Balancer runs 5~20 times iterations in the test, before it exits.
> It's ineffecient.
> Balancer should not *schedule* it in the first place, even though it'll
> failed anyway. In the test, it should exit after 5 times iteration.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)