[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

Hadoop QA (JIRA) Mon, 27 Apr 2015 02:53:56 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513833#comment-14513833
 ]


Hadoop QA commented on HDFS-8204:
---------------------------------

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 37s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | javac |   7m 28s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   5m 24s | There were no new checkstyle issues. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native |   3m 14s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 167m 11s | Tests passed in hadoop-hdfs. |
| | | 213m  8s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12728324/HDFS-8204.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 618ba70 |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10407/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10407/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10407/console |


This message was automatically generated.

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
>                 Key: HDFS-8204
>                 URL: https://issues.apache.org/jira/browse/HDFS-8204
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>            Reporter: Walter Su
>            Assignee: Walter Su
>            Priority: Minor
>         Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, 
> HDFS-8204.003.patch
>
>
> In versions before 2.6, the Balancer moves blocks between Datanodes.
> In version 2.6 and later, the Balancer moves blocks between StorageGroups (introduced by HDFS-6584).
> The function
> {code}
> class DBlock extends Locations<StorageGroup>
> DBlock.isLocatedOn(StorageGroup loc)
> {code}
> -is flawed and may cause 2 replicas to end up on the same node after running the Balancer.-
> For example:
> We have 2 nodes. Each node has two storages.
> We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
> We have a block with ONE_SSD storage policy.
> The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
> The replica in (DN0,SSD) should not be moved to (DN1,SSD) when the Balancer runs.
> Otherwise DN1 would hold 2 replicas.
> --------------
> UPDATE(Thanks [~szetszwo] for pointing it out):
> {color:red}
> This bug will *NOT* cause 2 replicas to end up on the same node after running the Balancer, 
> because the Datanode rejects the move. 
> {color}
> We see a lot of ERROR messages when running the test.
> {code}
> 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation  src: /127.0.0.1:52532 dst: /127.0.0.1:59537
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
>     at java.lang.Thread.run(Thread.java:722)
> {code}
> In the test, the Balancer runs 5 to 20 iterations before it exits.
> This is inefficient.
> The Balancer should not *schedule* such a move in the first place, even though the move 
> will fail anyway. In the test, it should exit after 5 iterations.
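
The two-node example in the description can be reproduced with a small, self-contained sketch. It is not the HDFS-8204 patch and does not use the real Balancer classes: `ReplicaPlacementSketch`, `Storage`, `Block`, `isLocatedOnStorage`, `isLocatedOnNode`, and `isGoodTarget` are hypothetical names chosen for illustration. It assumes that the problem comes from comparing candidate targets per storage rather than per Datanode, and shows how a node-level comparison would filter out the (DN1,SSD) target before any move is scheduled.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-ins for illustration only; these are not the HDFS Balancer classes. */
final class ReplicaPlacementSketch {
    enum StorageType { SSD, DISK }

    /** A (datanode, storage type) pair, loosely analogous to a StorageGroup. */
    static final class Storage {
        final String datanode;
        final StorageType type;

        Storage(String datanode, StorageType type) {
            this.datanode = datanode;
            this.type = type;
        }
    }

    /** A block together with the storages that currently hold its replicas. */
    static final class Block {
        private final List<Storage> locations = new ArrayList<>();

        void addLocation(Storage storage) {
            locations.add(storage);
        }

        /** Storage-level check: true only if this exact (node, type) pair holds a replica. */
        boolean isLocatedOnStorage(Storage target) {
            for (Storage s : locations) {
                if (s.datanode.equals(target.datanode) && s.type == target.type) {
                    return true;
                }
            }
            return false;
        }

        /** Node-level check: true if any storage on the target's datanode holds a replica. */
        boolean isLocatedOnNode(Storage target) {
            for (Storage s : locations) {
                if (s.datanode.equals(target.datanode)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** A move is worth scheduling only if the target node holds no replica yet. */
    static boolean isGoodTarget(Block block, Storage target) {
        return !block.isLocatedOnNode(target);
    }

    public static void main(String[] args) {
        // Replicas on (DN0,SSD) and (DN1,DISK); candidate target is (DN1,SSD).
        Block block = new Block();
        block.addLocation(new Storage("DN0", StorageType.SSD));
        block.addLocation(new Storage("DN1", StorageType.DISK));
        Storage candidate = new Storage("DN1", StorageType.SSD);

        System.out.println("storage-level match: " + block.isLocatedOnStorage(candidate));
        System.out.println("node-level match: " + block.isLocatedOnNode(candidate));
        System.out.println("schedule move: " + isGoodTarget(block, candidate));
    }
}
```

In this sketch the storage-level check does not see a conflict for (DN1,SSD), while the node-level check does, so the move is never scheduled and the Datanode never has to reject it.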



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
