[ 
https://issues.apache.org/jira/browse/HADOOP-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672158#comment-16672158
 ] 

Xiao Chen commented on HADOOP-11391:
------------------------------------

Thanks Hrishikesh for working on this. Thanks [~ejohansen] for reporting and 
Allen for commenting.

Patch makes sense to me. I have 1 question and a few review comments.

Question:
Did you try to verify if what's reported in the description is still happening 
on trunk? I don't recall which specific code would trigger it, but I feel the 
behavior is still the same as of now. Specifically, {{Mis-replicated blocks}} 
in description is still be 0 after a restart.

Review comments:
- BlockManager: we need to break out the writelock into finer-grained 
configurable batches since the input {{blocks}} could be huge. Probably ok to 
reuse {{numBlocksPerIteration}}.
- BlockManager: the log about {{processMisReplicatedBlocks}} should be debug or 
trace
- Would be good to have the test to verify changing from 'never been 
multi-rack' to multi-rack, and then -replicate.
- Test: trivial, let's be consistent about naming in the comment. 'datanodes' 
should be fine, don't need 'data nodes' .

> Enabling HVE/node awareness does not rebalance replicas on data that existed 
> prior to topology changes. 
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11391
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11391
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: VMWare w/ local storage
>            Reporter: ellen johansen
>            Assignee: Hrishikesh Gadre
>            Priority: Major
>         Attachments: HADOOP-11391-001.patch, HADOOP-11391-002.patch
>
>
> Enabling HVE/node awareness does not rebalance replicas on data that existed 
> prior to topology changes. 
> [root@vmw-d10-001 jenkins]# more /opt/cloudera/topology.data 
> 10.20.xxx.161   /rack1/nodegroup1
> 10.20.xxx.162   /rack1/nodegroup1
> 10.20.xxx.163   /rack3/nodegroup1
> 10.20.xxx.164   /rack3/nodegroup1
> 172.17.xxx.71   /rack2/nodegroup1
> 172.17.xxx.72   /rack2/nodegroup1
> before HVE:
> /user/impalauser/tpcds/store_sales <dir>
> /user/impalauser/tpcds/store_sales/store_sales.dat 1180463121 bytes, 9 
> block(s):  OK
> 0. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742xxx_1382 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 
> 10.20.xxx.163:20002]
> 1. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742213_1389 
> len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 
> 10.20.xxx.161:20002]
> 2. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742214_1390 
> len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 
> 10.20.xxx.163:20002]
> 3. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742215_1391 
> len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 
> 10.20.xxx.163:20002]
> 4. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742216_1392 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 
> 172.17.xxx.72:20002]
> 5. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742217_1393 
> len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 
> 10.20.xxx.163:20002]
> 6. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742220_1396 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, 
> 10.20.xxx.163:20002]
> 7. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742222_1398 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 
> 10.20.xxx.161:20002]
> 8. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742224_1400 
> len=106721297 repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, 
> 172.17.xxx.72:20002]
> ---------
> Before enabling HVE:
> Status: HEALTHY
>  Total size:  1648156454 B (Total open files size: 498 B)
>  Total dirs:  138
>  Total files: 384
>  Total symlinks:              0 (Files currently being written: 6)
>  Total blocks (validated):    390 (avg. block size 4226042 B) (Total open 
> file blocks (not validated): 6)
>  Minimally replicated blocks: 390 (100.0 %)
>  Over-replicated blocks:      0 (0.0 %)
>  Under-replicated blocks:     1 (0.25641027 %)
>  Mis-replicated blocks:               0 (0.0 %)
>  Default replication factor:  3
>  Average block replication:   2.8564103
>  Corrupt blocks:              0
>  Missing replicas:            5 (0.44682753 %)
>  Number of data-nodes:                5
>  Number of racks:             1
> FSCK ended at Wed Dec 10 14:04:35 EST 2014 in 50 milliseconds
> The filesystem under path '/' is HEALTHY
> ------
> after HVE (and NN restart):
> /user/impalauser/tpcds/store_sales <dir>
> /user/impalauser/tpcds/store_sales/store_sales.dat 1180463121 bytes, 9 
> block(s):  OK
> 0. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742xxx_1382 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 
> 10.20.xxx.161:20002]
> 1. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742213_1389 
> len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 
> 10.20.xxx.161:20002]
> 2. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742214_1390 
> len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 
> 10.20.xxx.163:20002]
> 3. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742215_1391 
> len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 
> 10.20.xxx.163:20002]
> 4. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742216_1392 
> len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 
> 10.20.xxx.161:20002]
> 5. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742217_1393 
> len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 
> 10.20.xxx.163:20002]
> 6. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742220_1396 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 
> 10.20.xxx.162:20002]
> 7. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742222_1398 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 
> 10.20.xxx.161:20002]
> 8. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742224_1400 
> len=106721297 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 
> 10.20.xxx.162:20002]
> Status: HEALTHY
>  Total size:  1659427036 B (Total open files size: 498 B)
>  Total dirs:  176
>  Total files: 529
>  Total symlinks:              0 (Files currently being written: 6)
>  Total blocks (validated):    532 (avg. block size 3119223 B) (Total open 
> file blocks (not validated): 6)
>  Minimally replicated blocks: 532 (100.0 %)
>  Over-replicated blocks:      0 (0.0 %)
>  Under-replicated blocks:     1 (0.18796992 %)
>  Mis-replicated blocks:               0 (0.0 %)
>  Default replication factor:  3
>  Average block replication:   2.8383458
>  Corrupt blocks:              0
>  Missing replicas:            7 (0.46143705 %)
>  Number of data-nodes:                5
>  Number of racks:             3
> FSCK ended at Wed Dec 10 14:29:23 EST 2014 in 115 milliseconds
> The filesystem under path '/' is HEALTHY
> ------------
> store sales pushed to hdfs after HVE was configured:
> /user/impalauser/tpcds_after_hve <dir>
> /user/impalauser/tpcds_after_hve/store_sales.dat 1180463121 bytes, 9 
> block(s):  OK
> 0. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743406_2582 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 
> 172.17.xxx.72:20002]
> 1. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743412_2588 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, 
> 172.17.xxx.72:20002]
> 2. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743415_2591 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 
> 172.17.xxx.72:20002]
> 3. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743416_2592 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, 
> 172.17.xxx.72:20002]
> 4. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743417_2593 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 
> 172.17.xxx.72:20002]
> 5. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743418_2594 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 
> 172.17.xxx.72:20002]
> 6. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743419_2595 
> len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 
> 172.17.xxx.72:20002]
> 7. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743422_2598 
> len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 
> 10.20.xxx.162:20002]
> 8. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743423_2599 
> len=106721297 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 
> 172.17.xxx.72:20002]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to