[ https://issues.apache.org/jira/browse/HADOOP-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672158#comment-16672158 ]
Xiao Chen commented on HADOOP-11391: ------------------------------------ Thanks Hrishikesh for working on this. Thanks [~ejohansen] for reporting and Allen for commenting. Patch makes sense to me. I have 1 question and a few review comments. Question: Did you try to verify if what's reported in the description is still happening on trunk? I don't recall which specific code would trigger it, but I feel the behavior is still the same as of now. Specifically, {{Mis-replicated blocks}} in description is still be 0 after a restart. Review comments: - BlockManager: we need to break out the writelock into finer-grained configurable batches since the input {{blocks}} could be huge. Probably ok to reuse {{numBlocksPerIteration}}. - BlockManager: the log about {{processMisReplicatedBlocks}} should be debug or trace - Would be good to have the test to verify changing from 'never been multi-rack' to multi-rack, and then -replicate. - Test: trivial, let's be consistent about naming in the comment. 'datanodes' should be fine, don't need 'data nodes' . > Enabling HVE/node awareness does not rebalance replicas on data that existed > prior to topology changes. > -------------------------------------------------------------------------------------------------------- > > Key: HADOOP-11391 > URL: https://issues.apache.org/jira/browse/HADOOP-11391 > Project: Hadoop Common > Issue Type: Bug > Environment: VMWare w/ local storage > Reporter: ellen johansen > Assignee: Hrishikesh Gadre > Priority: Major > Attachments: HADOOP-11391-001.patch, HADOOP-11391-002.patch > > > Enabling HVE/node awareness does not rebalance replicas on data that existed > prior to topology changes. > [root@vmw-d10-001 jenkins]# more /opt/cloudera/topology.data > 10.20.xxx.161 /rack1/nodegroup1 > 10.20.xxx.162 /rack1/nodegroup1 > 10.20.xxx.163 /rack3/nodegroup1 > 10.20.xxx.164 /rack3/nodegroup1 > 172.17.xxx.71 /rack2/nodegroup1 > 172.17.xxx.72 /rack2/nodegroup1 > before HVE: > /user/impalauser/tpcds/store_sales <dir> > /user/impalauser/tpcds/store_sales/store_sales.dat 1180463121 bytes, 9 > block(s): OK > 0. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742xxx_1382 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, > 10.20.xxx.163:20002] > 1. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742213_1389 > len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, > 10.20.xxx.161:20002] > 2. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742214_1390 > len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, > 10.20.xxx.163:20002] > 3. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742215_1391 > len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, > 10.20.xxx.163:20002] > 4. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742216_1392 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, > 172.17.xxx.72:20002] > 5. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742217_1393 > len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, > 10.20.xxx.163:20002] > 6. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742220_1396 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, > 10.20.xxx.163:20002] > 7. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742222_1398 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, > 10.20.xxx.161:20002] > 8. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742224_1400 > len=106721297 repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, > 172.17.xxx.72:20002] > --------- > Before enabling HVE: > Status: HEALTHY > Total size: 1648156454 B (Total open files size: 498 B) > Total dirs: 138 > Total files: 384 > Total symlinks: 0 (Files currently being written: 6) > Total blocks (validated): 390 (avg. block size 4226042 B) (Total open > file blocks (not validated): 6) > Minimally replicated blocks: 390 (100.0 %) > Over-replicated blocks: 0 (0.0 %) > Under-replicated blocks: 1 (0.25641027 %) > Mis-replicated blocks: 0 (0.0 %) > Default replication factor: 3 > Average block replication: 2.8564103 > Corrupt blocks: 0 > Missing replicas: 5 (0.44682753 %) > Number of data-nodes: 5 > Number of racks: 1 > FSCK ended at Wed Dec 10 14:04:35 EST 2014 in 50 milliseconds > The filesystem under path '/' is HEALTHY > ------ > after HVE (and NN restart): > /user/impalauser/tpcds/store_sales <dir> > /user/impalauser/tpcds/store_sales/store_sales.dat 1180463121 bytes, 9 > block(s): OK > 0. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742xxx_1382 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, > 10.20.xxx.161:20002] > 1. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742213_1389 > len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, > 10.20.xxx.161:20002] > 2. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742214_1390 > len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, > 10.20.xxx.163:20002] > 3. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742215_1391 > len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, > 10.20.xxx.163:20002] > 4. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742216_1392 > len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, > 10.20.xxx.161:20002] > 5. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742217_1393 > len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, > 10.20.xxx.163:20002] > 6. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742220_1396 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, > 10.20.xxx.162:20002] > 7. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742222_1398 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, > 10.20.xxx.161:20002] > 8. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742224_1400 > len=106721297 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, > 10.20.xxx.162:20002] > Status: HEALTHY > Total size: 1659427036 B (Total open files size: 498 B) > Total dirs: 176 > Total files: 529 > Total symlinks: 0 (Files currently being written: 6) > Total blocks (validated): 532 (avg. block size 3119223 B) (Total open > file blocks (not validated): 6) > Minimally replicated blocks: 532 (100.0 %) > Over-replicated blocks: 0 (0.0 %) > Under-replicated blocks: 1 (0.18796992 %) > Mis-replicated blocks: 0 (0.0 %) > Default replication factor: 3 > Average block replication: 2.8383458 > Corrupt blocks: 0 > Missing replicas: 7 (0.46143705 %) > Number of data-nodes: 5 > Number of racks: 3 > FSCK ended at Wed Dec 10 14:29:23 EST 2014 in 115 milliseconds > The filesystem under path '/' is HEALTHY > ------------ > store sales pushed to hdfs after HVE was configured: > /user/impalauser/tpcds_after_hve <dir> > /user/impalauser/tpcds_after_hve/store_sales.dat 1180463121 bytes, 9 > block(s): OK > 0. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743406_2582 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, > 172.17.xxx.72:20002] > 1. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743412_2588 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, > 172.17.xxx.72:20002] > 2. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743415_2591 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, > 172.17.xxx.72:20002] > 3. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743416_2592 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, > 172.17.xxx.72:20002] > 4. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743417_2593 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, > 172.17.xxx.72:20002] > 5. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743418_2594 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, > 172.17.xxx.72:20002] > 6. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743419_2595 > len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, > 172.17.xxx.72:20002] > 7. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743422_2598 > len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, > 10.20.xxx.162:20002] > 8. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073743423_2599 > len=106721297 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, > 172.17.xxx.72:20002] -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org