[ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119451#comment-13119451 ]
jirapos...@reviews.apache.org commented on HBASE-4377: ------------------------------------------------------ bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > How long did it take to scan the cluster with 2700 inconsistencies ? bq. > I see certain places in the code where more parallelism can be achieved if practical use of this feature takes long time. The cluster that had 12k total regions after clenaup. It took 2m to run (this was localdisk accesses). I didn't feel that the runtime was something to be concerned about. And I honestly hope this code doesn't get used too often! We could use the same WorkItem trick to speed up the code but my feeling is that straightforward and correct is the right first step. bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 374 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46564#file46564line374> bq. > bq. > Better replace root with -ROOT- done. bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 376 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46564#file46564line376> bq. > bq. > b is not needed here, same with question mark. k bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 391 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46564#file46564line391> bq. > bq. > Please remove b and question mark. k bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 307 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line307> bq. > bq. > The .META. region is open upon return. bq. > I think we should document this. changed "live" to "open" bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 288 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line288> bq. > bq. > It would be nice to log the path for the underlying region. bq. > Otherwise what purpose does this catch/rethrow serve ? nice catch. Updated to include table name and path. bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 309 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line309> bq. > bq. > Looking at the usage below, maybe createNewRootAndMeta would be a better name. done bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 352 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line352> bq. > bq. > This log doesn't match the check above. bq. > If we only produce Put for the first HbckInfo, we'd better declare that in the log. updated error message and change behavior so that it bails out. In this particular case, the invariant is checked before this method is called, but I'll just make it more explicit. bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 356 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line356> bq. > bq. > This would produce exception if his.size() == 0. problem avoided with update. bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 378 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line378> bq. > bq. > Do you plan to do this in the next patch or in another JIRA ? bq. > I haven't looked at the other JIRAs you mentioned, pardon me. I'll file it as a follow-on jira. bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 428 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line428> bq. > bq. > Is there something we can do in case we get IOE from this call ? added error logging and an attempt to revert. bq. On 2011-09-30 21:27:16, Ted Yu wrote: bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 377 bq. > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line377> bq. > bq. > Better use boolean for return value to indicate success/failure. done. - jmhsieh ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2126/#review2231 ----------------------------------------------------------- On 2011-09-30 00:02:16, jmhsieh wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2126/ bq. ----------------------------------------------------------- bq. bq. (Updated 2011-09-30 00:02:16) bq. bq. bq. Review request for hbase, Michael Stack and Andrew Purtell. bq. bq. bq. Summary bq. ------- bq. bq. commit fbf82c17be6b3ecca5a981f5270cf93aac26e479 bq. Author: Jonathan Hsieh <j...@cloudera.com> bq. Date: Wed Sep 28 10:18:11 2011 -0700 bq. bq. HBASE-4377 [hbck] Offline rebuild .META. from fs data only bq. bq. bq. This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet to be committed HBASE-4515 (either my verison or Gary's version), HBASE-4509, and HBASE-4506. bq. bq. Some follow on work includes backporting to 0.90, auto-patching true holes, and adding documentation. bq. bq. bq. This addresses bug HBASE-4377. bq. https://issues.apache.org/jira/browse/HBASE-4377 bq. bq. bq. Diffs bq. ----- bq. bq. src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 bq. src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java fae0881 bq. src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuild.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/2126/diff bq. bq. bq. Testing bq. ------- bq. bq. An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart, the some regions would start splitting and then die due to oome's). This was not actually used on a live cluster -- it was used to reconstruct a .META. from .regioninfo's laid out in hbase's directory structure. bq. bq. Note also that this is not an automatic fix -- whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug hole, and to do any potential data loosing operations. bq. bq. The tests demonstrate current expected behavior -- rebuild meta if things line up, and fail without making modifications if holes or overlaps exist. bq. bq. bq. Thanks, bq. bq. jmhsieh bq. bq. > [hbck] Offline rebuild .META. from fs data only. > ------------------------------------------------ > > Key: HBASE-4377 > URL: https://issues.apache.org/jira/browse/HBASE-4377 > Project: HBase > Issue Type: New Feature > Reporter: Jonathan Hsieh > Assignee: Jonathan Hsieh > Attachments: > 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch > > > In a worst case situation, it may be helpful to have an offline .META. > rebuilder that just looks at the file system's .regioninfos and rebuilds meta > from scratch. Users could move bad regions out until there is a clean > rebuild. > It would likely fill in region split holes. Follow on work could given > options to merge or select regions that overlap, or do online rebuilds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira