[
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235912#comment-13235912
]
[email protected] commented on HBASE-5128:
------------------------------------------------------
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 554
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line554>
bq. >
bq. > Can we do this in the current JIRA ?
bq. >
bq. > Why do we need to reload for every type of fix ?
I'd rather do it in a follow on issue. Correctness first, then performance.
This patch is massive already.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 404
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line404>
bq. >
bq. > Should be 'what are online'
"get regions according to what is online on each RegionServer"
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 418
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line418>
bq. >
bq. > checkAndRestoreConsistency() would be a better name.
every other variable is fix* so I think it seems ok to keep this fix as well.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 435
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line435>
bq. >
bq. > I think master.synchronousBalanceSwitch() is better candidate for
this action.
I agree, but since this method is only in the trunk/0.94 branches I'll file a
follow on issue for this.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 457
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line457>
bq. >
bq. > the trailing s of '.regioninfos' should be removed.
"Orphaned regions are regions without a .regioninfo file in them."
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 484
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line484>
bq. >
bq. > I don't see where the hf is closed.
good catch!
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 488
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line488>
bq. >
bq. > Should hfile be added to a list so that we can report them
collectively ?
bq. >
bq. > Currently user has to search the output of hbck.
bq. From my point of view it is easier to keep these all on separate lines so
we can grep the output. Adding word "orphan" to log message.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 489
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line489>
bq. >
bq. > Shall we continue with the remaining HFiles ?
good point. changed break to continue.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 501
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line501>
bq. >
bq. > Help me understand this comparison:
bq. > are we shrinking the range here ?
Good catch!
The goal here is to indeed expand the region to cover the range of all the
hfiles.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 531
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line531>
bq. >
bq. > Should read 'If there are errors to be fixed'
* This method determines if there are table integrity errors in HDFS. If
* there are errors and the appropriate "fix" options are enabled, the method
* will first correct orphan regions making them into legit regiondirs, and
* then reload to merge potentially overlapping regions.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 567
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line567>
bq. >
bq. > Some assertion here for the declared state (no holes) ?
removed no orphans, no holes from comment - the overlap repairs could happen if
the hdfs hole fix options are off.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 655
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line655>
bq. >
bq. > This exception isn't used.
bq. > Do we need it ?
not needed and removed. I believe this is in the 0.90 version and a remnant of
porting back and forth between versions.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 702
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line702>
bq. >
bq. > Can hbaseRoot.getFileSystem() be saved in a variable outside the
loop ?
The guard makes this only executed once per table. In the 0.90 version, the
way I got a TableInfo was via a method call to get the
HRegionInfo/HTableDescription and I actually checked for inconsistencies there
-- in 0.92+ there is only the .tableinfo file so this consistency check isn't
relevant (though there should be another .tableinfo checks specific for 0.92+
which I can file as a follow on.)
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 800
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line800>
bq. >
bq. > Please put this on line 734
done.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 924
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line924>
bq. >
bq. > rename() returns a boolean, should we check the return value ?
added check similar to the one in the following call to rename.
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 817
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line817>
bq. >
bq. > Why is tablesInfo declared again ?
removed
bq. On 2012-03-22 06:33:20, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 642
bq. > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line642>
bq. >
bq. > This exception isn't used.
bq. > Do we need it ?
Removed from here. Not needed in this version, but is used in 0.90 version.
- jmhsieh
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6208
-----------------------------------------------------------
On 2012-03-21 23:24:13, jmhsieh wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4280/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-21 23:24:13)
bq.
bq.
bq. Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This version is similar to the 0.90.x version posted a few months back,
but has a few new features and some minor differences.
bq.
bq. 1) No trackHTD method needed since we can read from the file system.
bq. 2) Added safeguards to prevent mega merges, and to isolate repairs to
particular tables.
bq. 3) Fixed comparator in HRegionInfo
bq. 4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in
HRegionInfo comparator.
bq.
bq. I'll backport to 0.94/0.92 (which should be very similar) and update the
0.90 versions after this patch has mostly cleared.
bq.
bq. This version is not perfect (there are definitely cases not covered) but
it think it is worth trying to get this in so that future reviews are more
manageable.
bq.
bq.
bq. This addresses bug HBASE-5128.
bq. https://issues.apache.org/jira/browse/HBASE-5128
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
d47ef10
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c
bq. src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
d57bb6b
bq.
src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
PRE-CREATION
bq.
src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java
PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02
bq. src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d
bq. src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java
0599da1
bq. src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
dbb97f8
bq.
src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
2b4cac8
bq.
src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
ebbeead
bq.
src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java
b175548
bq.
bq. Diff: https://reviews.apache.org/r/4280/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Unit tests cover many many situations and pass. Most "live" testing has
been done on 0.90.x versions. Many improvements and features added from
experience. Not much testing live on the trunk versions.
bq.
bq.
bq. Thanks,
bq.
bq. jmhsieh
bq.
bq.
> [uber hbck] Enable hbck to automatically repair table integrity problems as
> well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-5128
> URL: https://issues.apache.org/jira/browse/HBASE-5128
> Project: HBase
> Issue Type: New Feature
> Components: hbck
> Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
> Reporter: Jonathan Hsieh
> Assignee: Jonathan Hsieh
> Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
> Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch,
> hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch,
> hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detects most of region
> consistency and table integrity invariant violations. However with '-fix' it
> can only automatically repair region consistency cases having to do with
> deployment problems. This updated version should be able to handle all cases
> (including a new orphan regiondir case). When complete will likely deprecate
> the OfflineMetaRepair tool and subsume several open META-hole related issue.
> Here's the approach (from the comment of at the top of the new version of the
> file).
> {code}
> /**
> * HBaseFsck (hbck) is a tool for checking and repairing region consistency
> and
> * table integrity.
> *
> * Region consistency checks verify that META, region deployment on
> * region servers and the state of data in HDFS (.regioninfo files) all are in
> * accordance.
> *
> * Table integrity checks verify that that all possible row keys can resolve
> to
> * exactly one region of a table. This means there are no individual
> degenerate
> * or backwards regions; no holes between regions; and that there no
> overlapping
> * regions.
> *
> * The general repair strategy works in these steps.
> * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
> * 2) Repair Region Consistency with META and assignments
> *
> * For table integrity repairs, the tables their region directories are
> scanned
> * for .regioninfo files. Each table's integrity is then verified. If there
> * are any orphan regions (regions with no .regioninfo files), or holes, new
> * regions are fabricated. Backwards regions are sidelined as well as empty
> * degenerate (endkey==startkey) regions. If there are any overlapping
> regions,
> * a new region is created and all data is merged into the new region.
> *
> * Table integrity repairs deal solely with HDFS and can be done offline --
> the
> * hbase region servers or master do not need to be running. These phase can
> be
> * use to completely reconstruct the META table in an offline fashion.
> *
> * Region consistency requires three conditions -- 1) valid .regioninfo file
> * present in an hdfs region dir, 2) valid row with .regioninfo data in META,
> * and 3) a region is deployed only at the regionserver that is was assigned
> to.
> *
> * Region consistency requires hbck to contact the HBase master and region
> * servers, so the connect() must first be called successfully. Much of the
> * region consistency information is transient and less risky to repair.
> */
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira