[ https://issues.apache.org/jira/browse/HBASE-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268553#comment-14268553 ]
Enis Soztutar commented on HBASE-12791:
---------------------------------------
The changes in RegionStates for cleanup seem good.
For the HBCK change:
- We are taking an action without a parameter check (-fixAssignments,
-fixMeta, etc.). Hbck running without any parameters should never take any
destructive action. Can we gate this behind a new parameter, something like
{{-fixFailedSplitAttempts}}?
- Can we save the results after we sort the regions for the table, so that we
do not repeat the work for multiple regions in this state? If there is a
large number of regions (100K), this will save some cycles.
Other than those it looks good.
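
To make the two requests concrete, here is a rough sketch of what I have in mind. It is not taken from the patch: the class, method, and field names are all hypothetical, regions are reduced to plain start-key strings, and {{-fixFailedSplitAttempts}} is only the proposed flag name, not an existing hbck option. The sketch assumes the region list for a table does not change during a single run, so the cached sort is never invalidated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the hbck fixer; none of these names come from the patch. */
final class FailedSplitFixSketch {
    // Destructive cleanup stays off unless the flag is passed explicitly.
    private final boolean fixFailedSplitAttempts;
    // Per-table cache so the sort runs once per table, not once per broken region.
    private final Map<String, List<String>> sortedRegionsByTable = new HashMap<>();
    // Counts real sorts; exposed only so the sketch can show the cache working.
    int sortCount = 0;

    FailedSplitFixSketch(String[] args) {
        this.fixFailedSplitAttempts =
            Arrays.asList(args).contains("-fixFailedSplitAttempts");
    }

    /** Returns the table's region start keys sorted, sorting at most once per table. */
    List<String> sortedRegions(String table, List<String> regionStartKeys) {
        return sortedRegionsByTable.computeIfAbsent(table, t -> {
            sortCount++;
            List<String> sorted = new ArrayList<>(regionStartKeys);
            sorted.sort(Comparator.naturalOrder());
            return sorted;
        });
    }

    /** Always reports the leftover daughter; marks it for cleanup only with the flag. */
    String handleLeftoverDaughter(String table, String daughter,
                                  List<String> regionStartKeys) {
        // Reused across all leftover daughters of the same table.
        sortedRegions(table, regionStartKeys);
        if (!fixFailedSplitAttempts) {
            return "REPORT " + daughter;
        }
        // A real fix would remove the daughter directory from HDFS at this point.
        return "CLEANUP " + daughter;
    }

    public static void main(String[] args) {
        FailedSplitFixSketch hbck = new FailedSplitFixSketch(args);
        List<String> startKeys = Arrays.asList("row2", "");
        System.out.println(hbck.handleLeftoverDaughter(
            "t", "dd9731ee43b104da565257ca1539aa8c", startKeys));
        System.out.println(hbck.handleLeftoverDaughter(
            "t", "2e40a44511c0e187d357d651f13a1dab", startKeys));
        System.out.println("sorts performed: " + hbck.sortCount);
    }
}
```

Run with no arguments, the sketch only reports; the cleanup branch is reachable only when the flag is given. Both daughters of table t share one sort.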
> HBase does not attempt to clean up an aborted split when the regionserver
> shutting down
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-12791
> URL: https://issues.apache.org/jira/browse/HBASE-12791
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.98.0
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Priority: Critical
> Fix For: 2.0.0, 0.98.10, 1.0.1
>
> Attachments: HBASE-12791.patch, HBASE-12791_98.patch,
> HBASE-12791_branch1.patch, HBASE-12791_v2.patch, HBASE-12791_v3.patch
>
>
> HBase does not clean up the daughter region directories from HDFS if the
> region server shuts down after creating the daughter region directories
> during the split.
> Here are the logs.
> -> RS shut down after creating the daughter regions.
> {code}
> 2014-12-31 09:05:41,406 DEBUG [regionserver60020-splits-1419996941385]
> zookeeper.ZKAssign: regionserver:60020-0x14a9701e53100d1,
> quorum=localhost:2181, baseZNode=/hbase Transitioned node
> 80c665138d4fa32da4d792d8ed13206f from RS_ZK_REQUEST_REGION_SPLIT to
> RS_ZK_REQUEST_REGION_SPLIT
> 2014-12-31 09:05:41,514 DEBUG [regionserver60020-splits-1419996941385]
> regionserver.HRegion: Closing
> t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.: disabling compactions &
> flushes
> 2014-12-31 09:05:41,514 DEBUG [regionserver60020-splits-1419996941385]
> regionserver.HRegion: Updates disabled for region
> t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.
> 2014-12-31 09:05:41,516 INFO
> [StoreCloserThread-t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.-1]
> regionserver.HStore: Closed f
> 2014-12-31 09:05:41,518 INFO [regionserver60020-splits-1419996941385]
> regionserver.HRegion: Closed
> t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.
> 2014-12-31 09:05:49,922 DEBUG [regionserver60020-splits-1419996941385]
> regionserver.MetricsRegionSourceImpl: Creating new MetricsRegionSourceImpl
> for table t dd9731ee43b104da565257ca1539aa8c
> 2014-12-31 09:05:49,922 DEBUG [regionserver60020-splits-1419996941385]
> regionserver.HRegion: Instantiated
> t,,1419996941401.dd9731ee43b104da565257ca1539aa8c.
> 2014-12-31 09:05:49,929 DEBUG [regionserver60020-splits-1419996941385]
> regionserver.MetricsRegionSourceImpl: Creating new MetricsRegionSourceImpl
> for table t 2e40a44511c0e187d357d651f13a1dab
> 2014-12-31 09:05:49,929 DEBUG [regionserver60020-splits-1419996941385]
> regionserver.HRegion: Instantiated
> t,row2,1419996941401.2e40a44511c0e187d357d651f13a1dab.
> Wed Dec 31 09:06:30 IST 2014 Terminating regionserver
> 2014-12-31 09:06:30,465 INFO [Thread-8] regionserver.ShutdownHook: Shutdown
> hook starting; hbase.shutdown.hook=true;
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@42d2282e
> {code}
> -> Rollback is skipped if the RS is stopped or stopping, so we end up with
> dirty daughter regions in HDFS.
> {code}
> 2014-12-31 09:07:49,547 INFO [regionserver60020-splits-1419996941385]
> regionserver.SplitRequest: Skip rollback/cleanup of failed split of
> t,,1419996880699.80c665138d4fa32da4d792d8ed13206f. because server is stopped
> java.io.InterruptedIOException: Interrupted after 0 tries on 350
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:156)
> {code}
> Because of this, hbck always shows inconsistencies.
> {code}
> ERROR: Region { meta => null, hdfs =>
> hdfs://localhost:9000/hbase/data/default/t/2e40a44511c0e187d357d651f13a1dab,
> deployed => } on HDFS, but not listed in hbase:meta or deployed on any
> region server
> ERROR: Region { meta => null, hdfs =>
> hdfs://localhost:9000/hbase/data/default/t/dd9731ee43b104da565257ca1539aa8c,
> deployed => } on HDFS, but not listed in hbase:meta or deployed on any
> region server
> {code}
> If we try to repair, we end up with overlapping regions in hbase:meta, and
> both the daughter regions and the parent are online.