[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162624#comment-16162624 ]
Abhishek Singh Chouhan commented on HBASE-18771:
------------------------------------------------
Actually,
bq. return (numRegionsAfterSplit == numRegionsBeforeSplit + 1 && admin.isTableAvailable(TEST_TABLE));
results in a false positive. Just before the daughter regions are opened,
admin.getTableRegions already returns the daughter regions (a count of
numRegionsBeforeSplit + 1, since the parent is offline and excluded), and
admin.isTableAvailable returns true even though the daughter regions are not
actually open yet. This is because isTableAvailable checks whether it can
getServerName for the meta entries, which it is able to do for the daughter
regions. Server locations are added to the meta entries in
MetaTableAccessor#splitRegion, although the description of that method says
"Does not add the location information to the daughter regions since they are
not open yet." So basically isTableAvailable gives us a false positive when the
parent is offline and the daughters are not yet open. I can open a bug for
this, wdyt [~ashu210890] [~apurtell] [~anoop.hbase]
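
To make the false positive concrete, here is a small self-contained toy model of the meta state in the middle of a split. It is not HBase code: SplitAvailabilityModel, MetaEntry and all method names are hypothetical, and skipping the offline parent in every check is an assumption made for illustration. It only shows that a check based on "a server location exists in meta" can pass while no daughter is open.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of meta entries during a split. Hypothetical names, not HBase classes. */
class SplitAvailabilityModel {

    /** One row of the modeled meta table. */
    static final class MetaEntry {
        final String regionName;
        final String serverName; // location recorded in meta, null if none
        final boolean offline;   // true for the split parent
        final boolean open;      // true only once the region is actually serving

        MetaEntry(String regionName, String serverName, boolean offline, boolean open) {
            this.regionName = regionName;
            this.serverName = serverName;
            this.offline = offline;
            this.open = open;
        }
    }

    /** Counts regions the way the comment describes: the offline parent is excluded. */
    static int onlineRegionCount(List<MetaEntry> meta) {
        int count = 0;
        for (MetaEntry e : meta) {
            if (!e.offline) {
                count++;
            }
        }
        return count;
    }

    /** Location-based check: passes as soon as every non-offline entry has a server name. */
    static boolean looksAvailable(List<MetaEntry> meta) {
        for (MetaEntry e : meta) {
            if (!e.offline && e.serverName == null) {
                return false;
            }
        }
        return true;
    }

    /** Stricter check: every non-offline region must really be open. */
    static boolean allOpen(List<MetaEntry> meta) {
        for (MetaEntry e : meta) {
            if (!e.offline && !e.open) {
                return false;
            }
        }
        return true;
    }

    /** State after the split wrote locations to meta but before the daughters opened. */
    static List<MetaEntry> midSplit() {
        List<MetaEntry> meta = new ArrayList<>();
        meta.add(new MetaEntry("parent", "rs1", true, false));
        meta.add(new MetaEntry("daughterA", "rs1", false, false));
        meta.add(new MetaEntry("daughterB", "rs1", false, false));
        return meta;
    }
}
```

With one region before the split, midSplit() already satisfies both halves of the quoted wait condition (region count is 1 + 1 and looksAvailable is true), while allOpen is still false.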
> Incorrect StoreFileRefresh leading to split and compaction failures
> -------------------------------------------------------------------
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch,
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch,
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.master.001.patch,
> HBASE-18771.master.002.patch
>
>
> We ran into compaction and split failures with 1.3, similar to
> HBASE-18186 and HBASE-17406. Here's what I believe is happening:
> Let's say we have 4 store files that are compacted to form a new one. At this
> point we have 5 store files; however, only 1 (the newly formed one) is open
> for the store, and the rest are waiting to be archived by HFileArchiver.
> Now, before the files are archived, we get an FNFE in a scanner. This results in
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe)
> being called, which results in region.refreshStoreFiles(true) ->
> HStore.refreshStoreFiles().
> HStore.refreshStoreFiles now checks the HDFS dir and adds the previously
> compacted files back to the store; however, these files are also present in
> StoreFileManager's compactedFiles list. At this point HFileArchiver runs,
> checks the compactedFiles list and moves these files into the archive directory.
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] regionserver.CompactSplitThread - Compaction selection failed regionName = xxxx, storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://xxxx
> at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly, if a split happens after archival, we fail after the PONR while
> opening the daughter regions due to an FNFE. This leaves the parent offline and
> the daughters in limbo, since they are unable to open. Since we get the error
> after the PONR, we also end up aborting the RS.
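
The sequence in the description above can be replayed with a small toy model. This is only a sketch: StoreRefreshModel and its methods are hypothetical and do not mirror the real HStore, StoreFileManager or HFileArchiver code, and refreshSkippingCompacted is just one possible guard, not necessarily what the attached patches do.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of the store file refresh race. Hypothetical names, not HBase classes. */
class StoreRefreshModel {
    final Set<String> onDisk = new LinkedHashSet<>();         // files in the store directory
    final Set<String> storeFiles = new LinkedHashSet<>();     // files the store considers live
    final Set<String> compactedFiles = new LinkedHashSet<>(); // files waiting to be archived

    /** Compaction: the output becomes live, the inputs stay on disk until archived. */
    void compact(Set<String> inputs, String output) {
        onDisk.add(output);
        storeFiles.removeAll(inputs);
        storeFiles.add(output);
        compactedFiles.addAll(inputs);
    }

    /** Refresh as described in the issue: everything found on disk is added back. */
    void refreshNaive() {
        storeFiles.addAll(onDisk);
    }

    /** Assumed guard: ignore files that are already queued for archival. */
    void refreshSkippingCompacted() {
        for (String f : onDisk) {
            if (!compactedFiles.contains(f)) {
                storeFiles.add(f);
            }
        }
    }

    /** Archiver: moves every compacted file out of the store directory. */
    void archive() {
        onDisk.removeAll(compactedFiles);
        compactedFiles.clear();
    }

    /** True if the store still references a file that is no longer on disk. */
    boolean hasDanglingStoreFile() {
        return !onDisk.containsAll(storeFiles);
    }

    /** 4 store files compacted into 1, archiver has not run yet. */
    static StoreRefreshModel afterCompaction() {
        StoreRefreshModel m = new StoreRefreshModel();
        Set<String> inputs = new LinkedHashSet<>(Arrays.asList("f1", "f2", "f3", "f4"));
        m.onDisk.addAll(inputs);
        m.storeFiles.addAll(inputs);
        m.compact(inputs, "f5");
        return m;
    }
}
```

Calling refreshNaive() and then archive() on afterCompaction() leaves f1 to f4 in storeFiles although they are gone from disk, which is the state in which a later compaction selection or daughter open would hit the FNFE. With refreshSkippingCompacted() the store only references f5.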
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)