[ https://issues.apache.org/jira/browse/HBASE-18166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038258#comment-16038258 ]
Stephen Yuan Jiang commented on HBASE-18166:
--------------------------------------------
[~stack], when I implemented the SplitTableRegionProcedure, I copied the logic
from SplitTransactionImpl.java:
{code}
/**
 * Creates reference files for top and bottom half of the
 * @param hstoreFilesToSplit map of store files to create half file references for.
 * @return the number of reference files that were created.
 * @throws IOException
 */
private Pair<Integer, Integer> splitStoreFiles(
    final Map<byte[], List<StoreFile>> hstoreFilesToSplit)
    throws IOException {
  if (hstoreFilesToSplit == null) {
    // Could be null because close didn't succeed -- for now consider it fatal
    throw new IOException("Close returned empty list of StoreFiles");
  }
  // The following code sets up a thread pool executor with as many slots as
  // there's files to split. It then fires up everything, waits for
  // completion and finally checks for any exception
  int nbFiles = 0;
  for (Map.Entry<byte[], List<StoreFile>> entry: hstoreFilesToSplit.entrySet()) {
    nbFiles += entry.getValue().size(); // ===> possible to have reference files
  }
{code}
I wonder whether we should also change the logic in SplitTransactionImpl in
branch-1 to skip splitting reference files (I checked HRegion#doClose() and did
not see any logic to skip reference files on the region server side).
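
For illustration only, here is a minimal sketch of what "skip reference files
before counting and splitting" could look like. Everything in it is an
assumption made for the example: FakeStoreFile is a hypothetical stand-in for
HBase's StoreFile (only an isReference()-style flag is modelled), String keys
replace the byte[] family names, and withoutReferences/countFiles are names
invented here. It is not the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipReferenceFiles {
  /** Hypothetical stand-in for a store file; not the HBase StoreFile class. */
  static final class FakeStoreFile {
    final String name;
    final boolean reference;

    FakeStoreFile(String name, boolean reference) {
      this.name = name;
      this.reference = reference;
    }

    boolean isReference() {
      return reference;
    }
  }

  /** Returns a copy of the per-family map that keeps only non-reference files. */
  static Map<String, List<FakeStoreFile>> withoutReferences(
      Map<String, List<FakeStoreFile>> filesToSplit) {
    Map<String, List<FakeStoreFile>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<FakeStoreFile>> entry : filesToSplit.entrySet()) {
      List<FakeStoreFile> kept = new ArrayList<>();
      for (FakeStoreFile sf : entry.getValue()) {
        if (!sf.isReference()) {
          kept.add(sf); // splitting a reference would create a reference to a reference
        }
      }
      result.put(entry.getKey(), kept);
    }
    return result;
  }

  /** Same counting loop as in the excerpt above, over the (filtered) map. */
  static int countFiles(Map<String, List<FakeStoreFile>> files) {
    int nbFiles = 0;
    for (Map.Entry<String, List<FakeStoreFile>> entry : files.entrySet()) {
      nbFiles += entry.getValue().size();
    }
    return nbFiles;
  }

  public static void main(String[] args) {
    Map<String, List<FakeStoreFile>> files = new LinkedHashMap<>();
    files.put("cf1", Arrays.asList(
        new FakeStoreFile("hfile-a", false),
        new FakeStoreFile("hfile-b.parent", true)));
    files.put("cf2", Arrays.asList(new FakeStoreFile("hfile-c", false)));
    System.out.println("before filtering: " + countFiles(files));
    System.out.println("after filtering:  " + countFiles(withoutReferences(files)));
  }
}
```

Filtering before the count matters because, per the comment in the excerpt,
nbFiles sizes the thread pool; the filtered map would then also be the one
handed to the split tasks.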
> [AMv2] We are splitting already-split files
> -------------------------------------------
>
> Key: HBASE-18166
> URL: https://issues.apache.org/jira/browse/HBASE-18166
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 2.0.0
> Reporter: stack
> Assignee: stack
> Fix For: 2.0.0
>
> Attachments: HBASE-18166.master.001.patch,
> HBASE-18166.master.002.patch
>
>
> Interesting issue. The change below adds a lag in cleaning up files after a
> compaction when there are ongoing Scanners (for read replicas/offheap).
> HBASE-14970 Backport HBASE-13082 and its sub-jira to branch-1 - recommit (Ram)
> What the lag means is that now that split is run from the HMaster in master
> branch, when it goes to get a listing of the files to split, it can pick up
> files that are for archiving but that have not been archived yet. When it
> does, it goes ahead and splits them... making references of references.
> It's a mess.
> A while back I added a check that asks the Region whether it is splittable.
> The Master calls this from SplitTableRegionProcedure during preparation. If
> the RegionServer asked for the split, the check is somewhat redundant, since
> the RS already asks itself whether any references remain and, if so, waits
> before asking for a split. But if a user/client asks, then this isSplittable
> over RPC comes in handy.
> I was thinking that isSplittable could return a list of files....
> Or, easier: given that we know a region is splittable by the time we go to
> split the files, I think master-side we can just skip any references found,
> presuming they are ready for archive.
> Will be back with a patch. Want to test on a cluster first (the side-effect
> is that regions are offline because the file at the end of the reference to
> a reference is removed ... and so the open fails).