[
https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372611#comment-14372611
]
zhangduo commented on HBASE-13308:
----------------------------------
This is our 'compactAndBlockUntilDone' method.
{code:title=TestEndToEndSplitTransaction.java}
public static void compactAndBlockUntilDone(Admin admin, HRegionServer rs,
byte[] regionName)
throws IOException, InterruptedException {
log("Compacting region: " + Bytes.toStringBinary(regionName));
admin.majorCompactRegion(regionName);
log("blocking until compaction is complete: " +
Bytes.toStringBinary(regionName));
Threads.sleepWithoutInterrupt(500);
while (rs.compactSplitThread.getCompactionQueueSize() > 0) {
Threads.sleep(50);
}
}
{code}
It uses the thread pool's workQueue size as condition. But
{code}
public static void main(String[] args) throws InterruptedException {
ThreadPoolExecutor pool =
new ThreadPoolExecutor(1, 1, 60, TimeUnit.SECONDS, new
LinkedBlockingQueue<Runnable>());
pool.execute(new Runnable() {
@Override
public void run() {
try {
Thread.currentThread().join();
} catch (InterruptedException e) {}
}
});
Thread.sleep(2000);
System.out.println(pool.getActiveCount());
System.out.println(pool.getQueue().size());
pool.shutdownNow();
}
{code}
The output is
{noformat}
1
0
{noformat}
A thread pool's queue size does not include the running tasks. So if there is
only one running compaction, then the compaction queue size will be zero...
So, it is not safe to use compaction queue size as condition.
> Fix flaky TestEndToEndSplitTransaction
> --------------------------------------
>
> Key: HBASE-13308
> URL: https://issues.apache.org/jira/browse/HBASE-13308
> Project: HBase
> Issue Type: Bug
> Components: test
> Reporter: zhangduo
> Assignee: zhangduo
>
> https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/
> First, we split 'e9eb97847340ea7c6b9616d63d62a784' to
> 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'.
> And then, we try to split 'abe1973ea732066b12d8e33fce12a951'.
> {noformat}
> 2015-03-21 03:58:46,970 INFO [Thread-191]
> regionserver.TestEndToEndSplitTransaction(399): Initiating region split
> for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
> 2015-03-21 03:58:46,976 INFO
> [PriorityRpcServer.handler=7,queue=1,port=54177]
> regionserver.RSRpcServices(1596): Splitting
> testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
> 2015-03-21 03:58:46,977 DEBUG
> [PriorityRpcServer.handler=7,queue=1,port=54177]
> regionserver.CompactSplitThread(259): Split requested for
> testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951..
> compaction_queue=(0:0), split_queue=1, merge_queue=0
> 2015-03-21 03:58:46,978 INFO [Thread-191]
> regionserver.TestEndToEndSplitTransaction(399): blocking until region is
> split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
> 2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832]
> lock.ZKInterProcessLockBase(226): Acquired a lock for
> /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:541770000000002
> 2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832]
> lock.ZKInterProcessLockBase(328): Released
> /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:541770000000002
> 2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use
> global event loop group NioEventLoopGroup
> 2015-03-21 03:58:46,988 INFO [RS:0;priapus:54177-splits-1426910324832]
> regionserver.SplitRequest(142): Split transaction journal:
> STARTED at 1426910326977
> {noformat}
> We can see that it failed without any error message.
> I think can only happen when the parent is not splittable or we can not find
> a splitrow.
> {noformat}
> 2015-03-21 03:58:47,019 INFO
> [RS:0;priapus:54177-shortCompactions-1426910324308]
> regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in
> family of
> testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
> into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is
> 12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
> 2015-03-21 03:58:47,019 INFO
> [RS:0;priapus:54177-shortCompactions-1426910324308]
> regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction:
> Request =
> regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.,
> storeName=family, fileCount=2, fileSize=25.5 K, priority=1,
> time=14542808784655186; duration=0sec
> 2015-03-21 03:58:47,020 DEBUG
> [RS:0;priapus:54177-shortCompactions-1426910324308]
> regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread
> Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
> {noformat}
> We can see that, the compaction was completed at 03:58:47,019, but split was
> started at 03:58:46,970 which is earlier.
> So we have a reference file and is not splittable.
> I think the problem is 'compactAndBlockUntilDone' is not reliable, it may
> return before the compaction complete.
> Will try to prepare a patch.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)