[ 
https://issues.apache.org/jira/browse/HBASE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057059#comment-14057059
 ] 

Hadoop QA commented on HBASE-11488:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12654917/HBASE-11488-master.patch
  against trunk revision .
  ATTACHMENT ID: 12654917

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

    {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:red}-1 core tests{color}.  The patch failed these unit tests:
                        org.apache.hadoop.hbase.io.hfile.TestCacheConfig

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10010//console

This message is automatically generated.

> cancelTasks in SubprocedurePool can hang during task error
> ----------------------------------------------------------
>
>                 Key: HBASE-11488
>                 URL: https://issues.apache.org/jira/browse/HBASE-11488
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.96.1, 0.99.0, 0.98.3
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>         Attachments: HBASE-11488-master.patch
>
>
> During a snapshot on the region server side, if one RegionSnapshotTask throws 
> an exception, we cancel the other tasks.
> In 
> RegionServerSnapshotManager.SnapshotSubprocedurePool.waitForOutstandingTasks():
> {code}
>       LOG.debug("Waiting for local region snapshots to finish.");
>       int sz = futures.size();
>       try {
>         // Using the completion service to process the futures that finish first first.
>         for (int i = 0; i < sz; i++) {
>           Future<Void> f = taskPool.take();
>           f.get();
>           if (!futures.remove(f)) {
>             LOG.warn("unexpected future" + f);
>           }
>           LOG.debug("Completed " + (i+1) + "/" + sz +  " local region snapshots.");
>         }
>         LOG.debug("Completed " + sz +  " local region snapshots.");
>         return true;
>       } catch (InterruptedException e) {
>         LOG.warn("Got InterruptedException in SnapshotSubprocedurePool", e);
>         if (!stopped) {
>           Thread.currentThread().interrupt();
>           throw new ForeignException("SnapshotSubprocedurePool", e);
>         }
>         // we are stopped so we can just exit.
>       } catch (ExecutionException e) {
>         if (e.getCause() instanceof ForeignException) {
>           LOG.warn("Rethrowing ForeignException from SnapshotSubprocedurePool", e);
>           throw (ForeignException)e.getCause();
>         }
>         LOG.warn("Got Exception in SnapshotSubprocedurePool", e);
>         throw new ForeignException(name, e.getCause());
>       } finally {
>         cancelTasks();
>       }
> {code}
> If f.get() throws an ExecutionException (for example, caused by a 
> NotServingRegionException), the finally block calls cancelTasks().
> In cancelTasks():
> {code}
>      ...
>      // evict remaining tasks and futures from taskPool.
>      while (!futures.isEmpty()) {
>         // block to remove cancelled futures;
>         LOG.warn("Removing cancelled elements from taskPool");
>         futures.remove(taskPool.take());
>       }
> {code}
> For example, suppose we have 3 tasks and the first one fails, so we get an 
> exception when we do:
> {code}
>           Future<Void> f = taskPool.take();
>           f.get();
> {code}
> At that point we have not yet removed 'f' from the 'futures' list, but we 
> have already taken one future from taskPool.
> As a result, there are 3 entries in the 'futures' list, but only 2 remain in 
> taskPool, so the loop in cancelTasks() above ends up blocking on 
> taskPool.take().
> The end result is that the procedure always fails with a timeout exception. 
> Without the hang, we could have bailed out earlier with the real cause.
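
The count mismatch described in the quoted report can be reproduced outside HBase with a plain ExecutorCompletionService. The following is a minimal sketch and not the attached patch: the class name TaskPoolMismatchDemo, the method countsAfterFailure, and the removeFailedFuture flag are all hypothetical names made up for illustration. It uses a single-threaded executor so that the failing task is always the first one taken, and it drains with the non-blocking poll() after the executor has terminated, so the demo itself cannot hang.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone demo; none of these names come from HBase.
class TaskPoolMismatchDemo {

  /**
   * Submits `tasks` tasks of which the first one fails, takes one future
   * (as waitForOutstandingTasks() does), and returns
   * {entries left in 'futures', futures still retrievable from taskPool}.
   */
  static int[] countsAfterFailure(int tasks, boolean removeFailedFuture)
      throws InterruptedException {
    // One worker thread: tasks complete in submission order.
    ExecutorService executor = Executors.newSingleThreadExecutor();
    ExecutorCompletionService<Void> taskPool =
        new ExecutorCompletionService<>(executor);
    List<Future<Void>> futures = new ArrayList<>();

    for (int i = 0; i < tasks; i++) {
      final boolean fail = (i == 0);
      futures.add(taskPool.submit(() -> {
        if (fail) {
          throw new IllegalStateException("simulated task failure");
        }
        return null;
      }));
    }

    // Same order of operations as in the quoted code: take, then get.
    Future<Void> f = taskPool.take();
    try {
      f.get();
      futures.remove(f);
    } catch (ExecutionException e) {
      // The quoted code reaches cancelTasks() here without removing 'f'.
      // One possible remedy (an assumption, not the actual fix) is to
      // drop the already-taken future from the list first.
      if (removeFailedFuture) {
        futures.remove(f);
      }
    }

    // Let every remaining task finish so that poll() sees all of them.
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.SECONDS);

    // Count what a blocking take() loop could still obtain.
    int remainingInPool = 0;
    while (taskPool.poll() != null) {
      remainingInPool++;
    }
    return new int[] {futures.size(), remainingInPool};
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(Arrays.toString(countsAfterFailure(3, false)));
    System.out.println(Arrays.toString(countsAfterFailure(3, true)));
  }
}
```

When the first number is larger than the second, a loop of the form `while (!futures.isEmpty()) futures.remove(taskPool.take())` needs more take() calls than there are completed futures, which is the hang described above. Removing the taken future before cancelling, or draining with a timed poll instead of take(), are two ways the counts could be kept consistent; which one the patch uses is not stated in this message.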



--
This message was sent by Atlassian JIRA
(v6.2#6252)
