[jira] [Comment Edited] (HBASE-17992) The snapShot TimeoutException causes the cleanerChore thread to fail to complete the archive correctly

Bo Cui (JIRA) Mon, 08 May 2017 00:45:24 -0700

    [ https://issues.apache.org/jira/browse/HBASE-17992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000366#comment-16000366 ]


Bo Cui edited comment on HBASE-17992 at 5/8/17 7:44 AM:
--------------------------------------------------------

Does the DisabledTableSnapshotHandler executor (exec) need a limit on the total
waiting time?
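{code:title=DisabledTableSnapshotHandler.java|borderStyle=solid}
public void snapshotRegions(List<Pair<HRegionInfo, ServerName>> regionsAndLocations)
    throws IOException, KeeperException {
  ...
  ThreadPoolExecutor exec = SnapshotManifest.createExecutor(conf, "DisabledTableSnapshot");
  try {
    ModifyRegionUtils.editRegions(exec, regions, new ModifyRegionUtils.RegionEditTask() {
      @Override
      public void editRegion(final HRegionInfo regionInfo) throws IOException {
        snapshotManifest.addRegion(FSUtils.getTableDir(rootDir, snapshotTable), regionInfo);
      }
    });
  } catch (IOException e) {
    // Drop queued tasks and interrupt running ones, then wait (with no overall
    // deadline) until every task has ended, so that nothing can still write into
    // the snapshot's tmp directory after the caller cleans it up.
    exec.shutdownNow();
    boolean interrupted = false;
    while (!exec.isTerminated()) {
      try {
        exec.awaitTermination(2, TimeUnit.SECONDS);
      } catch (InterruptedException ie) {
        interrupted = true; // keep waiting; restore the interrupt status afterwards
      }
    }
    if (interrupted) {
      Thread.currentThread().interrupt();
    }
    throw e;
  }
  exec.shutdown();
  ...
}
{code}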

SnapshotManifest#addRegion() reads from memory and writes to HDFS:
Reading from memory does not take long.
Writing to HDFS is covered by HDFS's own timeout and exception handling.
The executor defaults to eight threads, so if an exception occurs, at most eight
tasks are still running.
So I think there is no need to set a total waiting duration; the handler can
simply wait until all tasks have ended.
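The argument above rests on how java.util.concurrent executors shut down: shutdownNow() interrupts the running tasks and returns the queued ones without running them, so after a failure only the tasks already in flight (at most the pool size) still have to finish. The following is a minimal, self-contained sketch under stated assumptions, not HBase code: Executors.newFixedThreadPool(8) stands in for SnapshotManifest.createExecutor(), a 10-second Thread.sleep() stands in for a bounded HDFS write, and InFlightBoundDemo and run() are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class InFlightBoundDemo {

  /**
   * Submits {@code tasks} jobs to a pool of {@code poolSize} threads, calls
   * shutdownNow() once every worker is busy, then waits with no total deadline.
   *
   * @return {tasks that started, tasks handed back by shutdownNow()}
   */
  static int[] run(int poolSize, int tasks) throws InterruptedException {
    ExecutorService exec = Executors.newFixedThreadPool(poolSize); // stand-in pool (assumption)
    AtomicInteger started = new AtomicInteger();
    CountDownLatch allWorkersBusy = new CountDownLatch(poolSize);
    for (int i = 0; i < tasks; i++) {
      exec.execute(() -> {
        started.incrementAndGet();
        allWorkersBusy.countDown();
        try {
          // Stand-in for a bounded HDFS write; shutdownNow() interrupts it early.
          Thread.sleep(10_000);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    allWorkersBusy.await(); // every worker thread now holds a running task
    List<Runnable> neverStarted = exec.shutdownNow(); // queued tasks are returned, not run
    // Unbounded wait, polling every 2 s like the loop in the comment.
    while (!exec.awaitTermination(2, TimeUnit.SECONDS)) {
      // keep waiting
    }
    return new int[] {started.get(), neverStarted.size()};
  }

  public static void main(String[] args) throws InterruptedException {
    int[] r = run(8, 100);
    System.out.println("started=" + r[0] + " neverStarted=" + r[1]);
  }
}
```

With 8 workers and 100 tasks, the run should show that only the 8 in-flight tasks started and that shutdownNow() handed back the rest, so the unbounded wait ends as soon as those 8 return.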





> The snapShot TimeoutException causes the cleanerChore thread to fail to complete the archive correctly
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17992
>                 URL: https://issues.apache.org/jira/browse/HBASE-17992
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.98.10, 1.3.0
>            Reporter: Bo Cui
>         Attachments: hbase-17992.patch
>
>
> The problem is that when a snapshot fails with a TimeoutException or another
> exception, /hbase/.hbase-snapshot/tmp is not deleted correctly, which
> causes the cleanerChore to fail to complete the archive correctly.
> Modifying the configuration parameter (hbase.snapshot.master.timeout.millis =
> 600000) only reduces the probability of the problem occurring.
> So the solution is: when a worker thread throws an exception or a
> TimeoutException occurs, the main thread must wait until all tasks have
> finished or been cancelled before it cleans up
> /hbase/.hbase-snapshot/tmp/snapshotName. Otherwise a task is likely to write
> /hbase/.hbase-snapshot/tmp/snapshotName/region-manifest after the cleanup.
> The problem exists in both disabledTableSnapshot and enabledTableSnapshot.
> Because I am currently using disabledTableSnapshot, I am providing a patch
> for disabledTableSnapshot only.
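The race the description refers to can be illustrated outside HBase. This is a minimal sketch, not the attached patch, and several pieces are assumptions: an in-memory Set stands in for the HDFS namespace; addRegion() is a hypothetical stand-in for SnapshotManifest#addRegion that deliberately ignores interrupts, modelling a write already in flight when shutdownNow() is called; a 50 ms Future.get() timeout plays the role of the snapshot TimeoutException; and the region-manifest.<n> file names are illustrative. With waitForTasks=true the cleanup runs only after every task has ended; with false, late writes land after the cleanup and leave the orphaned tmp files the description blames for the cleanerChore failure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SnapshotTmpCleanupDemo {

  static final String TMP_DIR = "/hbase/.hbase-snapshot/tmp/snapshotName";

  /** Hypothetical stand-in for SnapshotManifest#addRegion: a 300 ms write that ignores interrupts. */
  static void addRegion(Set<String> fs, int region) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(300);
    boolean interrupted = false;
    long remaining;
    while ((remaining = deadline - System.nanoTime()) > 0) {
      try {
        TimeUnit.NANOSECONDS.sleep(remaining);
      } catch (InterruptedException e) {
        interrupted = true; // keep going: the write is already in flight
      }
    }
    fs.add(TMP_DIR + "/region-manifest." + region);
    if (interrupted) {
      Thread.currentThread().interrupt();
    }
  }

  /** Runs a snapshot that times out and returns the files left under TMP_DIR afterwards. */
  static List<String> failedSnapshot(boolean waitForTasks) throws InterruptedException {
    Set<String> fs = ConcurrentHashMap.newKeySet(); // in-memory stand-in for HDFS
    ExecutorService exec = Executors.newFixedThreadPool(8);
    List<Future<?>> futures = new ArrayList<>();
    for (int r = 0; r < 8; r++) {
      final int region = r;
      futures.add(exec.submit(() -> addRegion(fs, region)));
    }
    try {
      for (Future<?> f : futures) {
        f.get(50, TimeUnit.MILLISECONDS); // deliberately too short: forces a TimeoutException
      }
    } catch (TimeoutException | ExecutionException e) {
      exec.shutdownNow();
      if (waitForTasks) {
        // The proposed fix: wait until every task has finished or been cancelled.
        while (!exec.awaitTermination(2, TimeUnit.SECONDS)) {
          // keep waiting
        }
      }
      fs.removeIf(p -> p.startsWith(TMP_DIR + "/")); // clean up tmp/snapshotName
    }
    exec.shutdown();
    exec.awaitTermination(10, TimeUnit.SECONDS); // let stragglers finish before inspecting
    List<String> leftovers = new ArrayList<>(fs);
    Collections.sort(leftovers);
    return leftovers;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("wait, then clean: " + failedSnapshot(true));
    System.out.println("clean, no wait:   " + failedSnapshot(false));
  }
}
```

The design choice mirrors the description: the cleanup is correct only because it is ordered after executor termination, not because of any particular timeout value.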



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
