[ https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629369#comment-15629369 ]
Ted Yu edited comment on HBASE-14417 at 11/2/16 5:53 PM:
---------------------------------------------------------
I observed this in the TestHRegionServerBulkLoad output for the versions (v11
and earlier) where the bulk load marker is written directly to the hbase:backup
table in the postAppend hook:
{code}
2016-09-13 23:10:14,072 DEBUG [B.defaultRpcServer.handler=4,queue=0,port=35667] ipc.CallRunner(112): B.defaultRpcServer.handler=4,queue=0,port=35667: callId: 10646 service: ClientService methodName: Scan size: 264 connection: 172.18.128.12:59780
org.apache.hadoop.hbase.RegionTooBusyException: failed to get a lock in 60000 ms. regionName=atomicBulkLoad,,1473808150804.6b6c67612b01bce3348c144b959b7f0e., server=cn012.l42scl.hortonworks.com,35667,1473808145352
  at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:7744)
  at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:7725)
  at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:7634)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2588)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2582)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2569)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33516)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2229)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
  at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:136)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:111)
  at java.lang.Thread.run(Thread.java:745)
{code}
Here is the state of the stuck WAL append thread executing the BulkLoadHandler hook:
{code}
"RS:0;cn012:36301.append-pool9-t1" #453 prio=5 os_prio=0 tid=0x00007fc3945bb000 nid=0x18ec in Object.wait() [0x00007fc30dada000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:1727)
  - locked <0x0000000794750580> (a java.util.concurrent.atomic.AtomicLong)
  at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1756)
  at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:241)
  at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:191)
  - locked <0x0000000794750048> (a org.apache.hadoop.hbase.client.BufferedMutatorImpl)
  at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:949)
  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:569)
  at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.writeBulkLoadDesc(BackupSystemTable.java:227)
  at org.apache.hadoop.hbase.backup.impl.BulkLoadHandler.postAppend(BulkLoadHandler.java:83)
  at org.apache.hadoop.hbase.regionserver.wal.FSHLog.postAppend(FSHLog.java:1448)
{code}
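For context, here is a minimal sketch of the problematic pattern (hypothetical class, method, and column names; not the actual v11 patch code): the hook runs on the WAL append thread and issues a synchronous client write to hbase:backup, so the append thread ends up waiting inside AsyncProcess exactly as in the dump above.
{code}
import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical illustration of the v11-and-earlier approach: the bulk load
// marker is persisted to hbase:backup from the WAL append path itself.
public class BulkLoadMarkerHook {
  private final Connection connection; // shared cluster connection

  public BulkLoadMarkerHook(Connection connection) {
    this.connection = connection;
  }

  // Invoked from the WAL append path (analogous to BulkLoadHandler.postAppend in the patch).
  // The put below is a blocking client call that goes through BufferedMutator/AsyncProcess,
  // so the append thread waits for RPCs against hbase:backup to complete.
  public void postAppend(byte[] regionName, byte[] bulkLoadDescriptor) throws IOException {
    try (Table backupTable = connection.getTable(TableName.valueOf("hbase:backup"))) {
      Put marker = new Put(regionName);
      marker.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("bulkload"), bulkLoadDescriptor);
      backupTable.put(marker); // synchronous write on the append thread -> the stall above
    }
  }
}
{code}
Any backpressure on hbase:backup then translates directly into blocked WAL appends, which matches the RegionTooBusyException seen by the scanner.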
Even increasing the handler count didn't help:
{code}
diff --git a/hbase-server/src/test/resources/hbase-site.xml b/hbase-server/src/test/resources/hbase-site.xml
index bca90a3..829fcc9 100644
--- a/hbase-server/src/test/resources/hbase-site.xml
+++ b/hbase-server/src/test/resources/hbase-site.xml
@@ -30,6 +30,10 @@
</description>
</property>
<property>
+ <name>hbase.backup.enable</name>
+ <value>true</value>
+ </property>
+ <property>
<name>hbase.defaults.for.version.skip</name>
<value>true</value>
</property>
@@ -48,11 +52,11 @@
</property>
<property>
<name>hbase.regionserver.handler.count</name>
- <value>5</value>
+ <value>50</value>
</property>
<property>
{code}
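For completeness, the same two settings expressed programmatically against a mini cluster (an illustrative test-side sketch, not part of the patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HBaseTestingUtility;

// Sketch: apply the same settings as the hbase-site.xml diff above to a test cluster.
public class HandlerCountRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean("hbase.backup.enable", true);        // turn on the backup bulk load hook
    conf.setInt("hbase.regionserver.handler.count", 50); // 10x the test default of 5
    HBaseTestingUtility util = new HBaseTestingUtility(conf);
    util.startMiniCluster();
    try {
      // run the bulk load workload here; the stall still reproduced with 50 handlers
    } finally {
      util.shutdownMiniCluster();
    }
  }
}
{code}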
Post v11, the data stored in ZooKeeper is temporary: once an incremental backup
is run for the table receiving bulk loads, the bulk load records are persisted
under that backup Id in the backup table and removed from ZooKeeper.
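Roughly, that hand-off looks like the sketch below (hypothetical znode path, row key layout, and column names; only the ZooKeeper and HBase client calls are real APIs): bulk load records accumulate under a znode, and the next incremental backup copies them into the backup table under its backup Id and deletes them from ZooKeeper.
{code}
import java.util.List;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.zookeeper.ZooKeeper;

// Sketch of the post-v11 hand-off: ZooKeeper holds bulk load records only until
// the next incremental backup persists them under the backup Id.
public class BulkLoadRecordMover {
  private static final String ZNODE = "/hbase/backup/bulkload"; // hypothetical path

  public void moveToBackupTable(ZooKeeper zk, Connection conn, String backupId) throws Exception {
    try (Table backupTable = conn.getTable(TableName.valueOf("hbase:backup"))) {
      List<String> children = zk.getChildren(ZNODE, false);
      for (String child : children) {
        String path = ZNODE + "/" + child;
        byte[] record = zk.getData(path, false, null);            // serialized bulk load descriptor
        Put put = new Put(Bytes.toBytes(backupId + ":" + child)); // keyed by backup Id (assumption)
        put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("bulkload"), record);
        backupTable.put(put);                                     // persist under the backup Id
        zk.delete(path, -1);                                      // then drop the temporary znode
      }
    }
  }
}
{code}
The key property is that the ZooKeeper bookkeeping stays bounded: after each incremental backup it lives only in the backup table.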
> Incremental backup and bulk loading
> -----------------------------------
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 2.0.0
> Reporter: Vladimir Rodionov
> Assignee: Ted Yu
> Priority: Critical
> Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt,
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt,
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading
> bypasses WALs for obvious reasons, breaking incremental backups. The only way
> to continue backups after bulk loading is to create a new full backup of a
> table. This may not be feasible for customers who do bulk loading regularly
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE