[ 
https://issues.apache.org/jira/browse/HBASE-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6254:
----------------------------------

    Description: 
Execution of Deletes constructed with thousands of calls to 
Delete.deleteColumn(family, qualifier) is very expensive and slow.

On our (quiet) cluster, a Delete w/ 20k qualifiers took about 13s to complete 
(as measured by client).

When 10 such Deletes were sent to the cluster via HTable.delete(List<Delete>), 
one of the RegionServers ended up w/ 5 of the requests and ran at 100% CPU 
utilization for about 1 hour.

This led to the client timing out after 20min (2min x 10 retries).  In one 
case, the client was able to fill the RPC callqueue and received the following 
error:
{code}
  Failed all from region=<region>,hostname=<host>, port=<port> 
java.util.concurrent.ExecutionException: java.io.IOException: Call queue is 
full, is ipc.server.max.callqueue.size too small?
{code}
Based on feedback (http://search-hadoop.com/m/yITsc1WcDWP), I switched to 
Delete.deleteColumn(family, qual, timestamp) where timestamp came from KeyValue 
retrieved from scan based on domain objects.  This version of the delete ran in 
about 500ms.
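
To illustrate the difference between the two calls, here is a toy model in plain Java. It is not HBase code: DeleteCostSketch, Marker, LATEST and resolutionLookups are hypothetical names made up for this sketch. It assumes (this issue does not confirm it) that a delete marker without a timestamp forces the server to read the column's newest version before it can apply the marker, while a marker carrying the timestamp taken from a scanned KeyValue needs no such read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these types are HBase classes.
class DeleteCostSketch {
    // Sentinel meaning "no timestamp given; resolve the newest version on the server".
    static final long LATEST = Long.MAX_VALUE;

    // One delete marker for one column qualifier.
    static final class Marker {
        final String qualifier;
        final long timestamp;

        Marker(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Stand-in for one stored row: qualifier -> timestamp of its newest cell.
    final Map<String, Long> row = new TreeMap<>();
    // Per-column reads spent resolving LATEST markers (the assumed cost).
    int resolutionLookups = 0;

    // Builds a row with `count` columns, all written at `timestamp`.
    static DeleteCostSketch withColumns(int count, long timestamp) {
        DeleteCostSketch store = new DeleteCostSketch();
        for (int i = 0; i < count; i++) {
            store.row.put("q" + i, timestamp);
        }
        return store;
    }

    // Client side: one marker per stored column. With explicit == true the
    // timestamp is copied from the stored cell, as a scan would supply it.
    List<Marker> markersForAllColumns(boolean explicit) {
        List<Marker> markers = new ArrayList<>();
        for (Map.Entry<String, Long> cell : row.entrySet()) {
            markers.add(new Marker(cell.getKey(), explicit ? cell.getValue() : LATEST));
        }
        return markers;
    }

    // Server side: applies the markers and returns how many cells were removed.
    int apply(List<Marker> markers) {
        int removed = 0;
        for (Marker m : markers) {
            long ts = m.timestamp;
            if (ts == LATEST) {
                resolutionLookups++; // assumed: one read per unresolved column
                Long newest = row.get(m.qualifier);
                if (newest == null) {
                    continue; // nothing stored for this column
                }
                ts = newest;
            }
            // Removes the cell only if its stored timestamp matches the marker.
            if (row.remove(m.qualifier, ts)) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        DeleteCostSketch noTimestamps = withColumns(20_000, 42L);
        noTimestamps.apply(noTimestamps.markersForAllColumns(false));

        DeleteCostSketch withTimestamps = withColumns(20_000, 42L);
        withTimestamps.apply(withTimestamps.markersForAllColumns(true));

        System.out.println(noTimestamps.resolutionLookups
                + " vs " + withTimestamps.resolutionLookups);
    }
}
```

Both variants remove the same cells; in this model only the timestamp-less one pays one extra read per qualifier. The model counts reads and says nothing about how long each read takes on a real RegionServer.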

User group thread titled "RS unresponsive after series of deletes" has related 
logs and stacktraces.  

Link to thread: http://search-hadoop.com/m/RmIyr1WcDWP

Here is the stack dump of region server: http://pastebin.com/8y5x4xU7

  was:
Execution of Deletes constructed with thousands of calls to 
Delete.deleteColumn(family, qualifier) are very expensive and slow.

On our (quiet) cluster, a Delete w/ 20k qualifiers took about 13s to complete 
(as measured by client).

When 10 such Deletes were sent to the cluster via HTable.delete(List<Delete>), 
one of RegionServers ended up w/ 5 of the requests and became 100% CPU utilized 
for about 1 hour.

This lead to the client timing out after 20min (2min x 10 retries).  In one 
case, the client was able to fill the RPC callqueue and received the following 
error:

  Failed all from region=<region>,hostname=<host>, port=<port> 
java.util.concurrent.ExecutionException: java.io.IOException: Call queue is 
full, is ipc.server.max.callqueue.size too small?

Based on feedback (http://search-hadoop.com/m/yITsc1WcDWP), I switched to 
Delete.deleteColumn(family, qual, timestamp) where timestamp came from KeyValue 
retrieved from scan based on domain objects.  This version of the delete ran in 
about 500ms.

User group thread titled "RS unresponsive after series of deletes" has related 
logs and stacktraces.  

Link to thread: http://search-hadoop.com/m/RmIyr1WcDWP

    
> deletes w/ many column qualifiers overwhelm Region Server
> ---------------------------------------------------------
>
>                 Key: HBASE-6254
>                 URL: https://issues.apache.org/jira/browse/HBASE-6254
>             Project: HBase
>          Issue Type: Bug
>          Components: performance, regionserver
>    Affects Versions: 0.94.0
>         Environment: 5 node Cent OS + 1 master, v0.94 on cdh3u3
>            Reporter: Ted Tuttle
>
> Execution of Deletes constructed with thousands of calls to 
> Delete.deleteColumn(family, qualifier) is very expensive and slow.
> On our (quiet) cluster, a Delete w/ 20k qualifiers took about 13s to complete 
> (as measured by client).
> When 10 such Deletes were sent to the cluster via 
> HTable.delete(List<Delete>), one of the RegionServers ended up w/ 5 of the 
> requests and ran at 100% CPU utilization for about 1 hour.
> This led to the client timing out after 20min (2min x 10 retries).  In one 
> case, the client was able to fill the RPC callqueue and received the 
> following error:
> {code}
>   Failed all from region=<region>,hostname=<host>, port=<port> 
> java.util.concurrent.ExecutionException: java.io.IOException: Call queue is 
> full, is ipc.server.max.callqueue.size too small?
> {code}
> Based on feedback (http://search-hadoop.com/m/yITsc1WcDWP), I switched to 
> Delete.deleteColumn(family, qual, timestamp) where timestamp came from 
> KeyValue retrieved from scan based on domain objects.  This version of the 
> delete ran in about 500ms.
> User group thread titled "RS unresponsive after series of deletes" has 
> related logs and stacktraces.  
> Link to thread: http://search-hadoop.com/m/RmIyr1WcDWP
> Here is the stack dump of region server: http://pastebin.com/8y5x4xU7

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
