[ https://issues.apache.org/jira/browse/PHOENIX-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057160#comment-14057160 ]
Vikas Vishwakarma commented on PHOENIX-998:
-------------------------------------------

I saw similar issues with a native HBase API based client loader. I could correlate them to GC pauses in the RegionServers. The log traces are given below.

From the client logs, a read batch failed on the RegionServer machine at 11:13:29:
===========
[Wed Jul 09 11:13:29 GMT 2014 com.salesforce.hbase.opthreads.FetchScanRunTimeThread run SEVERE] FetchScanRunTimeThread: java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
Wed Jul 09 11:13:29 GMT 2014, org.apache.hadoop.hbase.client.RpcRetryingCaller@2c109a0a, java.net.SocketTimeoutException: Call to ---hostname masked--- failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[---hostname masked---]
===========

From the RegionServer logs at the same time, there are two consecutive GC pauses (1552 ms and 1846 ms, roughly 3.4 seconds in total), followed by an RpcServer.responder asyncWrite failure:
===========
2014-07-09 11:13:21,372 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1552ms
2014-07-09 11:13:29,981 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1846ms
2014-07-09 11:13:30,040 WARN org.apache.hadoop.ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
2014-07-09 11:13:30,041 WARN org.apache.hadoop.ipc.RpcServer: RpcServer.responder: callId: 30901 service: ClientService methodName: Scan size: 24 connection: 10.230.229.22:57115: output error
2014-07-09 11:13:30,042 INFO org.apache.hadoop.ipc.RpcServer: RpcServer.responder: asyncWrite
===========
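The 2000 ms read timeout in the traces above is shorter than the back-to-back GC pauses, so any pause of a couple of seconds on a RegionServer will surface on the client as a SocketTimeoutException. Besides tuning GC on the RegionServers, one client-side mitigation is to run with a larger RPC timeout and a bounded retry schedule. The sketch below is illustrative only (the values are not taken from this report); it uses the 0.98-era HBase client API, and the same keys can equally be set in hbase-site.xml on the client classpath, which the Phoenix JDBC client also reads:
===========
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;

public class TimeoutTunedClient {
    public static void main(String[] args) throws Exception {
        // Start from hbase-site.xml on the classpath, then override per client.
        Configuration conf = HBaseConfiguration.create();
        // Timeout for a single RPC; the traces above show a 2000 ms timeout,
        // which is far below the HBase default of 60000 ms and below the
        // observed GC pauses. Value here is illustrative.
        conf.setInt("hbase.rpc.timeout", 60000);
        // Bound how many times a failed op is retried, instead of retrying
        // "for a very long time" as described below. Illustrative values.
        conf.setInt("hbase.client.retries.number", 10);
        conf.setLong("hbase.client.pause", 500); // base back-off between retries, in ms

        HConnection connection = HConnectionManager.createConnection(conf);
        HTableInterface table = connection.getTable("IPHOENIX10M");
        try {
            // ... issue puts/scans against the table with the tuned timeouts ...
        } finally {
            table.close();
            connection.close();
        }
    }
}
===========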
> SocketTimeoutException under high concurrent write access to phoenix indexed table
> ----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-998
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-998
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>         Environment: HBase 0.98.1-SNAPSHOT, Hadoop 2.3.0-cdh5.0.0
>            Reporter: wangxianbin
>            Priority: Critical
>
> We have a small HBase cluster with one master and six slaves. We test Phoenix index concurrent write performance with four write clients; each client has 100 threads, and each thread has one Phoenix JDBC connection. We encounter the SocketTimeoutException below, and it retries for a very long time. How can I deal with such an issue?
> 2014-05-22 17:22:58,490 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t10] client.AsyncProcess: #16016, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:00,436 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t6] client.AsyncProcess: #16027, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:00,440 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t1] client.AsyncProcess: #16013, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:00,449 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t7] client.AsyncProcess: #16028, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:00,473 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t8] client.AsyncProcess: #16020, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:00,494 INFO [htable-pool20-t13] client.AsyncProcess: #16016, table=IPHOENIX10M, attempt=12/350 failed 1 ops, last exception: java.net.SocketTimeoutException: Call to storm3.org/172.16.2.23:60020 failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.2.24:52017 remote=storm3.org/172.16.2.23:60020] on storm3.org,60020,1400750242156, tracking started Thu May 22 17:21:32 CST 2014, retrying after 20189 ms, replay 1 ops.
> 2014-05-22 17:23:02,439 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t4] client.AsyncProcess: #16022, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:02,496 INFO [htable-pool20-t3] client.AsyncProcess: #16013, table=IPHOENIX10M, attempt=12/350 failed 1 ops, last exception: java.net.SocketTimeoutException: Call to storm3.org/172.16.2.23:60020 failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.2.24:52017 remote=storm3.org/172.16.2.23:60020] on storm3.org,60020,1400750242156, tracking started Thu May 22 17:21:32 CST 2014, retrying after 20001 ms, replay 1 ops.
> 2014-05-22 17:23:02,496 INFO [htable-pool20-t16] client.AsyncProcess: #16028, table=IPHOENIX10M, attempt=12/350 failed 1 ops, last exception: java.net.SocketTimeoutException: Call to storm3.org/172.16.2.23:60020 failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.2.24:52017 remote=storm3.org/172.16.2.23:60020] on storm3.org,60020,1400750242156, tracking started Thu May 22 17:21:37 CST 2014, retrying after 20095 ms, replay 1 ops.

--
This message was sent by Atlassian JIRA
(v6.2#6252)