[ https://issues.apache.org/jira/browse/PHOENIX-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067447#comment-14067447 ]
Vikas Vishwakarma commented on PHOENIX-998:
-------------------------------------------

I was able to reproduce and identify this issue. It is related to lease expiry in the RegionServers. You will see entries like the following in the RegionServer logs when the issue occurs. There was some refactoring around lease expiry in hbase-0.98 which may have introduced a race condition (refer to https://issues.apache.org/jira/browse/HBASE-8449):

2014-07-19 05:20:03,696 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 120 lease expired on region PB4_1,EXCHG71:TOPIC79:00000000,1405745722171.631771d92412b9744e342aeffa61b880.
2014-07-19 05:20:03,907 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 117 lease expired on region PB4_1,EXCHG41:TOPIC79:00000000,1405745722171.dd30b2115b87f2c1f8aadcefa52d28ee.
2014-07-19 05:20:04,008 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 122 lease expired on region ...
2014-07-19 05:20:04,008 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 119 lease expired on region PB4_1,EXCHG53:TOPIC79:00000000,1405745722171.a55923b2b74ed6a61d4cad2c6afb4dd0.
2014-07-19 05:20:21,420 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 146 lease expired on region PB4_1,EXCHG27:TOPIC79:00000000,1405745722170.89dcec9f29c6c36664dcca0b51bb477a.
2014-07-19 05:20:21,420 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 145 lease expired on region PB4_1,EXCHG25:TOPIC79:00000000,1405745722170.cda36ed3c880f0cdc95b0729306fdd56.

> SocketTimeoutException under high concurrent write access to phoenix indexed table
> -----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-998
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-998
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>         Environment: HBase 0.98.1-SNAPSHOT, Hadoop 2.3.0-cdh5.0.0
>            Reporter: wangxianbin
>            Priority: Critical
>
> We have a small HBase cluster with one master and six slaves. We test Phoenix index concurrent write access performance with four write clients; each client runs 100 threads, and each thread holds one Phoenix JDBC connection. We encounter the SocketTimeoutException shown below, and it retries for a very long time. How can I deal with such an issue?
> 2014-05-22 17:22:58,490 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t10] client.AsyncProcess: #16016, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:00,436 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t6] client.AsyncProcess: #16027, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:00,440 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t1] client.AsyncProcess: #16013, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:00,449 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t7] client.AsyncProcess: #16028, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:00,473 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t8] client.AsyncProcess: #16020, waiting for some tasks to finish.
> Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:00,494 INFO [htable-pool20-t13] client.AsyncProcess: #16016, table=IPHOENIX10M, attempt=12/350 failed 1 ops, last exception: java.net.SocketTimeoutException: Call to storm3.org/172.16.2.23:60020 failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.2.24:52017 remote=storm3.org/172.16.2.23:60020] on storm3.org,60020,1400750242156, tracking started Thu May 22 17:21:32 CST 2014, retrying after 20189 ms, replay 1 ops.
> 2014-05-22 17:23:02,439 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t4] client.AsyncProcess: #16022, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
> 2014-05-22 17:23:02,496 INFO [htable-pool20-t3] client.AsyncProcess: #16013, table=IPHOENIX10M, attempt=12/350 failed 1 ops, last exception: java.net.SocketTimeoutException: Call to storm3.org/172.16.2.23:60020 failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.2.24:52017 remote=storm3.org/172.16.2.23:60020] on storm3.org,60020,1400750242156, tracking started Thu May 22 17:21:32 CST 2014, retrying after 20001 ms, replay 1 ops.
> 2014-05-22 17:23:02,496 INFO [htable-pool20-t16] client.AsyncProcess: #16028, table=IPHOENIX10M, attempt=12/350 failed 1 ops, last exception: java.net.SocketTimeoutException: Call to storm3.org/172.16.2.23:60020 failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.2.24:52017 remote=storm3.org/172.16.2.23:60020] on storm3.org,60020,1400750242156, tracking started Thu May 22 17:21:37 CST 2014, retrying after 20095 ms, replay 1 ops.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
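For reference, below is a minimal sketch of the per-thread upsert loop described in the report, with a few client-side HBase/Phoenix timeout and retry overrides passed in through the connection Properties (Phoenix hands these to the underlying HBase client configuration). The ZooKeeper quorum (zk-host), the (ID, VAL) column layout of IPHOENIX10M, and every value shown are illustrative assumptions, not settings taken from this issue; they only show where such tuning would go while the lease-expiry race noted above is being chased down.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Properties;

public class IndexedUpsertLoad {

    public static void main(String[] args) throws Exception {
        // Client-side overrides handed to the HBase client via the Phoenix connection.
        // The keys are standard HBase/Phoenix client properties; the values are guesses.
        Properties props = new Properties();
        props.setProperty("hbase.rpc.timeout", "60000");          // per-RPC timeout (ms)
        props.setProperty("hbase.client.retries.number", "10");   // cap the retry budget
        props.setProperty("hbase.client.pause", "500");           // back-off between retries (ms)
        props.setProperty("phoenix.query.timeoutMs", "600000");   // overall statement timeout (ms)

        // "zk-host" and the (ID, VAL) columns are placeholders; the issue only
        // names the table IPHOENIX10M.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host", props)) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO IPHOENIX10M (ID, VAL) VALUES (?, ?)")) {
                for (int i = 0; i < 10000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "value-" + i);
                    ps.execute();
                    if (i % 1000 == 999) {
                        conn.commit();   // commit in batches; each commit triggers index writes
                    }
                }
                conn.commit();
            }
        }
    }
}
{code}

The attempt=12/350 in the logs above suggests a very large retry budget is in effect, so bounding the client retry settings mainly limits how long a failing index write keeps retrying rather than fixing the root cause.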