[ https://issues.apache.org/jira/browse/HBASE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037213#comment-14037213 ]

Qiang Tian commented on HBASE-11306:
------------------------------------

bq. ...Do we know for sure how the above scenario comes about?
Not known at the moment. We are stuck reading the RPC request data or the 
connection header.

bq. ... It is connection from client kept-alive?
It looks like it is not; there is code to handle that case, see PING_CALL_ID.
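
A rough sketch of that idea as I read it (illustrative only, not the actual 
RpcServer code; the constant value and the class around it are stand-ins):
{noformat}
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative sketch of keep-alive handling via a sentinel call id. This is
// not the actual HBase RPC code; the constant value here is a stand-in.
public class PingCallSketch {
  // Assumed sentinel: a call id the client writes on an otherwise idle connection.
  static final int PING_CALL_ID = -1;
  // Updated on any traffic, including pings, so the connection looks alive.
  static volatile long lastContact;

  static void processOneRpc(DataInputStream in) throws IOException {
    int callId = in.readInt();
    lastContact = System.currentTimeMillis();
    if (callId == PING_CALL_ID) {
      // A ping carries no request payload; nothing to dispatch to a handler.
      return;
    }
    // ... otherwise read the request header and parameter, then dispatch ...
  }
}
{noformat}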

bq.  ...OS is keeping it alive for two hours?
For most systems the default TCP keepalive timeout is 2 hours, but the HBase 
client socket has a much smaller timeout, which will be triggered first.
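
For reference, the client-side knob is hbase.rpc.timeout (the usual default in 
this timeframe is 60000 ms), which fires long before the OS keepalive 
(net.ipv4.tcp_keepalive_time defaults to 7200 seconds on Linux). A small 
illustration:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch: the client RPC timeout is much smaller than the OS-level TCP
// keepalive (typically 7200s on Linux), so a stuck call is abandoned by the
// client long before the OS would notice the dead peer.
public class ClientTimeoutSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // 60000 ms is the usual default; set explicitly here only for visibility.
    conf.setInt("hbase.rpc.timeout", 60000);
    System.out.println("client rpc timeout ms = " + conf.getInt("hbase.rpc.timeout", 60000));
  }
}
{noformat}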

bq. ...Have you done any debug
Mostly code analysis based on the two occurrences we have seen on the mailing 
list. I thought it would be hard to reproduce; now that Andrew can reproduce it, 
I will see whether I can too (I know nothing about YCSB and EC2 at the moment :-)).

bq. ...Not sure why the client gets stuck with only one stalled connection.
I suspect it is related to the original HBASE-11277 issue. It looks like the 
11277 fix leaves the bad connection in place; at that point the client is 
probably waiting for an RPC response and might not send data again, and the 
server-side cleanupConnections will not clean up the connection either.

I read more of the RPC client code. It looks like the claim that the client 
shares the same connection for each regionserver is not accurate: a connection 
is identified by a 3-tuple of user, RPC method name, and server address (see 
getConnection and tracedWriteRequest). So different method calls use different 
connections (although they share the same RpcClient instance and the same 
HConnectionImplementation instance). Please correct me if I am wrong; I will 
dig more.
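
To make the sharing granularity concrete, a minimal sketch of that keying 
(field names are illustrative, not the actual ConnectionId class); two calls 
reuse a connection only when all three parts match:
{noformat}
import java.net.InetSocketAddress;
import java.util.Objects;

// Sketch of the connection key described above: a connection is looked up by
// (user, method/service name, server address). Names are illustrative.
public final class ConnectionKeySketch {
  final String user;
  final String methodName;
  final InetSocketAddress address;

  ConnectionKeySketch(String user, String methodName, InetSocketAddress address) {
    this.user = user;
    this.methodName = methodName;
    this.address = address;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ConnectionKeySketch)) return false;
    ConnectionKeySketch other = (ConnectionKeySketch) o;
    return Objects.equals(user, other.user)
        && Objects.equals(methodName, other.methodName)
        && Objects.equals(address, other.address);
  }

  @Override
  public int hashCode() {
    return Objects.hash(user, methodName, address);
  }
}
{noformat}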

Debug ideas:
1) From the client concurrency side, could we try using different connections 
for different threads, i.e. create a new Configuration object in each YCSB 
thread (as Jonathan mentioned in a recent thread, 
http://search-hadoop.com/m/DHED4zrOq61/HBase+with+multiple+threads&subj=+Discuss+HBase+with+multiple+threads)? 
A sketch follows after this list.

2) On the server side, I plan to add the timeout (hopefully tomorrow). We can 
dump diagnostics (such as the phase we are in and some members of the 
Connection class) when the read count is 0 and the timeout fires.

3) Another suspect from today's search: Reader#readSelector. Should it be 
volatile? Is it possible that we read a stale value and fall into the read 
path incorrectly?
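
For 1), a minimal sketch of what I have in mind (the worker class and table 
name are made up; HConnectionManager.createConnection is used so each thread 
definitely gets its own connection instead of a cached shared one):
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;

// Sketch of debug idea 1: each worker thread builds its own Configuration and
// its own HConnection, so threads stop sharing sockets with one another.
public class PerThreadConnectionSketch implements Runnable {

  @Override
  public void run() {
    try {
      Configuration conf = HBaseConfiguration.create(); // per-thread config
      HConnection connection = HConnectionManager.createConnection(conf);
      try {
        HTableInterface table = connection.getTable("usertable"); // illustrative table
        try {
          // ... run this thread's share of the workload against 'table' ...
        } finally {
          table.close();
        }
      } finally {
        connection.close();
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread[] threads = new Thread[20]; // e.g. the 20 YCSB client threads
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(new PerThreadConnectionSketch());
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
  }
}
{noformat}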





> Client connection starvation issues under high load on Amazon EC2
> -----------------------------------------------------------------
>
>                 Key: HBASE-11306
>                 URL: https://issues.apache.org/jira/browse/HBASE-11306
>             Project: HBase
>          Issue Type: Bug
>         Environment: Amazon EC2
>            Reporter: Andrew Purtell
>
> I am using YCSB 0.1.4 with Hadoop 2.2.0 and HBase 0.98.3 RC2 on an EC2 
> testbed (c3.8xlarge instances, SSD backed, 10 GigE networking). There are 
> five slaves and five separate clients. I start with a prepopulated table of 
> 100M rows over ~20 regions and run 5 YCSB clients concurrently targeting 
> 250,000 ops/sec in aggregate. (Can reproduce this less effectively at 
> 100k ops/sec aggregate also.) Workload A. Due to how I set up the test, the 
> data is all in one HFile per region and very likely in cache. All writes will 
> fit in the aggregate memstore. No flushes or compactions are observed on any 
> server during the test, only the occasional log roll. Despite these favorable 
> conditions developed over time to isolate this issue, a few of the clients 
> will stop making progress until socket timeouts after 60 seconds, leading to 
> very large op latency outliers. With the above detail plus some added extra 
> logging we can rule out storage layer effects. Turning to the network, this 
> is where things get interesting.
> I used {{while true ; do clear ; ss -a -o|grep ESTAB|grep 8120 ; sleep 5 ; 
> done}} (8120 is the configured RS data port) to watch receive and send socket 
> queues and TCP level timers on all of the clients and servers simultaneously 
> during the run. 
> I have Nagle disabled on the clients and servers and JVM networking set up to 
> use IPv4 only. The YCSB clients are configured to use 20 threads. These 
> threads are expected to share 5 active connections, one to each RegionServer. 
> When the test starts we see exactly what we'd expect, 5 established TCPv4 
> connections.
> On all servers usually the recv and send queues were empty when sampled. I 
> never saw more than 10K waiting. The servers occasionally retransmitted, but 
> with timers ~200ms and retry counts ~0.
> The client side is another story. We see serious problems like:
> {noformat}
> tcp    ESTAB      0      8733   10.220.15.45:41428   10.220.2.115:8120     
> timer:(on,38sec,7)
> {noformat}
> That is about 9K of data still waiting to be sent after 7 TCP level 
> retransmissions. 
> There is some unfair queueing and packet drops happening at the network 
> level, but we should be handling this better.
> During the periods when YCSB is not making progress, there is only that one 
> connection to one RS in established state. There should be 5 established 
> connections, one to each RS, but the other 4 have been dropped somehow. The 
> one distressed connection remains established for the duration of the 
> problem, while the retransmission timer count on the connection ticks upward. 
> It is dropped once the socket times out at the app level. Why are the 
> connections to the other RegionServers dropped? Why are all threads blocked 
> waiting on the one connection for the socket timeout interval (60 seconds)? 
> After the socket timeout we see the stuck connection dropped and 5 new 
> connections immediately established. YCSB doesn't do anything that would lead 
> to this behavior; it is using separate HTable instances for each client 
> thread and not closing the table references until test cleanup. These 
> behaviors seem internal to the HBase client. 
> Is maintaining only a single multiplexed connection to each RegionServer the 
> best approach? 
> A related issue is we collect zombie sockets in ESTABLISHED state on the 
> server. Also likely not our fault per se. Keepalives are enabled so they will 
> eventually be garbage collected by the OS. On Linux systems this will take 2 
> hours. We might want to drop connections where we don't see activity sooner 
> than that. Before HBASE-11277 we were spinning indefinitely on a core for 
> each connection in this state.
> I have tried this using a narrow range of recent Java 7 and Java 8 runtimes 
> and they all produce the same results. I have also launched several separate 
> EC2 based test clusters and they all produce the same results, so this is a 
> generic platform issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
