You might want to check whether Nagle's algorithm is the cause. See the tcpNoDelay options in Hadoop RPC. I ran into a similar issue a while back.
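
For anyone who wants to rule this out quickly, here is a minimal sketch of what turning Nagle off looks like at the plain java.net.Socket level. It is not the Hadoop or HBase RPC code; the class and method names (NoDelayCheck, connectWithNoDelay) are made up for illustration. In Hadoop I believe the corresponding settings are ipc.client.tcpnodelay and ipc.server.tcpnodelay, but please verify that against the version you are running.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper, not part of Hadoop or HBase.
public class NoDelayCheck {

    /**
     * Opens a loopback connection, applies the requested TCP_NODELAY
     * setting to the client socket, and returns what the socket reports.
     */
    static boolean connectWithNoDelay(boolean noDelay) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; the listen backlog completes
        // the handshake, so no explicit accept() is needed for this check.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            // true disables Nagle's algorithm: small writes are sent
            // immediately instead of waiting to be coalesced.
            client.setTcpNoDelay(noDelay);
            return client.getTcpNoDelay();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TCP_NODELAY enabled: " + connectWithNoDelay(true));
    }
}
```

If the 40ms goes away with the option set on both the client and the server side, that points at Nagle interacting with delayed ACKs rather than anything HBase-specific.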

On Thu, May 14, 2009 at 5:44 PM, Jim Kellerman (JIRA) <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/HBASE-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709648#action_12709648 ]
>
> Jim Kellerman commented on HBASE-1177:
> --------------------------------------
>
> When I did not deploy a region server on the same machine as the namenode
> and master, I got much better times:
>
> {code}
> When run on node hosting region that is not the same as namenode or master:
>
> Read 1 row with 7 columns 100 times in 113ms
> Read 1 row with 8 columns 100 times in 4057ms
> Read 1 row with 1000 columns 100 times in 10570ms
>
> When run on node not hosting region and is not namenode or master:
>
> Read 1 row with 7 columns 100 times in 109ms
> Read 1 row with 8 columns 100 times in 121ms
> Read 1 row with 1000 columns 100 times in 8838ms
> {code}
>
> So running an application on the same server as the namenode. master and
> region server definately slows things down.
>
> There is basically no difference in the test where we read 1 row with 7
> columns.
>
> Why the read 1 row with 8 columns or the read 1 row with 1000 columns is
> still slower if the application is run on a different machine than the one
> hosting the region, is still a mystery.
>
> Will continue investigation.......
>
> > Delay when client is located on the same node as the regionserver
> > -----------------------------------------------------------------
> >
> >                 Key: HBASE-1177
> >                 URL: https://issues.apache.org/jira/browse/HBASE-1177
> >             Project: Hadoop HBase
> >          Issue Type: Bug
> >    Affects Versions: 0.19.0
> >         Environment: Linux 2.6.25 x86_64
> >            Reporter: Jonathan Gray
> >            Assignee: Jim Kellerman
> >            Priority: Blocker
> >             Fix For: 0.20.0
> >
> >         Attachments: ReadDelayTest.java
> >
> >
> > During testing of HBASE-80, we uncovered a strange 40ms delay for random
> > reads. We ran a series of tests and found that it only happens when the
> > client is on the same node as the RS and for a certain range of payloads
> > (not specifically related to number of columns or size of them, only total
> > payload). It appears to be precisely 40ms every time.
> >
> > Unsure if this is particular to our architecture, but it does happen on
> > all nodes we've tried. Issue completely goes away with very large payloads
> > or moving the client.
> >
> > Will post a test program tomorrow if anyone can test on a different
> > architecture.
> >
> > Making a blocker for 0.20. Since this happens when you have an MR task
> > running local to the RS, and this is what we try to do, might also consider
> > making this a blocker for 0.19.1.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
