[ https://issues.apache.org/jira/browse/HADOOP-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-6308:
--------------------------------

    Status: Open  (was: Patch Available)

Patch appears to be out of date.

> make number of IPC accepts configurable
> ---------------------------------------
>
>                 Key: HADOOP-6308
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6308
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.20.0
>         Environment: Linux, running Yahoo-based 0.20
>            Reporter: Andrew Ryan
>         Attachments: HADOOP-6308.patch
>
>
> We were recently seeing issues in our environments where HDFS clients would
> receive RSTs from the NN when making RPC calls to get file info, which
> would cause the task to fail fatally. After some debugging we traced this
> to the IPC server listen queue -- ipc.server.listen.queue.size -- being far
> too low: we had been using the default value of 128 and found we needed to
> bump it up to 10240 before the resets went away (although this value is a
> bit suspect, as I will explain later in the issue).
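>
> For reference, a minimal sketch of how the 0.20-era server picks this value
> up from configuration (assuming the 0.20 key name and default, rather than
> quoting Server.java verbatim):
>
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> public class BacklogConfig {
>   public static void main(String[] args) {
>     // Sketch: the IPC server reads its listen backlog from configuration;
>     // 128 is the default we found far too low under heavy job start-up.
>     Configuration conf = new Configuration();
>     int backlogLength = conf.getInt("ipc.server.listen.queue.size", 128);
>     System.out.println("ipc.server.listen.queue.size = " + backlogLength);
>   }
> }
> {code}
>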
> When a large map job starts, lots of clients very quickly issue RPC
> requests to the namenode, which fills up the listen queue because clients
> are opening connections faster than Hadoop's RPC server can accept them.
> We went back to our 0.17 cluster, instrumented it with tcpdump, and found
> that we had been sending RSTs there for a long time too; but retry handling
> was implemented differently in 0.17, so a single TCP failure wasn't
> task-fatal.
> In our environment we have our TCP stack set to explicitly send resets when
> the listen queue overflows (sysctl net.ipv4.tcp_abort_on_overflow = 1); the
> default Linux behavior is to drop SYN packets and let the client
> retransmit. Other people may be experiencing this issue without noticing
> it, because under the default behavior the NN silently drops packets on the
> floor and clients retransmit.
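>
> As a quick way to check which behavior a node is using (a sketch, assuming
> the standard Linux /proc layout for these sysctls):
>
> {code:java}
> import java.io.BufferedReader;
> import java.io.FileReader;
>
> // Sketch: print the two kernel settings discussed above on a Linux host.
> // tcp_abort_on_overflow = 1 means the kernel sends an RST on listen-queue
> // overflow; 0 (the default) means it drops the SYN and the client retries.
> public class OverflowCheck {
>   private static String readSysctl(String path) throws Exception {
>     BufferedReader r = new BufferedReader(new FileReader(path));
>     try {
>       return r.readLine().trim();
>     } finally {
>       r.close();
>     }
>   }
>
>   public static void main(String[] args) throws Exception {
>     System.out.println("tcp_abort_on_overflow = "
>         + readSysctl("/proc/sys/net/ipv4/tcp_abort_on_overflow"));
>     System.out.println("net.core.somaxconn    = "
>         + readSysctl("/proc/sys/net/core/somaxconn"));
>   }
> }
> {code}
>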
> So we've identified (at least) 3 improvements that can be made here:
> 1) In src/core/org/apache/hadoop/ipc/Server.java, Listener.doAccept() is
> currently hard-coded to perform 10 accept() calls at a time before it
> starts reading. We feel it would be better to let the server be configured
> to accept more than 10 connections at a time via a configurable parameter,
> keeping 10 as the default (a sketch follows this list).
> 2) Increase the default value of ipc.server.listen.queue.size from 128, or
> at least document that people with larger clusters starting thousands of
> mappers at once should increase this value. I suspect a lot of people
> running larger clusters are dropping packets and don't realize it because
> TCP is covering it up. On one hand, yay TCP; on the other hand, those are
> needless delays and retries, because the server could handle more
> connections.
> 3) Document that ipc.server.listen.queue.size may be silently limited to
> the value of SOMAXCONN (Linux sysctl net.core.somaxconn; default 4096 on
> our systems). The Java docs are not completely clear about this, and it's
> difficult to test because you can't query the backlog of a listening
> socket (see the backlog sketch after this list). We were under some time
> pressure in our case; we tried 1024, which was not enough, and 10240, which
> worked, so we stuck with that.
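>
> To make (1) concrete, here is a minimal sketch of what a configurable
> accept loop could look like. The key name ipc.server.listen.accept.count
> is hypothetical and is not taken from the attached patch:
>
> {code:java}
> import java.io.IOException;
> import java.nio.channels.SelectionKey;
> import java.nio.channels.ServerSocketChannel;
> import java.nio.channels.SocketChannel;
> import org.apache.hadoop.conf.Configuration;
>
> // Sketch of (1): the hard-coded limit of 10 accepts per select wakeup
> // becomes configurable, with 10 kept as the default.
> class Listener {
>   private final int acceptCount;
>
>   Listener(Configuration conf) {
>     // Hypothetical key name; the attached patch may use a different one.
>     this.acceptCount = conf.getInt("ipc.server.listen.accept.count", 10);
>   }
>
>   void doAccept(SelectionKey key) throws IOException {
>     ServerSocketChannel server = (ServerSocketChannel) key.channel();
>     SocketChannel channel;
>     // Drain up to acceptCount pending connections before going back to
>     // reads.
>     for (int i = 0;
>          i < acceptCount && (channel = server.accept()) != null; i++) {
>       channel.configureBlocking(false);
>       channel.socket().setTcpNoDelay(true);
>       // ... hand the connection off to a reader, as Server.java does ...
>     }
>   }
> }
> {code}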
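>
> And on (2) and (3): the backlog passed to bind() is only a hint to the
> kernel. On Linux it is silently clamped to net.core.somaxconn, and Java
> offers no way to read back the effective value, which is why this is so
> awkward to test. A minimal sketch:
>
> {code:java}
> import java.net.InetSocketAddress;
> import java.nio.channels.ServerSocketChannel;
>
> // Sketch of (2)/(3): ask for a 10240-deep listen queue. If somaxconn is
> // 4096 (as on our systems), the kernel silently clamps the backlog to
> // 4096 -- no error, no warning, and no API to observe the clamp.
> public class BacklogDemo {
>   public static void main(String[] args) throws Exception {
>     int backlog = 10240; // would come from ipc.server.listen.queue.size
>     ServerSocketChannel acceptChannel = ServerSocketChannel.open();
>     acceptChannel.socket().bind(new InetSocketAddress(8020), backlog);
>     System.out.println("bound; requested backlog " + backlog);
>   }
> }
> {code}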

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira