[
https://issues.apache.org/jira/browse/HADOOP-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HADOOP-6308:
--------------------------------
Status: Open (was: Patch Available)
patch appears to be out of date
> make number of IPC accepts configurable
> ---------------------------------------
>
> Key: HADOOP-6308
> URL: https://issues.apache.org/jira/browse/HADOOP-6308
> Project: Hadoop Common
> Issue Type: Improvement
> Components: ipc
> Affects Versions: 0.20.0
> Environment: Linux, running Yahoo-based 0.20
> Reporter: Andrew Ryan
> Attachments: HADOOP-6308.patch
>
>
> We were recently seeing issues in our environments where HDFS clients would
> receive RSTs from the NN when trying to do RPC to get file info, which
> would cause the task to fatal out. After some debugging we traced this to
> the IPC server listen queue -- ipc.server.listen.queue.size -- being far
> too small: we had been using the default value of 128 and found we needed
> to bump it up to 10240 before the resets went away (although this value is
> a bit suspect, as I will explain later in the issue).
> When a large map job starts, lots of clients very quickly start issuing RPC
> requests to the namenode, which fills up the listen queue because clients
> open connections faster than Hadoop's RPC server can accept them. We went
> back to our 0.17 cluster, instrumented it with tcpdump, and found that we
> had been sending RSTs for a long time there too, but the retry handling was
> implemented differently back in 0.17, so a single TCP failure wasn't
> task-fatal.
> In our environment we have our TCP stack set to explicitly send resets when
> the listen queue overflows (sysctl net.ipv4.tcp_abort_on_overflow = 1); the
> default Linux behavior is to drop SYN packets and let the client
> retransmit. Other people may be hitting this issue without noticing it,
> because under the default behavior the NN simply drops packets on the
> floor and clients retransmit.
> So we've identified (at least) three improvements that can be made here:
> 1) In src/core/org/apache/hadoop/ipc/Server.java, Listener.doAccept() is
> currently hardcoded to do 10 accept()s at a time before it starts reading.
> It would be better to let the server be configured to do more than 10
> accepts at a time via a configurable parameter, keeping 10 as the default
> (a rough sketch follows this list).
> 2) Increase the default value of ipc.server.listen.queue.size from 128, or
> at least document that people with larger clusters starting thousands of
> mappers at once should increase this value. I wonder if a lot of people
> running larger clusters are dropping packets and don't realize it because
> TCP is covering it up. On one hand, yay TCP; on the other hand, those are
> needless delays and retries, because the server can handle more
> connections.
> 3) Document that ipc.server.listen.queue.size may be limited to the value
> of SOMAXCONN (Linux sysctl net.core.somaxconn; default 4096 on our
> systems). The Java docs are not completely clear about this, and it's
> difficult to test because you can't query the backlog of a listening
> socket (see the backlog sketch after this list). We were under some time
> pressure in our case and tried 1024, which was not enough, then 10240,
> which worked, so we stuck with that.
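>
> A rough sketch of what item 1 might look like (illustrative only, not the
> attached patch; the config key named in the comment is hypothetical):
> {code:java}
> import java.io.IOException;
> import java.nio.channels.SelectionKey;
> import java.nio.channels.ServerSocketChannel;
> import java.nio.channels.SocketChannel;
>
> class ListenerSketch {
>   // Would be read once at startup, e.g.
>   // conf.getInt("ipc.server.listener.accept.count", 10)
>   private final int maxAccepts;
>
>   ListenerSketch(int maxAccepts) {
>     this.maxAccepts = maxAccepts;
>   }
>
>   void doAccept(SelectionKey key) throws IOException {
>     ServerSocketChannel server = (ServerSocketChannel) key.channel();
>     // Drain up to maxAccepts pending connections per select() wake-up,
>     // instead of the 10 that Listener.doAccept() hardcodes today.
>     for (int i = 0; i < maxAccepts; i++) {
>       SocketChannel channel = server.accept();
>       if (channel == null) {
>         break; // listen queue is empty; go back to select()
>       }
>       channel.configureBlocking(false);
>       channel.socket().setTcpNoDelay(true);
>       // ... hand the channel off to a reader, as Server.java does ...
>     }
>   }
> }
> {code}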
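>
> To illustrate items 2 and 3: the listen backlog is just the second argument
> to ServerSocket.bind(), and on Linux the kernel silently caps it at
> net.core.somaxconn, so raising ipc.server.listen.queue.size past that
> sysctl has no effect. A minimal demonstration (the port is arbitrary):
> {code:java}
> import java.net.InetSocketAddress;
> import java.net.ServerSocket;
>
> public class BacklogDemo {
>   public static void main(String[] args) throws Exception {
>     ServerSocket ss = new ServerSocket();
>     // Ask for a 10240-entry accept backlog, as we did on the NN. If
>     // net.core.somaxconn is lower, the kernel silently truncates the
>     // value; the JDK reports no error and offers no way to read back
>     // the effective backlog, which is why item 3 is hard to verify.
>     ss.bind(new InetSocketAddress(8020), 10240);
>     System.out.println("listening on " + ss.getLocalSocketAddress());
>     ss.close();
>   }
> }
> {code}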
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira