[
https://issues.apache.org/jira/browse/MAPREDUCE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177612#comment-13177612
]
Evert Lammerts commented on MAPREDUCE-2980:
-------------------------------------------
Is it possible that this also causes SocketTimeoutExceptions in HDFS? I'm
getting datanodes excluded when copying large single files (> 1 TB). As far as
I can see it's not due to the xceiver limit or any physical bound (RAM / core /
IO loads are fine), and I don't see anything in the NN / DN logs. I've worked
around it by increasing dfs.socket.timeout to 10 minutes (see the sketch
below), but the network patterns I see in Ganglia are worrying: every 10 to 30
minutes there is a complete drop of activity lasting several minutes. It might
of course be a problem with our switches or our bonded interfaces as well...
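For reference, the workaround is just a larger HDFS socket timeout in
hdfs-site.xml. A minimal sketch, assuming the 0.20.x property name
dfs.socket.timeout (the value is in milliseconds, so 10 minutes = 600000):

  <!-- client/datanode read timeout; the stock default is 60000 ms -->
  <property>
    <name>dfs.socket.timeout</name>
    <value>600000</value>
  </property>

This only hides the stalls, of course; the periodic drops in activity are
still visible in Ganglia.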
> Fetch failures and other related issues in Jetty 6.1.26
> -------------------------------------------------------
>
> Key: MAPREDUCE-2980
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2980
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Affects Versions: 0.20.205.0, 0.23.0
> Reporter: Todd Lipcon
> Priority: Critical
>
> Since upgrading Jetty from 6.1.14 to 6.1.26 we've had a ton of HTTP-related
> issues, including:
> - Much higher incidence of fetch failures
> - A few strange file-descriptor related bugs (e.g. MAPREDUCE-2389)
> - A few unexplained issues where long "fsck"s on the NameNode drop out
> halfway through with a ClosedChannelException
> Stress tests with 10000-map x 10000-reduce sleep jobs reliably reproduce fetch
> failures at a rate of about 1 per million on a 25-node test cluster. These
> problems are all new since the upgrade from 6.1.14.
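For anyone trying to reproduce this, a hedged sketch of the kind of stress test
described above, assuming the SleepJob driver shipped in the 0.20.x test jar
(the exact jar name and sleep-time flags may differ between builds):

  hadoop jar hadoop-test-0.20.205.0.jar sleep -m 10000 -r 10000 -mt 1000 -rt 1000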