[ https://issues.apache.org/jira/browse/HADOOP-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653976#action_12653976 ]
Brian Bockelman commented on HADOOP-4775:
-----------------------------------------

Hey Pete,

I'll have our sysadmins try out the 4616 and 4635 patches.

There were no messages in syslog, meaning it probably didn't segfault (is this correct?). Here's what the failure looks like:

http://jobrobot.web.cern.ch/JobRobot/errors_081205.html#T2_US_Nebraska
http://jobrobot.web.cern.ch/JobRobot/errors_081204.html#T2_US_Nebraska

I have a hard time believing that a memory leak alone could disconnect the FUSE endpoint. One third of the workers have 4GB of RAM, one third have 8GB, and one third have 16GB. It would take quite a large leak to cause these problems on the 16GB nodes. Plus, I didn't see the OOM killer kill anything in dmesg.

I set up a debug FUSE instance on a node and hit it with a similar workflow. No problems at all; it may be that, in debug mode, FUSE doesn't allow multiple threads?

My suspicion is that either FUSE-DFS or libhdfs has a problem with error recovery that causes an infinite loop (as we've seen in other places). The interesting thing about the "ps" output I showed above is that the fuse_dfs process was using 30% CPU *when nothing was using FUSE*, and the node wasn't swapping.

Nagios now restarts FUSE-DFS whenever the problem occurs, so I don't get much of a chance to debug. Still, about 7% of our jobs die because FUSE conks out mid-job.

> FUSE crashes reliably on 0.19.0
> -------------------------------
>
>                 Key: HADOOP-4775
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4775
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fuse-dfs
>            Reporter: Brian Bockelman
>            Priority: Critical
>
> Every morning I come in and find many nodes which have developed the dreaded
> "Transport endpoint not connected" error overnight. This has only started
> after the 0.19.0 upgrade.
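On Linux, a FUSE mount whose daemon has gone away keeps its mount point, but accessing it fails with errno ENOTCONN, whose message is "Transport endpoint is not connected". A minimal sketch of a Nagios-style probe for that state might look like the following. The function name `check_fuse_mount` is hypothetical, and this is not the Nagios check actually used on these nodes. Note that a daemon stuck in a busy loop, rather than disconnected, could make `os.stat` block, so a real probe would probably need a timeout as well.

```python
import errno
import os
from typing import Tuple

# Standard Nagios plugin exit codes (1 = WARNING is unused here).
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_fuse_mount(mount_point: str) -> Tuple[int, str]:
    """Probe a FUSE mount point (hypothetical helper, for illustration).

    Returns CRITICAL if stat() fails with ENOTCONN, i.e. the FUSE daemon
    is no longer serving the mount; UNKNOWN for any other OSError; and OK
    if stat() succeeds.
    """
    try:
        os.stat(mount_point)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return CRITICAL, f"CRITICAL: {mount_point}: transport endpoint is not connected"
        return UNKNOWN, f"UNKNOWN: {mount_point}: {exc.strerror}"
    return OK, f"OK: {mount_point} is responding"
```

A wrapper script would print the message and exit with the returned code. The restart action (remounting fuse_dfs) would then be handled by a Nagios event handler rather than by the probe itself.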