My plan is currently to:

* switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561 patch to test it out (a rough sketch of the RAM cap involved is below)
* if the tests work, work on getting YETUS-561 committed to Yetus master
* switch the jobs back to ASF Yetus master, either post-YETUS-561 or without it if it doesn’t work
* go back to working on something else, regardless of the outcome
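For what it’s worth, the patch itself is just about putting a hard cap on the build container’s RAM. I’m not going to promise the exact option name it ends up exposing (treat any flag here as an assumption); underneath, it comes down to the stock Docker memory flags. A minimal sketch, with the image name and the precommit invocation as placeholders:

    # Hard-cap the container's RAM; setting --memory-swap equal to --memory
    # also keeps it from spilling into swap. A runaway test JVM then gets
    # OOM-killed inside the container instead of dragging the whole node down.
    # "yetus-build-env" and "test-patch ..." are placeholders, not the real
    # job configuration.
    docker run --rm --memory=20g --memory-swap=20g \
      yetus-build-env test-patch ...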
> On Oct 24, 2017, at 2:55 PM, Chris Douglas <[email protected]> wrote:
>
> Sean/Junping-
>
> Ignoring the epistemology, it's a problem. Let's figure out what's
> causing memory to balloon and then we can work out the appropriate
> remedy.
>
> Is this reproducible outside the CI environment? To Junping's point,
> would YETUS-561 provide more detailed information to aid debugging? -C
>
> On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <[email protected]> wrote:
>> In general, the "solid evidence" of a memory leak comes from analysis of
>> heap dumps, jstack output, GC logs, etc. In many cases, we can locate/conclude
>> which piece of code is leaking memory from that analysis.
>>
>> Unfortunately, I cannot find any conclusion in the previous comments, and they
>> don't even say which HDFS daemons/components are consuming unexpectedly high
>> memory. That doesn't sound like a solid bug report to me.
>>
>> Thanks,
>>
>> Junping
>>
>> ________________________________
>> From: Sean Busbey <[email protected]>
>> Sent: Tuesday, October 24, 2017 2:20 PM
>> To: Junping Du
>> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev;
>> [email protected]; [email protected]
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>>
>> Just curious, Junping: what would "solid evidence" look like? Is the
>> supposition here that the memory leak is within HDFS test code rather than
>> library runtime code? How would such a distinction be shown?
>>
>> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <[email protected]> wrote:
>> Allen,
>> Do we have any solid evidence to show that the HDFS unit tests going through
>> the roof are due to a serious memory leak in HDFS? Normally, I don't expect
>> memory leaks to be identified in our UTs - mostly, a test JVM going away is
>> just a test or deployment issue.
>> Unless there is concrete evidence, my concern about a serious memory leak
>> in HDFS 2.8 is relatively low, given that some companies (Yahoo, Alibaba,
>> etc.) have had 2.8 deployed in large production environments for months.
>> Non-serious memory leaks (like forgetting to close a stream on a non-critical
>> path, etc.) and other non-critical bugs always happen here and there; those
>> we have to live with.
>>
>> Thanks,
>>
>> Junping
>>
>> ________________________________________
>> From: Allen Wittenauer <[email protected]>
>> Sent: Tuesday, October 24, 2017 8:27 AM
>> To: Hadoop Common
>> Cc: Hdfs-dev; [email protected]; [email protected]
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>>
>>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <[email protected]> wrote:
>>>
>>> With no other information or access to go on, my current hunch is that one
>>> of the HDFS unit tests is ballooning in memory size. The easiest way to
>>> kill a Linux machine is to eat all of its RAM, thanks to overcommit, and
>>> that's what this "feels" like.
>>>
>>> Someone should verify whether 2.8.2 has the same issues before a release
>>> goes out ...
>>
>> FWIW, I ran 2.8.2 last night and it has the same problems.
>>
>> Also: the node didn't die!
>> Looking through the workspace (so the next run will destroy them), two sets
>> of logs stand out:
>>
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>>
>> and
>>
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>>
>> It looks like my hunch is correct: RAM usage in the HDFS unit tests is
>> going through the roof. It's also interesting how MANY log files there are.
>> Is surefire not picking up that jobs are dying? Maybe not if memory is
>> getting tight.
>>
>> Anyway, at this point, branch-2.8 and higher are probably fubar'd.
>> Additionally, I've filed YETUS-561 so that Yetus-controlled Docker
>> containers can have their RAM limits set in order to prevent more nodes
>> from going catatonic.
>>
>> --
>> busbey
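A footnote for anyone who wants to chase the "solid evidence" question outside of CI: the standard JVM tooling mentioned above is enough to show whether a test JVM is genuinely leaking. A minimal sketch, assuming you can grab the PID of a surefire-forked test JVM (the paths, the 4g heap, and the sampling interval are illustrative only):

    # Watch heap occupancy and GC activity while the test runs (interval in ms).
    jstat -gcutil <pid> 5000

    # Snapshot the threads and the live heap for offline analysis.
    jstack <pid> > /tmp/threads.txt
    jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>

    # Or let the forked JVMs produce the evidence themselves: JDK7-era flags
    # that would go on the surefire argLine (adjust if the pom already sets one).
    -Xmx4g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc.log

    # And to see whether the node itself hit the kernel OOM killer:
    dmesg | grep -i 'killed process'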
