+1 for git clean. Colin, can you please get it in ASAP? Currently, due to the jenkins issues, we cannot close the 2.7 blockers.
Thanks,
Haohui

On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cmcc...@apache.org> wrote:
> If all it takes is someone creating a test that makes a directory
> without -x, this is going to happen over and over.
>
> Let's just fix the problem at the root by running "git clean -fqdx" in
> our jenkins scripts. If there are no objections I will add this in and
> un-break the builds.
>
> best,
> Colin
>
> On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <l...@cloudera.com> wrote:
>> I filed HDFS-7917 to change the way we simulate disk failures.
>>
>> But I think we still need the infrastructure folks to help with the
>> jenkins scripts to clean up the dirs left behind today.
>>
>> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ricet...@gmail.com> wrote:
>>> Any updates on this issue? It seems that all HDFS jenkins builds are
>>> still failing.
>>>
>>> Regards,
>>> Haohui
>>>
>>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vinayakum...@apache.org> wrote:
>>>> I think the problem started from here:
>>>>
>>>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>>>>
>>>> As Chris mentioned, TestDataNodeVolumeFailure changes the permissions.
>>>> But with this patch, ReplicationMonitor got an NPE and received the
>>>> terminate signal, due to which MiniDFSCluster.shutdown() threw an exception.
>>>>
>>>> But TestDataNodeVolumeFailure#tearDown() restores those permissions only
>>>> after shutting down the cluster. So in this case, IMO, the permissions
>>>> were never restored:
>>>>
>>>>   @After
>>>>   public void tearDown() throws Exception {
>>>>     if (data_fail != null) {
>>>>       FileUtil.setWritable(data_fail, true);
>>>>     }
>>>>     if (failedDir != null) {
>>>>       FileUtil.setWritable(failedDir, true);
>>>>     }
>>>>     if (cluster != null) {
>>>>       cluster.shutdown();
>>>>     }
>>>>     for (int i = 0; i < 3; i++) {
>>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>>>>     }
>>>>   }
>>>>
>>>> Regards,
>>>> Vinay
>>>>
>>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vinayakum...@apache.org> wrote:
>>>>> Looking at the history of these kinds of builds, all of them failed on
>>>>> node H9.
>>>>>
>>>>> I think some uncommitted patch or other would have created the problem
>>>>> and left it there.
>>>>>
>>>>> Regards,
>>>>> Vinay
>>>>>
>>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bus...@cloudera.com> wrote:
>>>>>> You could rely on a destructive git clean call instead of maven to do
>>>>>> the directory removal.
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>>
>>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
>>>>>>> Is there a maven plugin or setting we can use to simply remove
>>>>>>> directories that have no executable permissions on them? Clearly we
>>>>>>> have the permission to do this from a technical point of view (since
>>>>>>> we created the directories as the jenkins user), it's simply that the
>>>>>>> code refuses to do it.
>>>>>>>
>>>>>>> Otherwise I guess we can just fix those tests...
>>>>>>>
>>>>>>> Colin
>>>>>>>
>>>>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
>>>>>>>> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>>>>
>>>>>>>> In HDFS-7722:
>>>>>>>> The TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>>>>> tearDown(), and TestDataNodeHotSwapVolumes resets permissions in a
>>>>>>>> finally clause.
>>>>>>>>
>>>>>>>> Also, I ran mvn test several times on my machine and all tests passed.
>>>>>>>>
>>>>>>>> However, DiskChecker#checkDirAccess() first checks that the path is a
>>>>>>>> directory:
>>>>>>>>
>>>>>>>>   private static void checkDirAccess(File dir) throws DiskErrorException {
>>>>>>>>     if (!dir.isDirectory()) {
>>>>>>>>       throw new DiskErrorException("Not a directory: "
>>>>>>>>           + dir.toString());
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     checkAccessByFileMethods(dir);
>>>>>>>>   }
>>>>>>>>
>>>>>>>> so one potentially safer alternative is replacing the data dir with a
>>>>>>>> regular file to simulate disk failures.
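As a rough illustration of that alternative (a sketch only, not the HDFS-7917 patch; the helper class and method names below are hypothetical), the failure can be injected by swapping the volume directory for an ordinary file, so that DiskChecker#checkDirAccess() trips its isDirectory() check without any permission changes:

    import java.io.File;
    import java.io.IOException;

    import org.apache.hadoop.fs.FileUtil;

    /** Hypothetical sketch: fail a volume by replacing its directory with a plain file. */
    public class DiskFailureSimulator {

      /** After this, DiskChecker#checkDirAccess() sees a non-directory and reports a failed disk. */
      public static void injectFailure(File dataDir) throws IOException {
        FileUtil.fullyDelete(dataDir);        // remove the real volume directory
        if (!dataDir.createNewFile()) {       // leave a regular file in its place
          throw new IOException("Could not create placeholder file " + dataDir);
        }
      }

      /** Undoes injectFailure() so the next test starts from a clean directory. */
      public static void restore(File dataDir) throws IOException {
        if (dataDir.isFile() && !dataDir.delete()) {
          throw new IOException("Could not delete placeholder file " + dataDir);
        }
        if (!dataDir.exists() && !dataDir.mkdirs()) {
          throw new IOException("Could not recreate directory " + dataDir);
        }
      }
    }

Nothing in that approach leaves a non-executable directory behind, so a crashed test run cannot break the next build's "mvn clean".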
>>>>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>>>>> TestDataNodeVolumeFailureReporting, and
>>>>>>>>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>>>>>>> from directories like the one Colin mentioned to simulate disk
>>>>>>>>> failures at data nodes. I reviewed the code for all of those, and
>>>>>>>>> they all appear to be doing the necessary work to restore executable
>>>>>>>>> permissions at the end of the test. The only recent uncommitted patch
>>>>>>>>> I've seen that makes changes in these test suites is HDFS-7722. That
>>>>>>>>> patch still looks fine though. I don't know if there are other
>>>>>>>>> uncommitted patches that changed these test suites.
>>>>>>>>>
>>>>>>>>> I suppose it's also possible that the JUnit process unexpectedly died
>>>>>>>>> after removing executable permissions but before restoring them. That
>>>>>>>>> always would have been a weakness of these test suites, regardless of
>>>>>>>>> any recent changes.
>>>>>>>>>
>>>>>>>>> Chris Nauroth
>>>>>>>>> Hortonworks
>>>>>>>>> http://hortonworks.com/
>>>>>>>>>
>>>>>>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
>>>>>>>>>> Hey Colin,
>>>>>>>>>>
>>>>>>>>>> I asked Andrew Bayer, who works with Apache Infra, what's going on
>>>>>>>>>> with these boxes. He took a look and concluded that some perms are
>>>>>>>>>> being set in those directories by our unit tests which are
>>>>>>>>>> precluding those files from getting deleted. He's going to clean up
>>>>>>>>>> the boxes for us, but we should expect this to keep happening until
>>>>>>>>>> we can fix the test in question to properly clean up after itself.
>>>>>>>>>>
>>>>>>>>>> To help narrow down which commit it was that started this, Andrew
>>>>>>>>>> sent me this info:
>>>>>>>>>>
>>>>>>>>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>>>>>>> has 500 perms, so I'm guessing that's the problem. Been that way
>>>>>>>>>> since 9:32 UTC on March 5th."
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Aaron T. Myers
>>>>>>>>>> Software Engineer, Cloudera
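For context, the remove-and-restore pattern those suites rely on is roughly the following. This is a hedged sketch rather than the actual test code: the harness class and its parameters are hypothetical, while FileUtil and MiniDFSCluster are the Hadoop classes already shown above. Restoring the permissions in a finally block, before calling cluster.shutdown(), covers the case Vinay described where shutdown() throws, although it still cannot help if the JUnit JVM dies mid-test:

    import java.io.File;

    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.hdfs.MiniDFSCluster;

    /** Hypothetical sketch of a volume-failure test that always restores permissions. */
    public class VolumeFailureHarness {

      public static void runWithFailedVolume(MiniDFSCluster cluster, File failedVolume)
          throws Exception {
        try {
          // Simulate the disk failure: the DataNode can no longer write to or
          // traverse the volume directory.
          FileUtil.setWritable(failedVolume, false);
          FileUtil.setExecutable(failedVolume, false);

          // ... exercise the cluster against the failed volume here ...

        } finally {
          // Restore permissions first, so that an exception from shutdown()
          // cannot leave a non-deletable directory in the Jenkins workspace.
          FileUtil.setWritable(failedVolume, true);
          FileUtil.setExecutable(failedVolume, true);
          if (cluster != null) {
            cluster.shutdown();
          }
        }
      }
    }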
>>>>>>>>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> A very quick (and not thorough) survey shows that I can't find any
>>>>>>>>>>> jenkins jobs that succeeded in the last 24 hours. Most of them seem
>>>>>>>>>>> to be failing with some variant of this message:
>>>>>>>>>>>
>>>>>>>>>>> [ERROR] Failed to execute goal
>>>>>>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>>>>>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>>>>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>>>>>> -> [Help 1]
>>>>>>>>>>>
>>>>>>>>>>> Any ideas how this happened? Bad disk, unit test setting wrong
>>>>>>>>>>> permissions?
>>>>>>>>>>>
>>>>>>>>>>> Colin
>>>>>>>>
>>>>>>>> --
>>>>>>>> Lei (Eddy) Xu
>>>>>>>> Software Engineer, Cloudera
>>
>> --
>> Lei (Eddy) Xu
>> Software Engineer, Cloudera
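On Colin's earlier question about removing directories that carry no executable permissions: because the jenkins user owns them, a cleanup step can give itself the permissions back before deleting, along the lines of this sketch (plain java.io.File; the class and method are hypothetical and not an existing maven plugin or Hadoop utility):

    import java.io.File;
    import java.io.IOException;

    /** Hypothetical sketch: restore owner permissions, then delete a directory tree. */
    public class ForceDelete {

      public static void deleteRecursively(File f) throws IOException {
        if (!f.exists()) {
          return;                      // nothing to do
        }
        if (f.isDirectory()) {
          // The owner may chmod its own directory even when it is currently 500,
          // so grant rwx back before trying to list and remove the contents.
          f.setReadable(true);
          f.setWritable(true);
          f.setExecutable(true);
          File[] children = f.listFiles();
          if (children != null) {
            for (File child : children) {
              deleteRecursively(child);
            }
          }
        }
        if (!f.delete()) {
          throw new IOException("Failed to delete " + f);
        }
      }
    }

Resetting the permissions works precisely because the directories were created by the jenkins user; chmod by the owner does not require the directory to be writable or executable beforehand.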