Is the simulation just removing the executable bit on the directory? I'd
like to get something I can reproduce locally.

On Tue, Mar 17, 2015 at 2:29 AM, Vinayakumar B <vinayakum...@apache.org>
wrote:

> I have simulated the problem in my env and verified that both 'git clean
> -xdf' and 'mvn clean' will not remove the directory.
> mvn fails, whereas git simply ignores the problem (it does not even
> display a warning).
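>
> A minimal standalone sketch of the simulation (paraphrasing what the
> tests do; the names here are made up, and this is untested against the
> actual suites):
>
> import java.io.File;
> import java.nio.file.Files;
>
> public class ReproNonExecDir {
>   public static void main(String[] args) throws Exception {
>     // A data dir with contents, like the volume-failure tests create.
>     File parent = Files.createTempDirectory("repro").toFile();
>     File dataDir = new File(parent, "data3");
>     File inner = new File(dataDir, "current");
>     if (!inner.mkdirs()) {
>       throw new IllegalStateException("setup failed");
>     }
>     // Drop the owner's executable (search) bit, as the tests do.
>     dataDir.setExecutable(false);
>
>     // Cleanup now fails: the child cannot be unlinked without the
>     // search bit on its parent, and the parent is non-empty.
>     System.out.println(inner.delete());    // false
>     System.out.println(dataDir.delete());  // false
>
>     // Restoring the bit (what tearDown() should guarantee) fixes it.
>     dataDir.setExecutable(true);
>     System.out.println(inner.delete());    // true
>   }
> }
>
> (Run as a non-root user; root bypasses the permission checks.)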
>
>
>
> Regards,
> Vinay
>
> On Tue, Mar 17, 2015 at 2:32 AM, Sean Busbey <bus...@cloudera.com> wrote:
>
> > Can someone point me to an example build that is broken?
> >
> > On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bus...@cloudera.com> wrote:
> >
> > > I'm on it. HADOOP-11721
> > >
> > > On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <whe...@apache.org> wrote:
> > >
> > >> +1 for git clean.
> > >>
> > >> Colin, can you please get it in ASAP? Currently, due to the jenkins
> > >> issues, we cannot close the 2.7 blockers.
> > >>
> > >> Thanks,
> > >> Haohui
> > >>
> > >>
> > >>
> > >> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cmcc...@apache.org> wrote:
> > >> > If all it takes is someone creating a test that makes a directory
> > >> > without -x, this is going to happen over and over.
> > >> >
> > >> > Let's just fix the problem at the root by running "git clean -fqdx"
> > >> > in our jenkins scripts.  If there are no objections, I will add this
> > >> > in and un-break the builds.
> > >> >
> > >> > best,
> > >> > Colin
> > >> >
> > >> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <l...@cloudera.com> wrote:
> > >> >> I filed HDFS-7917 to change the way to simulate disk failures.
> > >> >>
> > >> >> But I think we still need infrastructure folks to help with the
> > >> >> jenkins scripts to clean up the dirs already left behind.
> > >> >>
> > >> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ricet...@gmail.com> wrote:
> > >> >>> Any updates on this issue? It seems that all HDFS jenkins builds
> > >> >>> are still failing.
> > >> >>>
> > >> >>> Regards,
> > >> >>> Haohui
> > >> >>>
> > >> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vinayakum...@apache.org> wrote:
> > >> >>>> I think the problem started from here.
> > >> >>>>
> > >> >>>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
> > >> >>>>
> > >> >>>> As Chris mentioned, TestDataNodeVolumeFailure is changing the
> > >> >>>> permissions. But with this patch, ReplicationMonitor hit an NPE
> > >> >>>> and received a terminate signal, due to which
> > >> >>>> MiniDFSCluster.shutdown() threw an Exception.
> > >> >>>>
> > >> >>>> But TestDataNodeVolumeFailure#tearDown() restores those
> > >> >>>> permissions only after shutting down the cluster, so once
> > >> >>>> shutdown() threw, the loop below that restores the executable bits
> > >> >>>> never ran. So in this case, IMO, the permissions were never
> > >> >>>> restored.
> > >> >>>>
> > >> >>>>
> > >> >>>>   @After
> > >> >>>>   public void tearDown() throws Exception {
> > >> >>>>     if (data_fail != null) {
> > >> >>>>       FileUtil.setWritable(data_fail, true);
> > >> >>>>     }
> > >> >>>>     if (failedDir != null) {
> > >> >>>>       FileUtil.setWritable(failedDir, true);
> > >> >>>>     }
> > >> >>>>     if (cluster != null) {
> > >> >>>>       cluster.shutdown();
> > >> >>>>     }
> > >> >>>>     for (int i = 0; i < 3; i++) {
> > >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
> > >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
> > >> >>>>     }
> > >> >>>>   }
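> > >> >>>>
> > >> >>>> A minimal sketch of a more defensive ordering (same fields
> > >> >>>> assumed; untested): restore the permissions first, and keep the
> > >> >>>> shutdown in a finally block so an exception cannot skip the chmod
> > >> >>>> loop:
> > >> >>>>
> > >> >>>>   @After
> > >> >>>>   public void tearDown() throws Exception {
> > >> >>>>     try {
> > >> >>>>       if (data_fail != null) {
> > >> >>>>         FileUtil.setWritable(data_fail, true);
> > >> >>>>       }
> > >> >>>>       if (failedDir != null) {
> > >> >>>>         FileUtil.setWritable(failedDir, true);
> > >> >>>>       }
> > >> >>>>       // restore the executable bits before anything can throw
> > >> >>>>       for (int i = 0; i < 3; i++) {
> > >> >>>>         FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
> > >> >>>>         FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
> > >> >>>>       }
> > >> >>>>     } finally {
> > >> >>>>       if (cluster != null) {
> > >> >>>>         cluster.shutdown();
> > >> >>>>       }
> > >> >>>>     }
> > >> >>>>   }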
> > >> >>>>
> > >> >>>>
> > >> >>>> Regards,
> > >> >>>> Vinay
> > >> >>>>
> > >> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vinayakum...@apache.org> wrote:
> > >> >>>>
> > >> >>>>> Looking at the history of these kinds of builds, all of them
> > >> >>>>> failed on node H9.
> > >> >>>>>
> > >> >>>>> I think some uncommitted patch or other created the problem and
> > >> >>>>> left it there.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Regards,
> > >> >>>>> Vinay
> > >> >>>>>
> > >> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bus...@cloudera.com> wrote:
> > >> >>>>>
> > >> >>>>>> You could rely on a destructive git clean call instead of maven
> > >> >>>>>> to do the directory removal.
> > >> >>>>>>
> > >> >>>>>> --
> > >> >>>>>> Sean
> > >> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
> > >> >>>>>>
> > >> >>>>>> > Is there a maven plugin or setting we can use to simply remove
> > >> >>>>>> > directories that have no executable permissions on them?
> > >> >>>>>> > Clearly we have the permission to do this from a technical
> > >> >>>>>> > point of view (since we created the directories as the jenkins
> > >> >>>>>> > user); it's simply that the code refuses to do it.
> > >> >>>>>> >
> > >> >>>>>> > Otherwise I guess we can just fix those tests...
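> > >> >>>>>> >
> > >> >>>>>> > For illustration, a hypothetical chmod-then-delete walk in
> > >> >>>>>> > plain java.io (not an existing plugin; just a sketch showing
> > >> >>>>>> > the deletion itself is technically possible once the bits are
> > >> >>>>>> > restored):
> > >> >>>>>> >
> > >> >>>>>> > // Restore rwx on the way down so directories can be listed,
> > >> >>>>>> > // then delete bottom-up.
> > >> >>>>>> > static void forceDelete(java.io.File f) {
> > >> >>>>>> >   f.setReadable(true);
> > >> >>>>>> >   f.setWritable(true);
> > >> >>>>>> >   f.setExecutable(true);
> > >> >>>>>> >   java.io.File[] children = f.listFiles();  // null for files
> > >> >>>>>> >   if (children != null) {
> > >> >>>>>> >     for (java.io.File c : children) {
> > >> >>>>>> >       forceDelete(c);
> > >> >>>>>> >     }
> > >> >>>>>> >   }
> > >> >>>>>> >   f.delete();
> > >> >>>>>> > }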
> > >> >>>>>> >
> > >> >>>>>> > Colin
> > >> >>>>>> >
> > >> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
> > >> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
> > >> >>>>>> > >
> > >> >>>>>> > > In HDFS-7722:
> > >> >>>>>> > > The TestDataNodeVolumeFailureXXX tests reset the data dir
> > >> >>>>>> > > permissions in tearDown().
> > >> >>>>>> > > TestDataNodeHotSwapVolumes resets permissions in a finally
> > >> >>>>>> > > clause.
> > >> >>>>>> > >
> > >> >>>>>> > > Also, I ran mvn test several times on my machine and all
> > >> >>>>>> > > tests passed.
> > >> >>>>>> > >
> > >> >>>>>> > > However, DiskChecker#checkDirAccess() requires the path to
> > >> >>>>>> > > be a directory:
> > >> >>>>>> > >
> > >> >>>>>> > > private static void checkDirAccess(File dir) throws DiskErrorException {
> > >> >>>>>> > >   if (!dir.isDirectory()) {
> > >> >>>>>> > >     throw new DiskErrorException("Not a directory: "
> > >> >>>>>> > >                                  + dir.toString());
> > >> >>>>>> > >   }
> > >> >>>>>> > >
> > >> >>>>>> > >   checkAccessByFileMethods(dir);
> > >> >>>>>> > > }
> > >> >>>>>> > >
> > >> >>>>>> > > One potentially safer alternative is replacing the data dir
> > >> >>>>>> > > with a regular file to simulate disk failures.
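> > >> >>>>>> > >
> > >> >>>>>> > > A minimal sketch of that idea (hypothetical helper name,
> > >> >>>>>> > > untested):
> > >> >>>>>> > >
> > >> >>>>>> > > // Simulate a failed volume by replacing the directory with
> > >> >>>>>> > > // a regular file of the same name. checkDirAccess() then
> > >> >>>>>> > > // throws DiskErrorException ("Not a directory"), and
> > >> >>>>>> > > // tearDown() only needs to delete the file; there are no
> > >> >>>>>> > > // permission changes to restore.
> > >> >>>>>> > > static void simulateVolumeFailure(java.io.File volumeDir)
> > >> >>>>>> > >     throws java.io.IOException {
> > >> >>>>>> > >   // delete the tree while it is still fully accessible
> > >> >>>>>> > >   org.apache.hadoop.fs.FileUtil.fullyDelete(volumeDir);
> > >> >>>>>> > >   if (!volumeDir.createNewFile()) {
> > >> >>>>>> > >     throw new java.io.IOException("could not replace " + volumeDir);
> > >> >>>>>> > >   }
> > >> >>>>>> > > }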
> > >> >>>>>> > >
> > >> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> > >> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> > >> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
> > >> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
> > >> >>>>>> > >> permissions from directories like the one Colin mentioned
> > >> >>>>>> > >> to simulate disk failures at data nodes.  I reviewed the
> > >> >>>>>> > >> code for all of those, and they all appear to be doing the
> > >> >>>>>> > >> necessary work to restore executable permissions at the end
> > >> >>>>>> > >> of the test.  The only recent uncommitted patch I've seen
> > >> >>>>>> > >> that makes changes in these test suites is HDFS-7722.  That
> > >> >>>>>> > >> patch still looks fine though.  I don't know if there are
> > >> >>>>>> > >> other uncommitted patches that changed these test suites.
> > >> >>>>>> > >>
> > >> >>>>>> > >> I suppose it's also possible that the JUnit process
> > >> >>>>>> > >> unexpectedly died after removing executable permissions but
> > >> >>>>>> > >> before restoring them.  That always would have been a
> > >> >>>>>> > >> weakness of these test suites, regardless of any recent
> > >> >>>>>> > >> changes.
> > >> >>>>>> > >>
> > >> >>>>>> > >> Chris Nauroth
> > >> >>>>>> > >> Hortonworks
> > >> >>>>>> > >> http://hortonworks.com/
> > >> >>>>>> > >>
> > >> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
> > >> >>>>>> > >>
> > >> >>>>>> > >>>Hey Colin,
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's
> > >> >>>>>> > >>>going on with these boxes. He took a look and concluded
> > >> >>>>>> > >>>that some perms are being set in those directories by our
> > >> >>>>>> > >>>unit tests which are precluding those files from getting
> > >> >>>>>> > >>>deleted. He's going to clean up the boxes for us, but we
> > >> >>>>>> > >>>should expect this to keep happening until we can fix the
> > >> >>>>>> > >>>test in question to properly clean up after itself.
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>To help narrow down which commit it was that started this,
> > >> >>>>>> > >>>Andrew sent me this info:
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> > >> >>>>>> > >>>has 500 perms, so I'm guessing that's the problem. Been
> > >> >>>>>> > >>>that way since 9:32 UTC on March 5th."
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>--
> > >> >>>>>> > >>>Aaron T. Myers
> > >> >>>>>> > >>>Software Engineer, Cloudera
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>> Hi all,
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> A very quick (and not thorough) survey shows that I
> > >> >>>>>> > >>>> can't find any jenkins jobs that succeeded in the last 24
> > >> >>>>>> > >>>> hours.  Most of them seem to be failing with some variant
> > >> >>>>>> > >>>> of this message:
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> [ERROR] Failed to execute goal
> > >> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> > >> >>>>>> > >>>> (default-clean) on project hadoop-hdfs: Failed to clean
> > >> >>>>>> > >>>> project: Failed to delete
> > >> >>>>>> > >>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> > >> >>>>>> > >>>> -> [Help 1]
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> Any ideas how this happened?  A bad disk, or a unit test
> > >> >>>>>> > >>>> setting the wrong permissions?
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> Colin
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>
> > >> >>>>>> > >
> > >> >>>>>> > >
> > >> >>>>>> > >
> > >> >>>>>> > > --
> > >> >>>>>> > > Lei (Eddy) Xu
> > >> >>>>>> > > Software Engineer, Cloudera
> > >> >>>>>> >
> > >> >>>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Lei (Eddy) Xu
> > >> >> Software Engineer, Cloudera
> > >>
> > >
> > >
> > >
> > > --
> > > Sean
> > >
> >
> >
> >
> > --
> > Sean
> >
>



-- 
Sean
