Can someone point me to an example build that is broken?

On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bus...@cloudera.com> wrote:

> I'm on it. HADOOP-11721
>
> On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <whe...@apache.org> wrote:
>
>> +1 for git clean.
>>
>> Colin, can you please get it in ASAP? Currently, due to the jenkins
>> issues, we cannot close the 2.7 blockers.
>>
>> Thanks,
>> Haohui
>>
>>
>>
>> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cmcc...@apache.org>
>> wrote:
>> > If all it takes is someone creating a test that makes a directory
>> > without -x, this is going to happen over and over.
>> >
>> > Let's just fix the problem at the root by running "git clean -fqdx" in
>> > our jenkins scripts.  If there are no objections I will add this in and
>> > un-break the builds.
>> >
>> > best,
>> > Colin
>> >
>> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <l...@cloudera.com> wrote:
>> >> I filed HDFS-7917 to change the way to simulate disk failures.
>> >>
>> >> But I think we still need infrastructure folks to help with the jenkins
>> >> scripts to clean up the dirs left behind today.
>> >>
>> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ricet...@gmail.com>
>> wrote:
>> >>> Any updates on this issue? It seems that all HDFS jenkins builds are
>> >>> still failing.
>> >>>
>> >>> Regards,
>> >>> Haohui
>> >>>
>> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <
>> vinayakum...@apache.org> wrote:
>> >>>> I think the problem started here.
>> >>>>
>> >>>>
>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>> >>>>
>> >>>> As Chris mentioned, TestDataNodeVolumeFailure changes the permissions.
>> >>>> But with this patch, ReplicationMonitor hit an NPE and received a
>> >>>> terminate signal, due to which MiniDFSCluster.shutdown() threw an
>> >>>> exception.
>> >>>>
>> >>>> However, TestDataNodeVolumeFailure#tearDown() restores those permissions
>> >>>> only after shutting down the cluster. So in this case, IMO, the
>> >>>> permissions were never restored.
>> >>>>
>> >>>>
>> >>>>   @After
>> >>>>   public void tearDown() throws Exception {
>> >>>>     if(data_fail != null) {
>> >>>>       FileUtil.setWritable(data_fail, true);
>> >>>>     }
>> >>>>     if(failedDir != null) {
>> >>>>       FileUtil.setWritable(failedDir, true);
>> >>>>     }
>> >>>>     if(cluster != null) {
>> >>>>       cluster.shutdown();
>> >>>>     }
>> >>>>     for (int i = 0; i < 3; i++) {
>> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>> >>>>     }
>> >>>>   }
>> >>>>
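>> >>>> One possible hardening (just a sketch, not the HDFS-7917 change) would
>> >>>> be to restore the permissions before the shutdown and guard it with
>> >>>> try/finally, so an exception from shutdown() can no longer skip the
>> >>>> restore:
>> >>>>
>> >>>>   @After
>> >>>>   public void tearDown() throws Exception {
>> >>>>     try {
>> >>>>       // Restore permissions first, so a failure in shutdown() cannot
>> >>>>       // skip this step and leave broken dirs behind on the slave.
>> >>>>       if(data_fail != null) {
>> >>>>         FileUtil.setWritable(data_fail, true);
>> >>>>       }
>> >>>>       if(failedDir != null) {
>> >>>>         FileUtil.setWritable(failedDir, true);
>> >>>>       }
>> >>>>       for (int i = 0; i < 3; i++) {
>> >>>>         FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>> >>>>         FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>> >>>>       }
>> >>>>     } finally {
>> >>>>       if(cluster != null) {
>> >>>>         cluster.shutdown();
>> >>>>       }
>> >>>>     }
>> >>>>   }
>> >>>>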
>> >>>>
>> >>>> Regards,
>> >>>> Vinay
>> >>>>
>> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <
>> vinayakum...@apache.org>
>> >>>> wrote:
>> >>>>
>> >>>>> When I look at the history of these kinds of builds, all of them
>> >>>>> failed on node H9.
>> >>>>>
>> >>>>> I think some uncommitted patch or other created the problem and left it
>> >>>>> there.
>> >>>>>
>> >>>>>
>> >>>>> Regards,
>> >>>>> Vinay
>> >>>>>
>> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bus...@cloudera.com>
>> wrote:
>> >>>>>
>> >>>>>> You could rely on a destructive git clean call instead of maven to do
>> >>>>>> the directory removal.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Sean
>> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu>
>> wrote:
>> >>>>>>
>> >>>>>> > Is there a maven plugin or setting we can use to simply remove
>> >>>>>> > directories that have no executable permissions on them?  Clearly
>> >>>>>> > we have the permission to do this from a technical point of view
>> >>>>>> > (since we created the directories as the jenkins user); it's simply
>> >>>>>> > that the code refuses to do it.
>> >>>>>> >
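>> >>>>>> > As a rough sketch (a hypothetical helper, not an existing plugin or
>> >>>>>> > API), a pre-clean step could walk the workspace and re-grant the
>> >>>>>> > bits before deleting:
>> >>>>>> >
>> >>>>>> > // Hypothetical pre-clean helper: re-grant rwx on everything under
>> >>>>>> > // 'root' so that a later delete (or mvn clean) cannot fail on it.
>> >>>>>> > static void makeDeletable(File root) {
>> >>>>>> >   root.setReadable(true);
>> >>>>>> >   root.setWritable(true);
>> >>>>>> >   root.setExecutable(true);
>> >>>>>> >   File[] children = root.listFiles();  // null for regular files
>> >>>>>> >   if (children != null) {
>> >>>>>> >     for (File child : children) {
>> >>>>>> >       makeDeletable(child);
>> >>>>>> >     }
>> >>>>>> >   }
>> >>>>>> > }
>> >>>>>> >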
>> >>>>>> > Otherwise I guess we can just fix those tests...
>> >>>>>> >
>> >>>>>> > Colin
>> >>>>>> >
>> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com>
>> wrote:
>> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>> >>>>>> > >
>> >>>>>> > > In HDFS-7722:
>> >>>>>> > > The TestDataNodeVolumeFailureXXX tests reset data dir permissions
>> >>>>>> > > in tearDown().
>> >>>>>> > > TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>> >>>>>> > >
>> >>>>>> > > Also I ran mvn test several times on my machine and all tests
>> passed.
>> >>>>>> > >
>> >>>>>> > > However, DiskChecker#checkDirAccess() does the following:
>> >>>>>> > >
>> >>>>>> > > private static void checkDirAccess(File dir) throws DiskErrorException {
>> >>>>>> > >   if (!dir.isDirectory()) {
>> >>>>>> > >     throw new DiskErrorException("Not a directory: "
>> >>>>>> > >                                  + dir.toString());
>> >>>>>> > >   }
>> >>>>>> > >
>> >>>>>> > >   checkAccessByFileMethods(dir);
>> >>>>>> > > }
>> >>>>>> > >
>> >>>>>> > > Since it first checks isDirectory(), one potentially safer
>> >>>>>> > > alternative is replacing the data dir with a regular file to
>> >>>>>> > > simulate disk failures.
>> >>>>>> > >
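>> >>>>>> > > Something along these lines (a rough sketch of the idea, not the
>> >>>>>> > > actual HDFS-7917 patch; simulateVolumeFailure/restoreVolume are
>> >>>>>> > > hypothetical helper names):
>> >>>>>> > >
>> >>>>>> > > // Simulate a failed volume by replacing the directory with a plain
>> >>>>>> > > // file, so checkDirAccess() fails on isDirectory() without any
>> >>>>>> > > // chmod that a crashed test run could leave behind.
>> >>>>>> > > private static void simulateVolumeFailure(File volumeDir) throws IOException {
>> >>>>>> > >   FileUtil.fullyDelete(volumeDir);
>> >>>>>> > >   assertTrue(volumeDir.createNewFile());
>> >>>>>> > > }
>> >>>>>> > >
>> >>>>>> > > // Undo: delete the file and recreate the directory; nothing to chmod.
>> >>>>>> > > private static void restoreVolume(File volumeDir) throws IOException {
>> >>>>>> > >   assertTrue(volumeDir.delete());
>> >>>>>> > >   assertTrue(volumeDir.mkdirs());
>> >>>>>> > > }
>> >>>>>> > >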
>> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <
>> >>>>>> cnaur...@hortonworks.com>
>> >>>>>> > wrote:
>> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
>> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
>> >>>>>> > >> permissions from directories like the one Colin mentioned to
>> >>>>>> > >> simulate disk failures at data nodes.  I reviewed the code for all
>> >>>>>> > >> of those, and they all appear to be doing the necessary work to
>> >>>>>> > >> restore executable permissions at the end of the test.  The only
>> >>>>>> > >> recent uncommitted patch I've seen that makes changes in these
>> >>>>>> > >> test suites is HDFS-7722.  That patch still looks fine though.  I
>> >>>>>> > >> don't know if there are other uncommitted patches that changed
>> >>>>>> > >> these test suites.
>> >>>>>> > >>
>> >>>>>> > >> I suppose it's also possible that the JUnit process unexpectedly
>> >>>>>> > >> died after removing executable permissions but before restoring
>> >>>>>> > >> them.  That always would have been a weakness of these test
>> >>>>>> > >> suites, regardless of any recent changes.
>> >>>>>> > >>
>> >>>>>> > >> Chris Nauroth
>> >>>>>> > >> Hortonworks
>> >>>>>> > >> http://hortonworks.com/
>> >>>>>> > >>
>> >>>>>> > >>
>> >>>>>> > >>
>> >>>>>> > >>
>> >>>>>> > >>
>> >>>>>> > >>
>> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com>
>> wrote:
>> >>>>>> > >>
>> >>>>>> > >>>Hey Colin,
>> >>>>>> > >>>
>> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going
>> >>>>>> > >>>on with these boxes. He took a look and concluded that our unit
>> >>>>>> > >>>tests are setting permissions in those directories that prevent
>> >>>>>> > >>>the files from being deleted. He's going to clean up the boxes for
>> >>>>>> > >>>us, but we should expect this to keep happening until we can fix
>> >>>>>> > >>>the test in question to properly clean up after itself.
>> >>>>>> > >>>
>> >>>>>> > >>>To help narrow down which commit it was that started this, Andrew
>> >>>>>> > >>>sent me this info:
>> >>>>>> > >>>
>> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> >>>>>> > >>>has 500 perms, so I'm guessing that's the problem. Been that way
>> >>>>>> > >>>since 9:32 UTC on March 5th."
>> >>>>>> > >>>
>> >>>>>> > >>>--
>> >>>>>> > >>>Aaron T. Myers
>> >>>>>> > >>>Software Engineer, Cloudera
>> >>>>>> > >>>
>> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <
>> cmcc...@apache.org
>> >>>>>> >
>> >>>>>> > >>>wrote:
>> >>>>>> > >>>
>> >>>>>> > >>>> Hi all,
>> >>>>>> > >>>>
>> >>>>>> > >>>> A very quick (and not thorough) survey shows that I can't find
>> >>>>>> > >>>> any jenkins jobs that succeeded from the last 24 hours.  Most of
>> >>>>>> > >>>> them seem to be failing with some variant of this message:
>> >>>>>> > >>>>
>> >>>>>> > >>>> [ERROR] Failed to execute goal
>> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> >>>>>> > >>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> >>>>>> > >>>> -> [Help 1]
>> >>>>>> > >>>>
>> >>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> >>>>>> > >>>> permissions?
>> >>>>>> > >>>>
>> >>>>>> > >>>> Colin
>> >>>>>> > >>>>
>> >>>>>> > >>
>> >>>>>> > >
>> >>>>>> > >
>> >>>>>> > >
>> >>>>>> > > --
>> >>>>>> > > Lei (Eddy) Xu
>> >>>>>> > > Software Engineer, Cloudera
>> >>>>>> >
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>
>> >>
>> >>
>> >> --
>> >> Lei (Eddy) Xu
>> >> Software Engineer, Cloudera
>>
>
>
>
> --
> Sean
>



-- 
Sean
