Thank you for making a more digestible version, Allen. :)

If you're interested in soliciting feedback from other projects, I created
ASF short links to this thread in common-dev and hbase:


* http://s.apache.org/yetus-discuss-hadoop
* http://s.apache.org/yetus-discuss-hbase

While I agree that it's important to get feedback from ASF projects that
might find this useful, I can say that recently I've been involved in the
non-ASF project YCSB, and both the precommit tester and the better shell
stuff would be immensely useful over there.

On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer <a...@altiscale.com> wrote:

>
>         I'm clearly +1 on this idea.  As part of the rewrite of Hadoop's
> test-patch, it was amazing to see how far and wide this bit of code has
> spread.  So I see consolidating everyone's efforts as a huge win for a
> large number of projects.  (Especially considering how many I saw
> suffering from a variety of identified bugs!)
>
>         But….
>
>         I think it's important for people involved in those other projects
> to speak up and voice an opinion as to whether this is useful.
>
> To summarize:
>
>         In the short term, a single location to get/use a precommit patch
> tester rather than everyone building/supporting their own in their spare
> time.
>
>          FWIW, we've already got the code base modified to be pluggable.
> We've written some basic/simple plugins that support Hadoop, HBase, Tajo,
> Tez, Pig, and Flink.  For HBase and Flink, this does include their custom
> checks.  Adding support for other projects shouldn't be hard.  Simple
> projects take almost no time after seeing the basic pattern.
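[Editor's note: for readers unfamiliar with the plugin pattern being described, here is a minimal sketch of the idea. All function and hook names below are illustrative only, not the actual test-patch plugin API.]

```shell
# Minimal sketch of a pluggable check runner in the spirit described
# above: the core driver only knows a naming convention, and each
# project-specific check registers itself as a plugin.
# Names here are illustrative, NOT the real test-patch API.
declare -a PLUGINS=()

# Register a plugin by name; the convention is that a plugin "foo"
# provides a function "foo_test".
add_plugin() { PLUGINS+=("$1"); }

# Two toy plugins standing in for real checks.
shellcheck_test() { echo "shellcheck: ok"; }
whitespace_test() { echo "whitespace: ok"; }

add_plugin shellcheck
add_plugin whitespace

# The driver iterates registered plugins without knowing their details.
for plugin in "${PLUGINS[@]}"; do
  "${plugin}_test"
done
```

The appeal of this shape is that adding a project means adding one small file of functions, which matches the "simple projects take almost no time" claim above.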
>
>         I think it's worthwhile highlighting that means support for both
> JIRA and GitHub as well as Ant and Maven from the same code base.
>
> Longer term:
>
>         Well, we clearly have ideas of things that we want to do. Adding
> more features to test-patch (Review Board? Gradle?) is obvious. But what
> about teasing apart and generalizing some of the other shell bits from
> projects? Everything from a common library for building CLI tools, to
> fault injection, to release documentation creation tools, to …  I'd even
> like to see us get as advanced as a "run this program to auto-generate
> daemon stop/start bits".
>
>         I had a few chats with people about this idea at Hadoop Summit.
> What's truly exciting are the ideas that people had once they realized what
> kinds of problems we're trying to solve.  It's always amazing the problems
> that projects have that could be solved by these types of solutions.  Let's
> stop hiding our cool toys in this area.
>
>         So, what feedback and ideas do you have in this area?  Are you a
> yay or a nay?
>
>
> On Jun 15, 2015, at 4:47 PM, Sean Busbey <bus...@cloudera.com> wrote:
>
> > Oof. I had meant to push on this again but life got in the way and now
> > the June board meeting is upon us. Sorry everyone. In the event that
> > this ends up contentious, hopefully one of the copied communities can
> > give us a branch to work in.
> >
> > I know everyone is busy, so here's the short version of this email: I'd
> > like to move some of the code currently in Hadoop (test-patch) into a new
> > TLP focused on QA tooling. I'm not sure what the best format for priming
> > this conversation is. ORC filled in the incubator project proposal
> > template, but I'm not sure how much that confused the issue. So to start,
> > I'll just write what I'm hoping we can accomplish in general terms here.
> >
> > All software development projects that are community-based (that is,
> > accepting outside contributions) face a common QA problem: vetting
> > incoming contributions. Hadoop is fortunate enough to be sufficiently
> > popular that the weight of the problem drove tool development (i.e.,
> > test-patch). That tool is generalizable enough that a bunch of other
> > TLPs have adopted their own forks. Unfortunately, in most projects this
> > kind of QA work is an enabler rather than a primary concern, so the
> > tooling is often worked on ad hoc and few shared improvements happen
> > across projects. Since the tooling itself is never a primary concern,
> > any improvement made is rarely reused outside of ASF projects.
> >
> > Over the last couple of months a few of us have been working on
> > generalizing the tooling present in the Hadoop code base (because it
> > was the most mature out of all those in the various projects), and it's
> > reached a point where we think we can start bringing on other
> > downstream users. This means we need to start establishing things like
> > a release cadence and to grow the new contributors we have to handle
> > more project responsibility. Personally, I think that means it's time
> > to move out from under Hadoop to drive things as our own community.
> > Eventually, I hope the community can help draw in a group of folks
> > traditionally underrepresented in ASF projects, namely QA and
> > operations folks.
> >
> > I think test-patch by itself has enough scope to justify a project.
> > Having a solid set of build tools that are customizable to fit the
> > norms of different software communities is a bunch of work. Making it
> > work well in both the context of automated test systems like Jenkins
> > and for individual developers is even more work. We could easily also
> > take over maintenance of things like shelldocs, since test-patch is the
> > primary consumer of that currently, but it's generally useful tooling.
> >
> > In addition to test-patch, I think the proposed project has some future
> > growth potential. Given some adoption of test-patch to prove utility,
> > the project could build on the ties it makes to start building tools to
> > help projects do their own longer-run testing. Note that I'm talking
> > about the tools to build QA processes and not a particular set of
> > tested components. Specifically, I think the ChaosMonkey work that's in
> > HBase should be generalizable as a fault injection framework (either
> > based on that code or something like it). Doing this for arbitrary
> > software is obviously very difficult, and a part of easing that will be
> > to make (and then favor) tooling that allows projects to have
> > operational glue that looks the same. Namely, the shell work that's
> > been done in hadoop-functions.sh would be a great foundational layer
> > that could bring good daemon handling practices to a whole slew of
> > software projects. In the event that these frameworks and tools get
> > adopted by parts of the Hadoop ecosystem, that could make the job of,
> > e.g., Bigtop substantially easier.
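[Editor's note: as a rough illustration of the "good daemon handling practices" meant here, a generic start/stop wrapper might look like the sketch below. The function names and pid-file layout are assumptions for illustration, not the hadoop-functions.sh API.]

```shell
# Sketch of generic daemon start/stop handling via pid files, in the
# spirit of the hadoop-functions.sh discussion above. Names and layout
# are illustrative assumptions, not taken from hadoop-functions.sh.
piddir=$(mktemp -d)

daemon_start() {
  local name="$1"; shift
  "$@" &                              # launch the command in the background
  echo $! > "${piddir}/${name}.pid"   # record its pid for a later stop
}

daemon_stop() {
  local name="$1"
  local pidfile="${piddir}/${name}.pid"
  if [ -f "${pidfile}" ]; then
    kill "$(cat "${pidfile}")" 2>/dev/null
    rm -f "${pidfile}"
  fi
}

# Usage: wrap any long-running command as a "daemon".
daemon_start sleeper sleep 60
daemon_stop sleeper
```

The point is that once every project shares one such layer, stop/start behavior, pid handling, and cleanup become uniform "operational glue" rather than per-project reinvention.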
> >
> > I've reached out to a few folks who have been involved in the current
> > test-patch work or expressed interest in helping out on getting it used
> > in other projects. Right now, the proposed PMC would be (alphabetical
> > by last name):
> >
> > * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> > pmc, sqoop pmc, all around Jenkins expert)
> > * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> > * Nick Dimiduk (hbase pmc, phoenix pmc)
> > * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> > * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> > phoenix pmc)
> > * Allen Wittenauer (hadoop committer)
> >
> > That PMC gives us several members and a bunch of folks familiar with
> > the ASF. Combined with the code already existing in Apache spaces, I
> > think that gives us sufficient justification for a direct board
> > proposal.
> >
> > The planned project name is "Apache Yetus". It's an archaic genus of sea
> > snail and most of our project will be focused on shell scripts.
> >
> > N.b.: this does not mean that the Hadoop community would _have_ to rely
> > on the new TLP, but I hope that once we have a release that can be
> > evaluated there'd be enough benefit to strongly encourage it.
> >
> > This has mostly been focused on scope and community issues, and I'd
> > love to talk through any feedback on that. Additionally, are there any
> > other points folks want to make sure are covered before we have a
> > resolution?
> >
> > On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bus...@cloudera.com>
> wrote:
> >
> >> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
> >>
> >>
> >>
> >> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bus...@cloudera.com>
> wrote:
> >>
> >>> Hi Folks!
> >>>
> >>> After working on test-patch with other folks for the last few months,
> >>> I think we've reached the point where we can make the fastest
> >>> progress towards the goal of a general-use pre-commit patch tester by
> >>> spinning things into a project focused on just that. I think we have
> >>> a mature enough code base and a sufficient fledgling community, so
> >>> I'm going to put together a TLP proposal.
> >>>
> >>> Thanks for the feedback thus far from use within Hadoop. I hope we can
> >>> continue to make things more useful.
> >>>
> >>> -Sean
> >>>
> >>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bus...@cloudera.com>
> wrote:
> >>>
> >>>> HBase's dev-support folder is where the scripts and support files
> >>>> live. We've only recently started adding anything to the maven
> >>>> builds that's specific to jenkins[1]; so far it's diagnostic stuff,
> >>>> but that's where I'd add in more if we ran into the same permissions
> >>>> problems y'all are having.
> >>>>
> >>>> There's also our precommit job itself, though it isn't large[2].
> >>>> AFAIK, we don't properly back this up anywhere; we just notify each
> >>>> other of changes on a particular mail thread[3].
> >>>>
> >>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> >>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
> >>>> all red because I just finished fixing "mvn site" running out of
> >>>> permgen)
> >>>> [3]: http://s.apache.org/NT0
> >>>>
> >>>>
> >>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
> cnaur...@hortonworks.com
> >>>>> wrote:
> >>>>
> >>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
> >>>>> HBase
> >>>>> repo?  Is there any additional context we need to be aware of?
> >>>>>
> >>>>> Chris Nauroth
> >>>>> Hortonworks
> >>>>> http://hortonworks.com/
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bus...@cloudera.com> wrote:
> >>>>>
> >>>>>> +dev@hbase
> >>>>>>
> >>>>>> HBase has recently been cleaning up our precommit jenkins jobs to
> >>>>>> make them more robust. From what I can tell, our stuff started off
> >>>>>> as an earlier version of what Hadoop uses for testing.
> >>>>>>
> >>>>>> Folks on either side open to an experiment of combining our
> >>>>>> precommit check tooling? In principle we should be looking for the
> >>>>>> same kinds of things.
> >>>>>>
> >>>>>> Naturally we'll still need different jenkins jobs to handle
> >>>>>> different resource needs, and we'd need to figure out where stuff
> >>>>>> eventually lives, but that could come later.
> >>>>>>
> >>>>>> On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <
> >>>>> cnaur...@hortonworks.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> The only thing I'm aware of is the failOnError option:
> >>>>>>>
> >>>>>>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
> >>>>>>>
> >>>>>>>
> >>>>>>> I prefer that we don't disable this, because ignoring different
> >>>>>>> kinds of failures could leave our build directories in an
> >>>>>>> indeterminate state.  For example, we could end up with an old
> >>>>>>> class file on the classpath for test runs that was supposedly
> >>>>>>> deleted.
> >>>>>>>
> >>>>>>> I think it's worth exploring Eddy's suggestion to try simulating
> >>>>>>> failure by placing a file where the code expects to see a
> >>>>>>> directory.  That might even let us enable some of these tests that
> >>>>>>> are skipped on Windows, because Windows allows access for the
> >>>>>>> owner even after permissions have been stripped.
> >>>>>>>
> >>>>>>> Chris Nauroth
> >>>>>>> Hortonworks
> >>>>>>> http://hortonworks.com/
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3/11/15, 2:10 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu>
> wrote:
> >>>>>>>
> >>>>>>>> Is there a maven plugin or setting we can use to simply remove
> >>>>>>>> directories that have no executable permissions on them?  Clearly
> >>>>>>>> we have the permission to do this from a technical point of view
> >>>>>>>> (since we created the directories as the jenkins user); it's
> >>>>>>>> simply that the code refuses to do it.
> >>>>>>>>
> >>>>>>>> Otherwise I guess we can just fix those tests...
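[Editor's note: outside of Maven, one workaround along the lines Colin describes is to restore owner permissions on every directory before cleaning. The sketch below builds a throwaway tree to stand in for the stuck Jenkins workspace; all paths are illustrative.]

```shell
# Sketch: give the owner back rwx on every directory under a workspace
# so a subsequent clean can delete it. The tree created here merely
# simulates the stuck workspace; paths are illustrative.
workspace=$(mktemp -d)
mkdir -p "${workspace}/target/test/data/dfs/data/data3"
chmod 500 "${workspace}/target/test/data/dfs/data/data3"  # simulate stuck perms

# Restore full owner permissions on all directories, then delete.
find "${workspace}" -type d -exec chmod u+rwx {} +
rm -rf "${workspace}"
```

Something equivalent could run as a pre-clean step in the Jenkins job itself, independent of the test suites being fixed.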
> >>>>>>>>
> >>>>>>>> Colin
> >>>>>>>>
> >>>>>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
> >>>>>>>>> Thanks a lot for looking into HDFS-7722, Chris.
> >>>>>>>>>
> >>>>>>>>> In HDFS-7722:
> >>>>>>>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>>>>>>>> TearDown().
> >>>>>>>>> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
> >>>>>>>>>
> >>>>>>>>> Also I ran mvn test several times on my machine and all tests
> >>>>> passed.
> >>>>>>>>>
> >>>>>>>>> However, since DiskChecker#checkDirAccess() does this:
> >>>>>>>>>
> >>>>>>>>> private static void checkDirAccess(File dir) throws DiskErrorException {
> >>>>>>>>>   if (!dir.isDirectory()) {
> >>>>>>>>>     throw new DiskErrorException("Not a directory: " + dir.toString());
> >>>>>>>>>   }
> >>>>>>>>>
> >>>>>>>>>   checkAccessByFileMethods(dir);
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> one potentially safer alternative is replacing the data dir with
> >>>>>>>>> a regular file to simulate disk failures.
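[Editor's note: the file-instead-of-directory approach can be seen in a tiny shell sketch. The paths here are throwaway, not the real test layout.]

```shell
# Sketch: simulate a failed volume by putting a regular file where the
# code expects a directory, instead of stripping permissions. A
# directory check then fails, and there is nothing to restore afterwards
# even if the test process dies. Paths are illustrative.
datadir="$(mktemp -d)/data3"
mkdir -p "${datadir}"

rmdir "${datadir}"     # swap the directory for a plain file
touch "${datadir}"

if [ -d "${datadir}" ]; then
  echo "directory"
else
  echo "not a directory"   # this branch runs: the "dir" is now a file
fi
```

Unlike the chmod approach, a crash mid-test leaves nothing in a state that blocks later cleanup, which is the "safer" property being claimed above.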
> >>>>>>>>>
> >>>>>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >>>>>>>>> <cnaur...@hortonworks.com> wrote:
> >>>>>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>>>>>>>>> TestDataNodeVolumeFailureReporting, and
> >>>>>>>>>> TestDataNodeVolumeFailureToleration all remove executable
> >>>>>>>>>> permissions from directories like the one Colin mentioned to
> >>>>>>>>>> simulate disk failures at data nodes.  I reviewed the code for
> >>>>>>>>>> all of those, and they all appear to be doing the necessary
> >>>>>>>>>> work to restore executable permissions at the end of the test.
> >>>>>>>>>> The only recent uncommitted patch I've seen that makes changes
> >>>>>>>>>> in these test suites is HDFS-7722.  That patch still looks fine
> >>>>>>>>>> though.  I don't know if there are other uncommitted patches
> >>>>>>>>>> that changed these test suites.
> >>>>>>>>>>
> >>>>>>>>>> I suppose it's also possible that the JUnit process
> >>>>>>>>>> unexpectedly died after removing executable permissions but
> >>>>>>>>>> before restoring them.  That always would have been a weakness
> >>>>>>>>>> of these test suites, regardless of any recent changes.
> >>>>>>>>>>
> >>>>>>>>>> Chris Nauroth
> >>>>>>>>>> Hortonworks
> >>>>>>>>>> http://hortonworks.com/
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hey Colin,
> >>>>>>>>>>>
> >>>>>>>>>>> I asked Andrew Bayer, who works with Apache Infra, what's
> >>>>>>>>>>> going on with these boxes. He took a look and concluded that
> >>>>>>>>>>> some perms are being set in those directories by our unit
> >>>>>>>>>>> tests which are precluding those files from getting deleted.
> >>>>>>>>>>> He's going to clean up the boxes for us, but we should expect
> >>>>>>>>>>> this to keep happening until we can fix the test in question
> >>>>>>>>>>> to properly clean up after itself.
> >>>>>>>>>>>
> >>>>>>>>>>> To help narrow down which commit it was that started this,
> >>>>>>>>>>> Andrew sent me this info:
> >>>>>>>>>>>
> >>>>>>>>>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >>>>>>>>>>> has 500 perms, so I'm guessing that's the problem. Been that
> >>>>>>>>>>> way since 9:32 UTC on March 5th."
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Aaron T. Myers
> >>>>>>>>>>> Software Engineer, Cloudera
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
> >>>>>>> <cmcc...@apache.org>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> A very quick (and not thorough) survey shows that I can't
> >>>>>>>>>>>> find any jenkins jobs that succeeded in the last 24 hours.
> >>>>>>>>>>>> Most of them seem to be failing with some variant of this
> >>>>>>>>>>>> message:
> >>>>>>>>>>>>
> >>>>>>>>>>>> [ERROR] Failed to execute goal
> >>>>>>>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >>>>>>>>>>>> (default-clean) on project hadoop-hdfs: Failed to clean
> >>>>>>>>>>>> project: Failed to delete
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>>>>>>>>>> -> [Help 1]
> >>>>>>>>>>>>
> >>>>>>>>>>>> Any ideas how this happened?  Bad disk, unit test setting
> wrong
> >>>>>>>>>>>> permissions?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Colin
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Lei (Eddy) Xu
> >>>>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Sean
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Sean
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Sean
> >>>
> >>
> >>
> >>
> >> --
> >> Sean
> >>
> >
> >
> >
> > --
> > Sean
>
>


-- 
Sean
