As mentioned on HADOOP-12111, there is now an incubator-style proposal: http://wiki.apache.org/incubator/YetusProposal
On Wed, Jun 24, 2015 at 9:41 AM, Sean Busbey <bus...@cloudera.com> wrote:

> Hi Folks!
>
> Work in a feature branch is now being tracked by HADOOP-12111.
>
> On Thu, Jun 18, 2015 at 10:07 PM, Sean Busbey <bus...@cloudera.com> wrote:
>
>> It looks like we have consensus.
>>
>> I'll start drafting a proposal for the next board meeting (July 15th). Once we work out the name, I'll submit a PODLINGNAMESEARCH jira to track that we did due diligence on whatever we pick.
>>
>> In the meantime, Hadoop PMC, would y'all be willing to host us in a branch so that we can start prepping things now? We would want branch commit rights for the proposed new PMC.
>>
>> -Sean
>>
>> On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>
>>> Oof. I had meant to push on this again, but life got in the way and now the June board meeting is upon us. Sorry, everyone. In the event that this ends up contentious, hopefully one of the copied communities can give us a branch to work in.
>>>
>>> I know everyone is busy, so here's the short version of this email: I'd like to move some of the code currently in Hadoop (test-patch) into a new TLP focused on QA tooling. I'm not sure what the best format for priming this conversation is. ORC filled in the incubator project proposal template, but I'm not sure how much that confused the issue. So to start, I'll just write out what I'm hoping we can accomplish, in general terms.
>>>
>>> All community-based software development projects (that is, those accepting outside contributions) face a common QA problem: vetting incoming contributions. Hadoop is fortunate enough to be sufficiently popular that the weight of the problem drove tool development (i.e. test-patch). That tool is generalizable enough that a bunch of other TLPs have adopted their own forks. Unfortunately, in most projects this kind of QA work is an enabler rather than a primary concern, so the tooling is often worked on ad hoc and few improvements are shared across projects. Since the tooling itself is never a primary concern, any progress made is rarely reused outside of ASF projects.
>>>
>>> Over the last couple of months a few of us have been working on generalizing the tooling present in the Hadoop code base (because it was the most mature of all those in the various projects), and it's reached a point where we think we can start bringing on other downstream users. This means we need to start establishing things like a release cadence and to grow our new contributors into handling more project responsibility. Personally, I think that means it's time to move out from under Hadoop and drive things as our own community. Eventually, I hope the community can help draw in a group of folks traditionally underrepresented in ASF projects, namely QA and operations folks.
>>>
>>> I think test-patch by itself has enough scope to justify a project. Having a solid set of build tools that are customizable to fit the norms of different software communities is a bunch of work. Making them work well both in the context of automated test systems like Jenkins and for individual developers is even more work. We could also easily take over maintenance of things like shelldocs, since test-patch is currently its primary consumer but it's generally useful tooling.
>>>
>>> In addition to test-patch, I think the proposed project has some future growth potential. Given some adoption of test-patch to prove utility, the project could build on the ties it makes to start building tools that help projects do their own longer-run testing. Note that I'm talking about the tools to build QA processes, not a particular set of tested components. Specifically, I think the ChaosMonkey work that's in HBase should be generalizable as a fault injection framework (either based on that code or something like it). Doing this for arbitrary software is obviously very difficult, and part of easing that will be making (and then favoring) tooling that lets projects share operational glue that looks the same. Namely, the shell work that's been done in hadoop-functions.sh would be a great foundational layer that could bring good daemon-handling practices to a whole slew of software projects. In the event that these frameworks and tools get adopted by parts of the Hadoop ecosystem, that could make the job of e.g. Bigtop substantially easier.
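To make the fault-injection idea concrete, here is a rough sketch of the shape such a framework could take. All names here are purely illustrative; this is not the actual HBase ChaosMonkey API, just one plausible generalization of it:

    import java.util.List;
    import java.util.Random;

    // One injectable fault, e.g. "kill a daemon" or "drop network traffic".
    interface FaultAction {
      void perform() throws Exception;  // inject the fault
      void undo() throws Exception;     // best-effort recovery afterwards
    }

    // Picks a random fault and runs it once against the system under test.
    class ChaosRunner {
      private final List<FaultAction> actions;
      private final Random random = new Random();

      ChaosRunner(List<FaultAction> actions) {
        this.actions = actions;
      }

      void runOnce() throws Exception {
        FaultAction action = actions.get(random.nextInt(actions.size()));
        action.perform();
        action.undo();
      }
    }

The point of a shared interface like this would be that each project supplies only its own FaultAction implementations, while scheduling, randomization, and reporting live in the common framework.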
>>> I've reached out to a few folks who have been involved in the current test-patch work or who have expressed interest in helping get it used in other projects. Right now, the proposed PMC would be (alphabetical by last name):
>>>
>>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds pmc, sqoop pmc, all-around Jenkins expert)
>>> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
>>> * Nick Dimiduk (hbase pmc, phoenix pmc)
>>> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
>>> * Andrew Purtell (ASF member, incubator pmc, bigtop pmc, hbase pmc, phoenix pmc)
>>> * Allen Wittenauer (hadoop committer)
>>>
>>> That PMC gives us several ASF members and a bunch of folks familiar with the ASF. Combined with the code already existing in Apache spaces, I think that gives us sufficient justification for a direct board proposal.
>>>
>>> The planned project name is "Apache Yetus". It's an archaic genus of sea snail, and most of our project will be focused on shell scripts.
>>>
>>> N.b.: this does not mean that the Hadoop community would _have_ to rely on the new TLP, but I hope that once we have a release that can be evaluated there'd be enough benefit to strongly encourage it.
>>>
>>> This has mostly been focused on scope and community issues, and I'd love to talk through any feedback on that. Additionally, are there any other points folks want to make sure are covered before we have a resolution?
>>>
>>> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>>
>>>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>>>
>>>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>>>
>>>>> Hi Folks!
>>>>>
>>>>> After working on test-patch with other folks for the last few months, I think we've reached the point where we can make the fastest progress towards the goal of a general-use pre-commit patch tester by spinning things into a project focused on just that. I think we have a mature enough code base and a sufficient fledgling community, so I'm going to put together a TLP proposal.
>>>>>
>>>>> Thanks for the feedback thus far from use within Hadoop. I hope we can continue to make things more useful.
>>>>>
>>>>> -Sean
>>>>>
>>>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>>>>
>>>>>> HBase's dev-support folder is where the scripts and support files live. We've only recently started adding anything to the maven builds that's specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd add more if we ran into the same permissions problems y'all are having.
>>>>>>
>>>>>> There's also our precommit job itself, though it isn't large[2]. AFAIK, we don't properly back this up anywhere; we just notify each other of changes on a particular mail thread[3].
>>>>>>
>>>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all red because I just finished fixing "mvn site" running out of permgen)
>>>>>> [3]: http://s.apache.org/NT0
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>>>
>>>>>>> Sure, thanks Sean! Do we just look in the dev-support folder in the HBase repo? Is there any additional context we need to be aware of?
>>>>>>>
>>>>>>> Chris Nauroth
>>>>>>> Hortonworks
>>>>>>> http://hortonworks.com/
>>>>>>>
>>>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bus...@cloudera.com> wrote:
>>>>>>>
>>>>>>> > +dev@hbase
>>>>>>> >
>>>>>>> > HBase has recently been cleaning up our precommit jenkins jobs to make them more robust. From what I can tell, our stuff started off as an earlier version of what Hadoop uses for testing.
>>>>>>> >
>>>>>>> > Folks on either side open to an experiment of combining our precommit check tooling? In principle we should be looking for the same kinds of things.
>>>>>>> >
>>>>>>> > Naturally we'll still need different jenkins jobs to handle different resource needs, and we'd need to figure out where stuff eventually lives, but that could come later.
>>>>>>> >
>>>>>>> > On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>>>> >
>>>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>>>> >>
>>>>>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>>>>>> >>
>>>>>>> >> I prefer that we don't disable this, because ignoring different kinds of failures could leave our build directories in an indeterminate state. For example, we could end up with an old class file on the classpath for test runs that was supposedly deleted.
>>>>>>> >>
>>>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating failure by placing a file where the code expects to see a directory. That might even let us enable some of these tests that are skipped on Windows, because Windows allows access for the owner even after permissions have been stripped.
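For concreteness, a minimal sketch of Eddy's file-in-place-of-a-directory approach, assuming JUnit 4 and java.io.File; the names are illustrative, not the actual HDFS test code:

    import java.io.File;
    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    public class SimulatedDiskFailureSketch {
      @Test
      public void dataDirReplacedByFile() throws Exception {
        File dataDir = new File(System.getProperty("java.io.tmpdir"), "data3");
        assertTrue(dataDir.mkdirs());
        // Simulate a failed disk: swap the directory for a regular file, so a
        // check like DiskChecker's !dir.isDirectory() fails. No permission bits
        // are changed, so cleanup never has to restore anything.
        assertTrue(dataDir.delete());
        assertTrue(dataDir.createNewFile());
        try {
          // ... exercise the code that expects dataDir to be a directory ...
        } finally {
          dataDir.delete();  // a plain delete suffices; nothing was chmod'ed
        }
      }
    }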
>>>>>>> >>
>>>>>>> >> Chris Nauroth
>>>>>>> >> Hortonworks
>>>>>>> >> http://hortonworks.com/
>>>>>>> >>
>>>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
>>>>>>> >>
>>>>>>> >> > Is there a maven plugin or setting we can use to simply remove directories that have no executable permissions on them? Clearly we have the permission to do this from a technical point of view (since we created the directories as the jenkins user); it's simply that the code refuses to do it.
>>>>>>> >> >
>>>>>>> >> > Otherwise I guess we can just fix those tests...
>>>>>>> >> >
>>>>>>> >> > Colin
>>>>>>> >> >
>>>>>>> >> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
>>>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>>> >> >>
>>>>>>> >> >> In HDFS-7722:
>>>>>>> >> >> The TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
>>>>>>> >> >> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>>>>>>> >> >>
>>>>>>> >> >> Also, I ran mvn test several times on my machine and all tests passed.
>>>>>>> >> >>
>>>>>>> >> >> However, given DiskChecker#checkDirAccess():
>>>>>>> >> >>
>>>>>>> >> >>   private static void checkDirAccess(File dir) throws DiskErrorException {
>>>>>>> >> >>     if (!dir.isDirectory()) {
>>>>>>> >> >>       throw new DiskErrorException("Not a directory: " + dir.toString());
>>>>>>> >> >>     }
>>>>>>> >> >>
>>>>>>> >> >>     checkAccessByFileMethods(dir);
>>>>>>> >> >>   }
>>>>>>> >> >>
>>>>>>> >> >> one potentially safer alternative is replacing the data dir with a regular file to simulate disk failures.
>>>>>>> >> >>
>>>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure, TestDataNodeVolumeFailureReporting, and TestDataNodeVolumeFailureToleration all remove executable permissions from directories like the one Colin mentioned to simulate disk failures at data nodes. I reviewed the code for all of those, and they all appear to be doing the necessary work to restore executable permissions at the end of the test. The only recent uncommitted patch I've seen that makes changes in these test suites is HDFS-7722. That patch still looks fine though. I don't know if there are other uncommitted patches that changed these test suites.
>>>>>>> >> >>>
>>>>>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly died after removing executable permissions but before restoring them. That always would have been a weakness of these test suites, regardless of any recent changes.
>>>>>>> >> >>>
>>>>>>> >> >>> Chris Nauroth
>>>>>>> >> >>> Hortonworks
>>>>>>> >> >>> http://hortonworks.com/
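The strip-then-restore pattern Chris describes boils down to the following, sketched here with JUnit 4 assertions and illustrative names rather than the actual HDFS test code:

    import java.io.File;
    import static org.junit.Assert.assertTrue;

    public class VolumeFailureSketch {
      // Helper a test would call: revoke permissions, run the scenario, restore.
      public void runWithRevokedPermissions(File dataDir) throws Exception {
        // Strip write/execute bits so the directory looks like a dead disk.
        assertTrue(dataDir.setWritable(false));
        assertTrue(dataDir.setExecutable(false));
        try {
          // ... run the failure scenario against dataDir ...
        } finally {
          // Restore permissions even if the scenario throws, so later runs and
          // "mvn clean" can delete the directory. If the JVM dies inside the
          // try block, this never executes -- exactly the residual weakness
          // Chris notes above.
          dataDir.setWritable(true);
          dataDir.setExecutable(true);
        }
      }
    }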
Myers" <a...@cloudera.com> >>>>>>> wrote: >>>>>>> >> >>> >>>>>>> >> >>>>Hey Colin, >>>>>>> >> >>>> >>>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's >>>>>>> going on >>>>>>> >>with >>>>>>> >> >>>>these boxes. He took a look and concluded that some perms are >>>>>>> being >>>>>>> >> >>>>set in >>>>>>> >> >>>>those directories by our unit tests which are precluding >>>>>>> those files >>>>>>> >> >>>>from >>>>>>> >> >>>>getting deleted. He's going to clean up the boxes for us, but >>>>>>> we >>>>>>> >>should >>>>>>> >> >>>>expect this to keep happening until we can fix the test in >>>>>>> question >>>>>>> >>to >>>>>>> >> >>>>properly clean up after itself. >>>>>>> >> >>>> >>>>>>> >> >>>>To help narrow down which commit it was that started this, >>>>>>> Andrew >>>>>>> >>sent >>>>>>> >> >>>>me >>>>>>> >> >>>>this info: >>>>>>> >> >>>> >>>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS- >>>>>>> >> >>>>>>> >>>>>>> >>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3 >>>>>>> >>>>>>/ >>>>>>> >> >>>>has >>>>>>> >> >>>>500 perms, so I'm guessing that's the problem. Been that way >>>>>>> since >>>>>>> >>9:32 >>>>>>> >> >>>>UTC >>>>>>> >> >>>>on March 5th." >>>>>>> >> >>>> >>>>>>> >> >>>>-- >>>>>>> >> >>>>Aaron T. Myers >>>>>>> >> >>>>Software Engineer, Cloudera >>>>>>> >> >>>> >>>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe >>>>>>> >><cmcc...@apache.org> >>>>>>> >> >>>>wrote: >>>>>>> >> >>>> >>>>>>> >> >>>>> Hi all, >>>>>>> >> >>>>> >>>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't >>>>>>> find any >>>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours. Most >>>>>>> of them >>>>>>> >> >>>>>seem >>>>>>> >> >>>>> to be failing with some variant of this message: >>>>>>> >> >>>>> >>>>>>> >> >>>>> [ERROR] Failed to execute goal >>>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean >>>>>>> >>(default-clean) >>>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to >>>>>>> delete >>>>>>> >> >>>>> >>>>>>> >> >>>>> >>>>>>> >> >>>>>>> >>>>>>> >>>>>>>/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hd >>>>>>> >>>>>>>fs >>>>>>> >> >>>>>-pr >>>>>>> >> >>>>>oject/hadoop-hdfs/target/test/data/dfs/data/data3 >>>>>>> >> >>>>> -> [Help 1] >>>>>>> >> >>>>> >>>>>>> >> >>>>> Any ideas how this happened? Bad disk, unit test setting >>>>>>> wrong >>>>>>> >> >>>>> permissions? >>>>>>> >> >>>>> >>>>>>> >> >>>>> Colin >>>>>>> >> >>>>> >>>>>>> >> >>> >>>>>>> >> >> >>>>>>> >> >> >>>>>>> >> >> >>>>>>> >> >> -- >>>>>>> >> >> Lei (Eddy) Xu >>>>>>> >> >> Software Engineer, Cloudera >>>>>>> >> >>>>>>> >> >>>>>>> > >>>>>>> > >>>>>>> >-- >>>>>>> >Sean >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Sean >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Sean >>>>> >>>> >>>> >>>> >>>> -- >>>> Sean >>>> >>> >>> >>> >>> -- >>> Sean >>> >> >> >> >> -- >> Sean >> > > > > -- > Sean > -- Sean