+1

(Have been talking to Sean in private on the subject -- seems appropriate to voice some public support)

I'd be interested in this for Accumulo and Slider. For Accumulo, we've come a long way without a pre-commit build, primarily due to a CTR process, but we have seen repeated questions of "how do I run the tests?", which a more automated workflow would help with, IMO. I think Slider could benefit for the same reasons.

I'd also be giddy to see the recent improvements in Hadoop trickle down into the other projects that Allen already mentioned.

Take this as record that I'd be happy to try to help out where possible.

Sean Busbey wrote:
Thank you for making a more digestible version, Allen. :)

If you're interested in soliciting feedback from other projects, I created
ASF short links to this thread in common-dev and hbase:


* http://s.apache.org/yetus-discuss-hadoop
* http://s.apache.org/yetus-discuss-hbase

While I agree that it's important to get feedback from ASF projects that
might find this useful, I can say that recently I've been involved in the
non-ASF project YCSB and both the pretest and better shell stuff would be
immensely useful over there.

On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer<a...@altiscale.com>  wrote:

         I'm clearly +1 on this idea.  As part of the rewrite of test-patch
in Hadoop, it was amazing to see how far and wide this bit of code has
spread.  So I see consolidating everyone's efforts as a huge win for a
large number of projects.  (esp. considering how many I saw suffering from
a variety of identified bugs!)

         But….

         I think it's important for people involved in those other projects
to speak up and voice an opinion as to whether this is useful.

To summarize:

         In the short term, a single location to get/use a precommit patch
tester rather than everyone building/supporting their own in their spare
time.

          FWIW, we've already got the code base modified to be pluggable.
We've written some basic/simple plugins that support Hadoop, HBase, Tajo,
Tez, Pig, and Flink.  For HBase and Flink, this does include their custom
checks.  Adding support for other projects shouldn't be hard; simple
projects take almost no time after seeing the basic pattern.
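
          To give a rough feel for the pattern (all of the hook and helper
names below are made up for illustration; they are not the actual plugin
API), a project plugin is basically a small shell file that registers
itself and contributes its checks:

# exampleproject.sh -- hypothetical personality plugin; names are illustrative
add_plugin exampleproject            # assumed registration hook

function exampleproject_postcompile
{
  # run a project-specific check against the patched tree and vote on the result
  if mvn -q com.example:lint-plugin:check >/dev/null 2>&1; then
    add_vote_table +1 exampleproject "patch passes the project's lint check"
    return 0
  fi
  add_vote_table -1 exampleproject "patch fails the project's lint check"
  return 1
}

          The idea is that the framework sources a file like that and calls
the hook at the right phase, so a project's customizations live in one
small, self-contained place.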

         I think it's worth highlighting that this means support for both
JIRA and GitHub, as well as Ant and Maven, from the same code base.

Longer term:

         Well, we clearly have ideas of things that we want to do. Adding
more features to test-patch (Review Board? Gradle?) is obvious. But what
about teasing apart and generalizing some of the other shell bits from
projects? Everything from a common library for building CLI tools, to fault
injection, to release documentation creation tools, to …  I'd even like to
see us get as advanced as a "run this program to auto-generate daemon
stop/start bits".
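
         As a very rough sketch of that last item (everything below is made
up for illustration; it is not hadoop-functions.sh), a shared library could
boil daemon handling down to a couple of generic functions driven by the
daemon's name and command line:

# daemon-lib.sh -- illustrative sketch of a generic start/stop helper
daemon_start() {
  local name=$1; shift
  local pidfile="/var/run/${name}.pid"
  if [[ -f "${pidfile}" ]] && kill -0 "$(cat "${pidfile}")" 2>/dev/null; then
    echo "${name} is already running" >&2
    return 1
  fi
  # launch in the background, capture output, and record the pid
  nohup "$@" > "/var/log/${name}.out" 2>&1 &
  echo $! > "${pidfile}"
}

daemon_stop() {
  local name=$1
  local pidfile="/var/run/${name}.pid"
  [[ -f "${pidfile}" ]] && kill "$(cat "${pidfile}")" && rm -f "${pidfile}"
}

# usage: daemon_start mydaemon /opt/mydaemon/bin/mydaemon --config /etc/mydaemon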

         I had a few chats with people about this idea at Hadoop Summit.
What's truly exciting are the ideas that people had once they realized what
kinds of problems we're trying to solve.  It's always amazing the problems
that projects have that could be solved by these types of solutions.  Let's
stop hiding our cool toys in this area.

         So, what feedback and ideas do you have in this area?  Are you a
yay or a nay?


On Jun 15, 2015, at 4:47 PM, Sean Busbey<bus...@cloudera.com>  wrote:

Oof. I had meant to push on this again but life got in the way and now the
June board meeting is upon us. Sorry everyone. In the event that this ends
up contentious, hopefully one of the copied communities can give us a
branch to work in.

I know everyone is busy, so here's the short version of this email: I'd
like to move some of the code currently in Hadoop (test-patch) into a new
TLP focused on QA tooling. I'm not sure what the best format for priming
this conversation is. ORC filled in the incubator project proposal
template, but I'm not sure how much that confused the issue. So to start,
I'll just write what I'm hoping we can accomplish in general terms here.

All software development projects that are community based (that is,
accepting outside contributions) face a common QA problem for vetting
in-coming contributions. Hadoop is fortunate enough to be sufficiently
popular that the weight of the problem drove tool development (i.e.
test-patch). That tool is generalizable enough that a bunch of other TLPs
have adopted their own forks. Unfortunately, in most projects this kind of
QA work is an enabler rather than a primary concern, so the tooling is
often worked on ad hoc and few improvements get shared across projects.
Since the tooling itself is never a primary concern, any improvements that
are made are rarely reused outside of ASF projects.

Over the last couple months a few of us have been working on generalizing
the tooling present in the Hadoop code base (because it was the most mature
out of all those in the various projects) and it's reached a point where we
think we can start bringing on other downstream users. This means we need
to start establishing things like a release cadence and to grow the new
contributors we have to handle more project responsibility. Personally, I
think that means it's time to move out from under Hadoop to drive things as
our own community. Eventually, I hope the community can help draw in a
group of folks traditionally underrepresented in ASF projects, namely QA
and operations folks.

I think test-patch by itself has enough scope to justify a project. Having
a solid set of build tools that are customizable to fit the norms of
different software communities is a bunch of work. Making it work well in
both the context of automated test systems like Jenkins and for individual
developers is even more work. We could easily also take over maintenance of
things like shelldocs, since test-patch is the primary consumer of that
currently but it's generally useful tooling.

In addition to test-patch, I think the proposed project has some future
growth potential. Given some adoption of test-patch to prove utility, the
project could build on the ties it makes to start building tools to help
projects do their own longer-run testing. Note that I'm talking about the
tools to build QA processes and not a particular set of tested components.
Specifically, I think the ChaosMonkey work that's in HBase should be
generalizable as a fault injection framework (either based on that code or
something like it). Doing this for arbitrary software is obviously very
difficult, and a part of easing that will be to make (and then favor)
tooling to allow projects to have operational glue that looks the same.
Namely, the shell work that's been done in hadoop-functions.sh would be a
great foundational layer that could bring good daemon handling practices to
a whole slew of software projects. In the event that these frameworks and
tools get adopted by parts of the Hadoop ecosystem, that could make the job
of, e.g., Bigtop substantially easier.
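
To make the fault-injection idea a bit more concrete (purely a sketch of
the kind of primitive such a framework could offer; nothing below exists
today), even a minimal ChaosMonkey-style action is just "pick a target
process and disturb it on a schedule":

# chaos-sketch.sh -- illustrative only: kill a random instance of a named
# daemon every INTERVAL seconds to exercise a project's failure handling
TARGET=${1:?usage: chaos-sketch.sh <process-name> [interval-seconds]}
INTERVAL=${2:-300}

while true; do
  # pick one pid matching the target, if any are running
  PID=$(pgrep -f "${TARGET}" | shuf -n 1)
  if [[ -n "${PID}" ]]; then
    echo "$(date -u +%FT%TZ) killing ${TARGET} pid ${PID}"
    kill -9 "${PID}"
  fi
  sleep "${INTERVAL}"
done

The interesting, shareable work is in the policies, safety checks, and
reporting around a loop like that, which is exactly the part no single
project wants to maintain alone.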

I've reached out to a few folks who have been involved in the current
test-patch work or expressed interest in helping out on getting it used in
other projects. Right now, the proposed PMC would be (alphabetical by last
name):

* Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
pmc, sqoop pmc, all around Jenkins expert)
* Sean Busbey (ASF member, accumulo pmc, hbase pmc)
* Nick Dimiduk (hbase pmc, phoenix pmc)
* Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
* Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
phoenix pmc)
* Allen Wittenauer (hadoop committer)

That PMC gives us several members and a bunch of folks familiar with the
ASF. Combined with the code already existing in Apache spaces, I think that
gives us sufficient justification for a direct board proposal.

The planned project name is "Apache Yetus". It's an archaic genus of sea
snail and most of our project will be focused on shell scripts.

N.b.: this does not mean that the Hadoop community would _have_ to rely on
the new TLP, but I hope that once we have a release that can be evaluated
there'd be enough benefit to strongly encourage it.

This has mostly been focused on scope and community issues, and I'd love to
talk through any feedback on that. Additionally, are there any other points
folks want to make sure are covered before we have a resolution?

On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey<bus...@cloudera.com>
wrote:
Sorry for the resend. I figured this deserves a [DISCUSS] flag.



On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey<bus...@cloudera.com>
wrote:
Hi Folks!

After working on test-patch with other folks for the last few months, I
think we've reached the point where we can make the fastest progress
towards the goal of a general use pre-commit patch tester by spinning
things into a project focused on just that. I think we have a mature enough
code base and a sufficient, if fledgling, community, so I'm going to put
together a TLP proposal.

Thanks for the feedback thus far from use within Hadoop. I hope we can
continue to make things more useful.

-Sean

On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey<bus...@cloudera.com>
wrote:
HBase's dev-support folder is where the scripts and support files live.
We've only recently started adding anything to the maven builds that's
specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
add in more if we ran into the same permissions problems y'all are having.
There's also our precommit job itself, though it isn't large[2]. AFAIK,
we don't properly back this up anywhere, we just notify each other of
changes on a particular mail thread[3].

[1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
[2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
red because I just finished fixing "mvn site" running out of permgen)
[3]: http://s.apache.org/NT0


On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnaur...@hortonworks.com>
wrote:
Sure, thanks Sean!  Do we just look in the dev-support folder in the HBase
repo?  Is there any additional context we need to be aware of?

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 3/11/15, 2:44 PM, "Sean Busbey"<bus...@cloudera.com>  wrote:

+dev@hbase

HBase has recently been cleaning up our precommit jenkins jobs to make them
more robust. From what I can tell our stuff started off as an earlier
version of what Hadoop uses for testing.

Folks on either side open to an experiment of combining our precommit check
tooling? In principle we should be looking for the same kinds of things.
Naturally we'll still need different jenkins jobs to handle different
resource needs and we'd need to figure out where stuff eventually lives,
but that could come later.

On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnaur...@hortonworks.com>
wrote:

The only thing I'm aware of is the failOnError option:

http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
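
If I'm reading that page right, it can also be flipped per invocation
through the plugin's user property, without touching the pom:

# skip clean failures for a single build only
mvn clean install -Dmaven.clean.failOnError=false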


I prefer that we don't disable this, because ignoring different kinds of
failures could leave our build directories in an indeterminate state.  For
example, we could end up with an old class file on the classpath for test
runs that was supposedly deleted.

I think it's worth exploring Eddy's suggestion to try simulating failure by
placing a file where the code expects to see a directory.  That might even
let us enable some of these tests that are skipped on Windows, because
Windows allows access for the owner even after permissions have been
stripped.

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 3/11/15, 2:10 PM, "Colin McCabe"<cmcc...@alumni.cmu.edu>
wrote:
Is there a maven plugin or setting we can use to simply remove directories
that have no executable permissions on them?  Clearly we have the
permission to do this from a technical point of view (since we created the
directories as the jenkins user); it's simply that the code refuses to do
it.
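
As a stopgap, I'd think a pre-clean shell step along these lines (the path
is just an example of the workspace layout) could hand the execute bit back
so the clean plugin can recurse into those directories:

# restore owner rwx on any directory the tests left inaccessible, then clean
find hadoop-hdfs-project -type d ! -perm -u+x -exec chmod u+rwx {} +
mvn clean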

Otherwise I guess we can just fix those tests...

Colin

On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu<l...@cloudera.com>  wrote:
Thanks a lot for looking into HDFS-7722, Chris.

In HDFS-7722:
TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
TestDataNodeHotSwapVolumes resets permissions in a finally clause.

Also I ran mvn test several times on my machine and all tests passed.
However, since DiskChecker#checkDirAccess() looks like this:
private static void checkDirAccess(File dir) throws DiskErrorException {
  if (!dir.isDirectory()) {
    throw new DiskErrorException("Not a directory: " + dir.toString());
  }

  checkAccessByFileMethods(dir);
}

one potentially safer alternative is replacing the data dir with a regular
file to simulate disk failures.
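
In shell terms the idea is roughly this (sketch only; the path is just an
example of the test layout):

# simulate a failed volume: swap the data dir for a regular file so that
# DiskChecker#checkDirAccess throws "Not a directory" for that volume
DATA_DIR=target/test/data/dfs/data/data3    # example path only
rm -rf "${DATA_DIR}"
touch "${DATA_DIR}"
# ... exercise the volume-failure handling against this volume ...
# restore the directory afterwards so later runs aren't affected
rm -f "${DATA_DIR}"
mkdir -p "${DATA_DIR}"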

On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
<cnaur...@hortonworks.com>  wrote:
TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
TestDataNodeVolumeFailureReporting, and
TestDataNodeVolumeFailureToleration all remove executable permissions from
directories like the one Colin mentioned to simulate disk failures at data
nodes.  I reviewed the code for all of those, and they all appear to be
doing the necessary work to restore executable permissions at the end of
the test.  The only recent uncommitted patch I've seen that makes changes
in these test suites is HDFS-7722.  That patch still looks fine though.  I
don't know if there are other uncommitted patches that changed these test
suites.

I suppose it's also possible that the JUnit process unexpectedly died after
removing executable permissions but before restoring them.  That always
would have been a weakness of these test suites, regardless of any recent
changes.

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 3/10/15, 1:47 PM, "Aaron T. Myers"<a...@cloudera.com>  wrote:

Hey Colin,

I asked Andrew Bayer, who works with Apache Infra, what's going on with
these boxes. He took a look and concluded that some perms are being set in
those directories by our unit tests which are precluding those files from
getting deleted. He's going to clean up the boxes for us, but we should
expect this to keep happening until we can fix the test in question to
properly clean up after itself.

To help narrow down which commit it was that started this, Andrew sent me
this info:

"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
has 500 perms, so I'm guessing that's the problem. Been that way since 9:32
UTC on March 5th."

--
Aaron T. Myers
Software Engineer, Cloudera

On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
<cmcc...@apache.org>
wrote:

Hi all,

A very quick (and not thorough) survey shows that I can't find any jenkins
jobs that succeeded from the last 24 hours.  Most of them seem to be
failing with some variant of this message:

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
on project hadoop-hdfs: Failed to clean project: Failed to delete
/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
-> [Help 1]

Any ideas how this happened?  Bad disk, unit test setting wrong
permissions?

Colin



--
Lei (Eddy) Xu
Software Engineer, Cloudera


--
Sean

