Hi Roman,

Although your email is quite long, it is really informative. I went
through Bigtop's website, wiki, archives and other links, and I have the
following suggestions. I think adding these points will help newcomers who
wish to contribute to the project:

1) It would be good to add more details on how a person can contribute to
the project, how to get started, and which areas they can start
contributing to right now.
2) I think adding a *TODO* page would be a good idea. This page could list
the things that need to be done, starting with simple items and moving
towards more complex ones. This would help new contributors pick an item
and work on it.

To start with, I have the two suggestions above. I am now planning to
contribute to the Bigtop project. Could you please let me know the current
*TODOs*, or how I should start contributing to the project?

Thanks in advance.
Shubham


On Sat, May 18, 2013 at 4:59 AM, Roman Shaposhnik <[email protected]> wrote:

> Guys, this is a pretty long email with all the details
> I can think of on how Bigtop can help stabilization efforts of
> Hadoop 2.x. A lot of this information is required background.
> I really, really encourage everyone who's thinking of
> contributing to this effort to read it. Once again,
> I do apologize for its size.
>
> Matt, Andrew,
>
> you both brought up very good points, so let me summarize
> a few things wrt. Bigtop. I'm also CCing the Bigtop dev ML
> so that everybody who's interested in pitching in can
> discuss the matter further over there.
>
> On Wed, May 15, 2013 at 9:25 PM, Andrew Purtell <[email protected]> wrote:
> > The other comment on this thread that suggests ASF governance structures
> > being inadequate for negotiating changes in a large ecosystem might be on
> > to something, but at the same time Apache BigTop may be an effective
> > ASF-native answer to that.
>
> That is my sincere hope as well. Of course, Apache Bigtop is a project in
> its own right, with its own release schedules, community of users, etc.
> What we are developing is not really an integration testsuite for Hadoop;
> it just so happens that without a stable Hadoop base we can't really
> deliver much. Hence we have a huge vested interest in having a predictable
> schedule for the stable releases of Hadoop. We also have every interest in
> the world in helping Hadoop achieve that.
>
> At the same time, we're a very small project juggling ~18 different open
> source components and trying to put them into a coherent distribution. I
> don't think it is realistic to expect us to do all the work that we would
> ideally need to do in order to provide the most feedback for the Hadoop
> stabilization exercise.
>
> At the same time, it would be really unfortunate if we all just gave up on
> this collective goal. Ideally we can all pitch in to the extent we believe
> in the need for a stable Hadoop 2.x code line out there. I'll elaborate on
> what exactly Bigtop can contribute a bit later, and I would expect all the
> folks who'd be willing to pitch in on a particular area to reach out to us
> either here or on the Bigtop ML.
>
> On Wed, May 15, 2013 at 4:54 PM, Matt Foley <[email protected]> wrote:
> > Roman, what is your model for how test results from Bigtop should feed
> > back into Hadoop-2 development?
> > With the understanding that (a) software does have bugs, and (b) you're
> > not going to get an SLA on community-sponsored software,
> > what are your ideas for how to close the loop better?
> >
> > Would "CI" runs of Bigtop against branch-2 be feasible, as Arun suggests?
> > How should we accommodate changes in individual components (Hadoop Core,
> > but others as well) that may require changes in one or more other
> > components?
> > How does Bigtop keep doing a viable nightly build in that chaotic
> > environment?
> > Is this a previously solved problem?
>
> All excellent questions! Here's my laundry list of what Bigtop can offer
> today:
>     #0 a publicly available continuous integration Jenkins instance that
>        runs on EC2 (thanks to Cloudera's gracious support of our project)
>        and ties the rest of the Bigtop infrastructure together:
>            http://bigtop01.cloudera.org:8080/
>
>        The benefit of having this infrastructure in the open is pretty
>        clear: just like with builds.apache.org, if there are failures etc.,
>        anybody who's interested can jump on it and start making progress.
>
>     #1 a continuous integration build of all the components comprising the
>        'current' trunk of Apache Bigtop, all the way up to producing
>        easy-to-install packages for the following Linux platforms:
>            http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-Repository/
>        Basically, the above link allows one to install nightly builds of
>        the Apache Bigtop Hadoop distribution as easily as typing
>        'yum install hadoop-conf-pseudo'.
>
>     #2 a potential for 'tracking' builds, all the way to packages, of each
>        individual component: http://bigtop01.cloudera.org:8080/view/Upstream-tests/
>        Basically, this allows one to easily install the base, fully tested
>        distribution of Hadoop (let's say Bigtop 0.5.0), upgrade just one
>        component and see how it fares. Currently these builds are ad-hoc,
>        but I'm trying to work with the respective upstream communities to
>        figure out which branches of development they would be interested
>        in testing that way.
>
>        This is one of the things that Arun and I talked about wrt. hooking
>        up Bigtop Jenkins to branch-2 on a continuous basis. I wish I had
>        time to do that, but I honestly don't. I might in a few weeks, but
>        again, if anybody is willing to pitch in and help, that'll be
>        greatly appreciated.
>
>     #3 a collection of Puppet recipes that allow one to deploy the packaged
>        Bigtop distro (either from #1 or #2) on a fully distributed cluster.
>
>     #4 an existing collection of integration tests (~200) for all the
>        components we've got in our stack: http://s.apache.org/UX8
>
>     #5 continuous integration Jenkins jobs that deploy our trunk builds on a
>        nightly basis in 2 configurations (secure and unsecure) over 4 fully
>        distributed nodes running as EC2 VMs:
>            http://bigtop01.cloudera.org:8080/view/Deployment/
>
>     #6 continuous (although currently disabled) nightly runs of all the
>        tests from #4 on the two clusters deployed as part of #5. E.g.:
>            http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-smokecluster/14/testReport/
>
> At this point we barely have the resources to cover the minimum it takes
> to maintain #1-#6. Here are the areas where we have gaps in coverage
> (especially in the context of Hadoop 2.x):
>     * there are currently no unit tests at the ASF for the Hadoop ecosystem
>       projects running against a full transitive closure of the Bigtop
>       components. This is actually pretty tricky to accomplish on ASF
>       infrastructure, since it requires a combinatorial explosion in the
>       number of Maven artifacts that get published.
>
>     * closer to the point of this thread, very few Hadoop ecosystem
>       projects currently run unit tests against Hadoop 2.x
>
>     * our integration tests (#4) could be greatly improved upon (what tests
>       couldn't!), and we would really like to have at least 10x the current
>       amount to feel more comfortable.
>
>     * it would be awesome to integrate with the rest of the system-level
>       tests that may be available in the community. The model citizen in
>       that respect is Apache Pig, where they made their tests flexible
>       enough that we can run a subset of them against a real cluster:
>           http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-smokecluster/14/testReport/org.apache.pig.test.pigunit/
>       I wish more projects came with tests like that out of the box. One
>       particularly sore point in that respect for me is all of the HBase
>       integration tests that now exist for HBase 0.95. It has been on my
>       list of things to start running on Bigtop infrastructure, but I
>       honestly haven't had a shred of time to make it happen.
>
>     * if we start making progress on the things above, we will definitely
>       run into the problem of not having enough eyeballs to even triage the
>       issues coming out of these test runs.
>
> That's about what I have on my wish list. Feel free to add to it. Also,
> feel free to pitch in to help on any of the issues.
>
> I don't think I have anything more to add to this thread. I'll wait to
> hear back from those of you who are interested in helping.
>
> Thanks,
> Roman.
>
