Awesome!
Glad to hear other people's thoughts.
I think one of the major things among those is documentation. We have
done so much work making Bigtop 1.0 a great piece of software. Now we just
need to show people how to use it and what it is capable of.

I'll be able to write some guides on the Bigtop provisioner and the CI
infra stuff, but for now I just want to push the 1.0 release forward with
my limited cycles.

I'd also love to see more CI jobs we can run in our public Jenkins as
showcases:

Smoke tests
Integration tests
Puppet recipes
BigPetStore
...

On the other hand, there are still lots of patch-available JIRAs out there.
We should get the 1.0 release out soon and then get those nice features
onboard. :)
Building on conversations before, during, and after ApacheCon, and looking
at the post-1.0 Bigtop focus and efforts, I want to lay out a few things
and get people's comments.  There seems to be some consensus that the
project can look towards serving end application/data developers more going
forward, while continuing the tradition of the project's
build/pkg/test/deploy roots.

I have spent the past couple of months, and heavily the past three or so
weeks, talking to many different potential end users at meetups,
conferences, etc., and having some great conversations with commercial open
source vendors that are interested in what a "future Bigtop" can be and
what it could provide to users.

I believe we need to put some focused effort into a few foundational things
to put the project in a position to move faster and attract a wider range
of users as well as new contributors.

-----------
CI "2.0"
-----------

The start of this is already underway, based on the work Roman started last
year and the continuing effort on the new setup and enhancements to the
Bigtop AWS infrastructure; Evans has been pushing this along into the 1.0
release.  The speed of getting new packages built and up to date needs to
increase so releases can happen at a regular clip, even looking towards
user-friendly "ad-hoc" Bigtop builds where users could quickly choose the
2, 3, 4, etc. components they want and get a stack around that.

Related to this, I am hoping the group can come to some agreement on
semver-style versioning for the project post 1.0.  I think this could set a
path forward for releases that happen faster, while not holding up the
whole train if a single "smaller" component has a couple of issues that
can't or won't be resolved by the main stakeholders or parties interested
in said component.  An example might be a new Pig or Sqoop having issues;
the 1.2 release would still go out the door, with 1.2.1 coming days or
weeks later once the new Pig or Sqoop was fixed up.
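The ordering this policy relies on can be sketched in a few lines (the
version numbers here are hypothetical, purely for illustration):

```python
# Minimal sketch of semver-style release ordering: a 1.2.0 train ships on
# schedule, and a 1.2.1 patch follows once a lagging component (say, a new
# Pig or Sqoop) is fixed, without holding up the whole release.
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))

releases = ["1.0.0", "1.2.0", "1.2.1", "2.0.0"]
# The patch release sorts after the minor release it fixes up.
assert parse("1.2.1") > parse("1.2.0")
assert sorted(releases, key=parse) == releases
```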

---------------------------------------------
Proper package repository hosting
---------------------------------------------

I put together a little test setup based on the 0.8 assets; we can probably
build off of that with 1.0, working towards the CI automatically posting
nightly (or just-in-time) builds off latest so people can play around.
Debs/rpms should be the focal point of the project's output assets;
everything else is additive and builds off of that (i.e. the user who says
"I am not a puppet shop so I don't care about the modules, but I do my own
automation, and if you point me to some sane repositories I can do the rest
myself with a couple of decent getting-started steps").
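For that "point me at a sane repository" user, the getting-started step
could be as small as dropping one file in place. A sketch of what the yum
side might look like — the layout and URLs here are made up for
illustration; the real baseurl would be wherever the project publishes its
1.0 repositories:

```ini
# /etc/yum.repos.d/bigtop.repo -- hypothetical layout and URLs,
# for illustration only.
[bigtop]
name=Apache Bigtop 1.0
baseurl=http://bigtop.example.org/releases/1.0.0/centos/7/x86_64
gpgcheck=1
gpgkey=http://bigtop.example.org/releases/1.0.0/GPG-KEY-bigtop
enabled=1
```

After that, an ordinary `yum install` of the components they want is the
whole story, with the apt side being the analogous sources.list entry.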

-----------------------------------------------------------------
Greatly increasing the UX and getting started content
-----------------------------------------------------------------

This is the big one: a new website, focused docs and getting-started
examples for end users, and other specific content for contributors.  I
will start putting some cycles into the new website JIRA probably next
week, and will try to scoot through it and post some working examples for
feedback once something basic is in place.  For those interested in helping
out on doc work and getting-started content, let me know; I am looking at
subjects like:

   -Developer getting started
         -using the packages
         -using puppet modules and deployment options
         -deploying reference example stacks
         -setting up your own big data CI
         -etc

   -Contributing to Bigtop:
         -how to submit your first patch/pull-request
         -adding a new component (step by step, canned learning component
example, etc)
         -adding tests to an existing component (steps, canned hello world
example test, etc)
         -writing your own test data generator
         -etc

Those are some thoughts and a couple of the initial focal areas that are
driving my participation in Bigtop.



-----Original Message-----
From: Andrew Purtell [mailto:[email protected]]
Sent: Tuesday, June 16, 2015 12:02 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Rebooting the conversation on the Future of bigtop:
Abstracting the backplane ? Containers?

> thanks andy - i agree with most of your opinions around continuing to
> build standard packages.. but can you clarify what was offensive ?  must
> be a misinterpretation somewhere.

Sure.

A bit offensive.

"gridgain or spark can do what 90% of the hadoop ecosystem already does,
supporting streams, batch,sql all in one" -> This statement deprecates the
utility of the labors of the rest of the Hadoop ecosystem in favor of
Gridgain and Spark. As a gross generalization it's unlikely to be a helpful
statement in any case.

It's fine if we all have our favorites, of course. I think we're set up
well to empirically determine winners and losers, we don't need to make
partisan statements. Those components that get some user interest in the
form of contributions that keep them building and happy in Bigtop will stay
in. Those that do not get the necessary attention will have to be culled
out over time when and if they fail to compile or pass integration tests.


On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <[email protected]>
wrote:

> thanks andy - i agree with most of your opinions around continuing to
> build standard packages.. but can you clarify what was offensive ?
> must be a misinterpretation somewhere.
>
> 1) To be clear, i am 100% behind supporting standard hadoop build rpms
> that we have now.   Thats the core product and will be for the forseeable
> future, absolutely !
>
> 2) The idea (and its just an idea i want to throw out - to keep us on
> our toes), is that some folks may be interested in hacking around, in
> a separate branch - on some bleeding edge bigdata deployments - which
> attempts to incorporate resource managers and  containers as
> first-class citizens.
>
> Again this is all just ideas - not in any way meant to derail the
> packaging efforts - but rather - just to gauge folks interest level in
> the bleeding edge, docker, mesos, simplified processing stacks, and so on.
>
>
>
> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <[email protected]>
> wrote:
>
> > > gridgain or spark can do what 90% of the hadoop ecosystem already
> > > does, supporting streams, batch,sql all in one)
> >
> > If something like this becomes the official position of the Bigtop
> > project, some day, then it will turn off people. I can see where you
> > are coming from, I think. Correct me if I'm wrong: We have limited
> > bandwidth, we should move away from Roman et. al.'s vision of Bigtop
> > as an inclusive distribution of big data packages, and instead
> > become highly opinionated and tightly focused. If that's accurate, I
> > can sum up my concern as follows: To the degree we become more
> > opinionated, the less we may have to look at in terms of inclusion -
> > both software and user communities. For example, I find the above
> > quoted statement a bit offensive as a participant on not-Spark and
> > not-Gridgain projects. I roll my eyes sometimes at the Docker
> > over-hype. Is there still a place for me here?
> >
> >
> >
> > On Mon, Jun 15, 2015 at 9:22 AM, jay vyas
> > <[email protected]>
> > wrote:
> >
> >> Hi folks.   Every few months, i try to reboot the conversation about
> >> the next generation of bigtop.
> >>
> >> There are 3 things which i think we should consider: a backplane
> >> (rather than deploying to machines), the meaning of the term
> >> "ecosystem" in a post-spark in-memory apocalypse, and containerization.
> >>
> >> 1) BACKPLANE: The new trend is to have a backplane that provides
> >> networking abstractions for you (mesos, kubernetes, yarn, and so on).
> >> Is it time for us to pick a resource manager?
> >>
> >> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole
> >> hadoop ecosystem, and there is a huge shift to in-memory, monolithic
> >> stacks happening (i.e. gridgain or spark can do what 90% of the
> >> hadoop ecosystem already does, supporting streams, batch,sql all in
> >> one).
> >>
> >> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
> >> Is it time to start experimenting with running docker tarballs ?
> >>
> >> Combining 1+2+3 - i could see a useful bigdata upstream distro
> >> which (1) just installed an HCFS implementation (gluster, HDFS, ...)
> >> along side, say, (2) mesos as a backplane for the tooling for
> >> [[ hbase + spark + ignite ]] --- and then (3) do the integration
> >> testing of available mesos-framework plugins for ignite and spark
> >> underneath.  If other folks are interested, maybe we could create
> >> the "1x" or "in-memory" branch to start hacking on it sometime ?
> >> Maybe even bring the flink guys in as well, as they are interested
> >> in bigtop packaging.
> >>
> >>
> >>
> >> --
> >> jay vyas
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein (via Tom White)
> >
>
>
>
> --
> jay vyas
>



--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
