Re: What will the next generation of bigtop look like?

Evans Ye Sat, 13 Dec 2014 10:21:06 -0800

Please allow me to chime in.
Here's a real story that brings up another aspect we could work toward.
My company is about to upgrade its Hadoop version. We have several
distributions on the table to choose from: CDH, HDP, and BigTop. But the
fact is that BigTop is unlikely to be the one the decision makers choose,
because its version set is a little too old compared to the others.
My point is that maybe one thing we can do is release often and release faster.
There are things we can do to achieve this goal:
1) shrink the set of supported components and focus on the vital parts
    (as this thread already mentioned)
2) establish a comprehensive automated testing system (like smoke tests)
    that lets us release quickly
3) instead of cutting components, maybe we can upgrade major components
    more often than minor ones. For example, in 0.8.1 we could upgrade only
    Hadoop, Spark, HBase, Kafka, and Solr. After all, I believe most
    companies run Hadoop clusters with a limited set of core components.
    Hence the lag in minor components might not be a big problem.


A quick release cycle not only makes BigTop more attractive but also
gives the community a lively image. I think that is crucial for the
community to keep growing.

2014-12-09 4:23 GMT+08:00 Konstantin Boudnik <[email protected]>:
>
> First I want to address RJ's question:
>
> The most prominent downstream dependents of Bigtop would be the commercial
> Hadoop distributions like HDP and CDH. The former is trying to disguise its
> affiliation by pushing Ambari forward, and Cloudera is seemingly shifting
> its focus to compressed tarball media (aka parcels), which require a
> closed-source solution like Cloudera Manager to deploy and control your
> cluster, effectively rendering it useless if you ever decide to uninstall
> the control software. In the interest of full disclosure, I don't think
> parcels have any chance of shifting the industry consensus from Linux
> packaging towards something as obscure and proprietary as parcels.
>
>
> And now to my actual points....:
>
> I do strongly believe that Bigtop was and is the only completely
> transparent, vendor-friendly way of building your stack from the ground up
> that sticks 100% to official ASF product releases, and of deploying and
> controlling it any way you want to. I agree with Roman's presentation on
> how this project can move forward. However, I somewhat disagree with his
> view of the prospects. It might be a hard road to drive the opinion of the
> community. But it is a high road.
>
> We are definitely small and mostly unsupported by the commercial groups
> that are using the framework. Being a box of LEGO won't win us anything. If
> anything, the empirical evidence is against it, as commercial distros have
> decided to move towards their own means of "vendor lock-in" (yes, you heard
> me right - that's exactly what I said: all so-called open-source companies
> have invented a way to lock in their customers, either with fancy
> "enterprise features" that aren't adding to but amending the underlying
> stack, or with custom sets of patches that oftentimes render clusters
> incompatible between different vendors).
>
> By all means, my money is on the second way, yet slightly modified (as
> use-cases come from users, not developers):
>   #2 start driving adoption of software stacks for particular kinds of
> data workloads
>
> This community has enough day-to-day practitioners on board to
> accumulate near-complete insight into where the technology is moving.
> And instead of wobbling in a backwash, let's see if we can be smart and
> define this landscape. After all, Bigtop adopted Spark well before any of
> the commercial vendors officially accepted it. We seemingly are moving more
> and more into the in-memory realm of data processing: Apache Ignite
> (GridGain), Tachyon, Spark. I don't know how much legs Hive has left in it,
> but I am doubtful that it can walk for much longer... Maybe it's just me.
>
> In this thread http://is.gd/MV2BH9 we already discussed some of the
> aspects influencing the future of this project. And we are de facto working
> on the implementation. In my opinion, Hadoop has been more or less
> commoditized already. And it isn't a bad thing, but it means that the
> innovations are elsewhere. E.g. Spark is moving beyond its ties with the
> storage layer via the Tachyon abstraction; GridGain simply doesn't care what
> the underlying storage is. However, data needs to be stored somewhere before
> it can be processed. And HCFS seems to fit the bill OK. But, as I said
> already, I see the real action elsewhere. If I were to define the shape of
> our mid- to long'ish term roadmap it'd be something like this:
>
>             ^   Dashboard/Visualization  ^
>             |     OLTP/ML processing     |
>             |    Caching/Acceleration    |
>             |         Storage            |
>
> And around this we can add/improve on deployment (R8???),
> virtualization/containers/clouds.  In other words - let's focus on the
> vertical part of the stack, instead of simply supporting the status quo.
>
> Does Cassandra fit the Storage layer in that model? I don't know and, most
> importantly, I don't care. If there's interest and manpower to have a
> Cassandra-based stack - sure, but perhaps let's do it as a separate branch
> or something, so we aren't over-complicating things. As Roman said earlier,
> in this case it'd be great to engage Cassandra/DataStax people in this
> project. But something tells me they won't be eager to jump on board.
>
> And finally, all of the above leads to the "how": how can we start
> reshaping the stack into its next incarnation? Perhaps the Ubuntu model
> might be an answer to that, but we have discussed that elsewhere and
> dropped the idea as it wasn't feasible back in the day. Perhaps its time
> has just come?
>
> Apologies for a long post.
>   Cos
>
>
> On Sun, Dec 07, 2014 at 07:04PM, RJ Nowling wrote:
> > Which other projects depend on BigTop?  How will the questions about the
> > direction of BigTop affect those projects?
> >
> > On Sun, Dec 7, 2014 at 6:10 PM, Roman Shaposhnik <[email protected]>
> > wrote:
> >
> > > Hi!
> > >
> > > On Sat, Dec 6, 2014 at 3:23 PM, jay vyas <[email protected]>
> > > wrote:
> > > > hi bigtop !
> > > >
> > > > I thought I'd start a thread on a few vaguely related thoughts I have
> > > > around the next couple of iterations of bigtop.
> > >
> > > I think in general I see two major ways for something like
> > > Bigtop to evolve:
> > >    #1 remain a 'box of LEGO bricks' with very little opinion on
> > >         how these pieces need to be integrated
> > >    #2 start driving opinionated use-cases for the particular kind of
> > >         bigdata workloads
> > >
> > > #1 is sort of what all of the Linux distros have been doing for
> > > the majority of time they existed. #2 is close to what CentOS
> > > is doing with SIGs.
> > >
> > > Honestly, given the size of our community so far and a total
> > > lack of corporate backing (with a small exception of Cloudera
> > > still paying for our EC2 time) I think #1 is all we can do. I'd
> > > love to be wrong, though.
> > >
> > > > 1) Hive:  How will bigtop evolve to support it, now that it is much
> > > > more than a mapreduce query wrapper?
> > >
> > > I think Hive will remain a big part of Hadoop workloads for the
> > > foreseeable future. What I'd love to see more of is rationalizing things
> > > like how HCatalog, etc. need to be deployed.
> > >
> > > > 2) I wonder whether we should confirm cassandra interoperability of
> > > > spark in bigtop distros,
> > >
> > > Only if there's significant interest from the cassandra community, and
> > > even then my biggest fear is that with cassandra we're totally changing
> > > the requirements for the underlying storage subsystem (nothing wrong
> > > with that, it's just that in the Hadoop ecosystem everything assumes
> > > very HDFS'ish requirements for the scale-out storage).
> > >
> > > > 4) in general, i think bigtop can move in one of 3 directions.
> > > >
> > > >   EXPAND ? : Expanding to include new components, with just basic
> > > > interop, and let folks evolve their own stacks on top of bigtop on
> > > > their own.
> > > >
> > > >   CONTRACT+FOCUS ?  Contracting to focus on a lean set of core
> > > > components, with super high quality.
> > > >
> > > >   STAY THE COURSE ? Staying the same ~ a packaging platform for just
> > > > hadoop's direct ecosystem.
> > > >
> > > > I am intrigued by the idea that A and B both have clear benefits and
> > > > costs... would like to see the opinions of folks --- do we lean in one
> > > > direction or another? What are the criteria for adding a new feature,
> > > > package, or stack to bigtop?
> > > >
> > > > ... Or maybe I'm just overthinking it and should be spending this time
> > > > testing spark for 0.9 release....
> > >
> > > I'd love to know what others think, but for 0.9 I'd rather stay the
> > > course.
> > >
> > > Thanks,
> > > Roman.
> > >
> > > P.S. There are also market forces at play that may fundamentally change
> > > the focus of what we're all working on in the next year or so.
> > >
>
