Re: What will the next generation of bigtop look like?

Andre Arcilla Wed, 10 Dec 2014 01:15:07 -0800

I will be juggling several projects at the beginning of January, and can only
arrange hosting (and attend a meetup) starting the last week of January (1/25
onward). I also notice that Jay suggested "a meetup after January". Anyone for
the last week of January?



On Tue, Dec 9, 2014 at 11:39 PM, Konstantin Boudnik <[email protected]> wrote:

> On Mon, Dec 08, 2014 at 09:16PM, Konstantin Boudnik wrote:
> > On Mon, Dec 08, 2014 at 11:57PM, Jay Vyas wrote:
> > > "Let's see if we can be smart and define the landscape"
> > >
> > > Well put @cos... I think Roman's point was that it would be hard, not
> > > that it would be bad. And I think you're both right: it's hard? Yes. But
> > > worthwhile... Possibly? As a next step we will all have to get in a room
> > > and think about this face to face.
> > >
> > > Let's shoot for a meetup after January in California, where we can plan
> > > the future direction of bigtop. In the meanwhile I hope to hear more
> > > opinions on this.
> >
> > +1 I can host at WANdisco, or perhaps there are other options?
>
> Shall we start making arrangements for the January meetup? Say, the 2nd
> week of January? Or the following one?
>
> Andre, do you guys want to host it at A9? Anyone else? I am happy to do
> this at my office, but it might be a bit of travel, although against the
> traffic both ways.
>
> Cos
>
> > > > On Dec 8, 2014, at 3:23 PM, Konstantin Boudnik <[email protected]> wrote:
> > > >
> > > > First I want to address RJ's question:
> > > >
> > > > The most prominent downstream Bigtop dependents would be commercial
> > > > Hadoop distributions like HDP and CDH. The former is trying to
> > > > disguise its affiliation by pushing Ambari forward, and Cloudera is
> > > > seemingly shifting its focus to compressed tarball media (aka parcels),
> > > > which require a closed-source solution like Cloudera Manager to deploy
> > > > and control your cluster, effectively rendering it useless if you ever
> > > > decide to uninstall the control software. In the interest of full
> > > > disclosure, I don't think parcels have any chance of shifting the
> > > > industry consensus away from Linux packaging towards something as
> > > > obscure and proprietary as parcels are.
> > > >
> > > >
> > > > And now to my actual points:
> > > >
> > > > I strongly believe that Bigtop was and is the only completely
> > > > transparent, vendor-friendly way of building your stack from the ground
> > > > up, sticking 100% to official ASF product releases, and deploying and
> > > > controlling it any way you want to. I agree with Roman's presentation
> > > > of how this project can move forward. However, I somewhat disagree
> > > > with his view of the prospects. It might be a hard road to drive the
> > > > opinion of the community. But it is a high road.
> > > >
> > > > We are definitely small and mostly unsupported by the commercial groups
> > > > that are using the framework. Being a box of LEGO won't win us
> > > > anything. If anything, the empirical evidence is against it, as
> > > > commercial distros have decided to move towards their own means of
> > > > "vendor lock-in" (yes, you heard me right, that's exactly what I said:
> > > > all so-called open-source companies have invented a way to lock in
> > > > their customers, either with fancy "enterprise features" that aren't
> > > > adding to but amending the underlying stack, or with a custom set of
> > > > patches that oftentimes render clusters incompatible between different
> > > > vendors).
> > > >
> > > > By all means, my money is on the second way, yet slightly modified (as
> > > > use-cases are coming from users, not developers):
> > > >  #2 start driving adoption of software stacks for the particular kind
> > > >     of data workloads
> > > >
> > > > This community has enough day-to-day practitioners on board to
> > > > accumulate near-complete insight into where the technology is moving.
> > > > And instead of wobbling in a backwash, let's see if we can be smart and
> > > > define this landscape. After all, Bigtop adopted Spark well before any
> > > > of the commercial vendors officially accepted it. We seemingly are
> > > > moving more and more into the in-memory realm of data processing:
> > > > Apache Ignite (GridGain), Tachyon, Spark. I don't know how many legs
> > > > Hive has left, but I am doubtful that it can walk for much longer...
> > > > Maybe it's just me.
> > > >
> > > > In this thread http://is.gd/MV2BH9 we already discussed some of the
> > > > aspects influencing the future of this project. And we are de facto
> > > > working on the implementation. In my opinion, Hadoop has been more or
> > > > less commoditized already. And it isn't a bad thing, but it means that
> > > > the innovations are elsewhere. E.g. Spark is moving beyond its ties
> > > > with the storage layer via the Tachyon abstraction; GridGain simply
> > > > doesn't care what the underlying storage is. However, data needs to be
> > > > stored somewhere before it can be processed. And HCFS seems to fit the
> > > > bill OK. But, as I said already, I see the real action elsewhere. If I
> > > > were to define the shape of our mid- to longish-term roadmap, it'd be
> > > > something like this:
> > > >
> > > >            ^   Dashboard/Visualization  ^
> > > >            |     OLTP/ML processing     |
> > > >            |    Caching/Acceleration    |
> > > >            |         Storage            |
> > > >
> > > > And around this we can add/improve on deployment (R8???),
> > > > virtualization/containers/clouds. In other words, let's focus on the
> > > > vertical part of the stack, instead of simply supporting the status
> > > > quo.
> > > >
> > > > Does Cassandra fit the Storage layer in that model? I don't know and,
> > > > more importantly, I don't care. If there's interest and manpower to
> > > > have a Cassandra-based stack, sure, but perhaps let's do it as a
> > > > separate branch or something, so we aren't over-complicating things. As
> > > > Roman said earlier, in this case it'd be great to engage
> > > > Cassandra/DataStax people in this project. But something tells me they
> > > > won't be eager to jump on board.
> > > >
> > > > And finally, all of the above leads to the "how": how can we start
> > > > reshaping the stack into its next incarnation? Perhaps the Ubuntu model
> > > > might be an answer for that, but we have discussed that elsewhere and
> > > > dropped the idea as it wasn't feasible back in the day. Perhaps its
> > > > time has come?
> > > >
> > > > Apologies for a long post.
> > > >  Cos
> > > >
> > > >
> > > >> On Sun, Dec 07, 2014 at 07:04PM, RJ Nowling wrote:
> > > >> Which other projects depend on BigTop?  How will the questions about
> > > >> the direction of BigTop affect those projects?
> > > >>
> > > >> On Sun, Dec 7, 2014 at 6:10 PM, Roman Shaposhnik <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> Hi!
> > > >>>
> > > >>> On Sat, Dec 6, 2014 at 3:23 PM, jay vyas <[email protected]>
> > > >>> wrote:
> > > >>>> hi bigtop !
> > > >>>>
> > > >>>> I thought I'd start a thread with a few vaguely related thoughts I
> > > >>>> have around the next couple of iterations of bigtop.
> > > >>>
> > > >>> I think in general I see two major ways for something like
> > > >>> Bigtop to evolve:
> > > >>>   #1 remain a 'box of LEGO bricks' with very little opinion on
> > > >>>        how these pieces need to be integrated
> > > >>>   #2 start driving opinionated use-cases for the particular kind of
> > > >>>        bigdata workloads
> > > >>>
> > > >>> #1 is sort of what all of the Linux distros have been doing for
> > > >>> the majority of time they existed. #2 is close to what CentOS
> > > >>> is doing with SIGs.
> > > >>>
> > > >>> Honestly, given the size of our community so far and a total
> > > >>> lack of corporate backing (with a small exception of Cloudera
> > > >>> still paying for our EC2 time) I think #1 is all we can do. I'd
> > > >>> love to be wrong, though.
> > > >>>
> > > >>>> 1) Hive:  How will bigtop evolve to support it, now that it is much
> > > >>>> more than a mapreduce query wrapper?
> > > >>>
> > > >>> I think Hive will remain a big part of Hadoop workloads for the
> > > >>> foreseeable future. What I'd love to see more of is rationalizing
> > > >>> things like how HCatalog, etc. need to be deployed.
> > > >>>
> > > >>>> 2) I wonder whether we should confirm cassandra interoperability of
> > > >>>> spark in bigtop distros,
> > > >>>
> > > >>> Only if there's significant interest from the cassandra community, and
> > > >>> even then my biggest fear is that with cassandra we're totally
> > > >>> changing the requirements for the underlying storage subsystem
> > > >>> (nothing wrong with that, it's just that in the Hadoop ecosystem
> > > >>> everything assumes very HDFS'ish requirements for the scale-out
> > > >>> storage).
> > > >>>
> > > >>>> 4) In general, I think bigtop can move in one of 3 directions.
> > > >>>>
> > > >>>>  EXPAND ? : Expanding to include new components, with just basic
> > > >>>> interop, and let folks evolve their own stacks on top of bigtop on
> > > >>>> their own.
> > > >>>>
> > > >>>>  CONTRACT+FOCUS ?  Contracting to focus on a lean set of core
> > > >>>> components, with super high quality.
> > > >>>>
> > > >>>>  STAY THE COURSE ? Staying the same ~ a packaging platform for just
> > > >>>> hadoop's direct ecosystem.
> > > >>>>
> > > >>>> I am intrigued by the ideas of A and B; both have clear benefits and
> > > >>>> costs... I would like to see the opinions of folks --- do we lean in
> > > >>>> one direction or another? What are the criteria for adding a new
> > > >>>> feature, package, or stack to bigtop?
> > > >>>>
> > > >>>> ... Or maybe I'm just overthinking it and should be spending this
> > > >>>> time testing spark for the 0.9 release....
> > > >>>
> > > >>> I'd love to know what others think, but for 0.9 I'd rather stay the
> > > >>> course.
> > > >>>
> > > >>> Thanks,
> > > >>> Roman.
> > > >>>
> > > >>> P.S. There are also market forces at play that may fundamentally
> > > >>> change the focus of what we're all working on in the next year or so.
> > > >>>
>
>
>
