Re: What will the next generation of bigtop look like?

Jay Vyas Sat, 13 Dec 2014 14:00:06 -0800

Thanks for the real-world user story, Evans... and mostly in line with the
thoughts of others, I think.


Any other folks, or any disagreements, on the idea of a leaner BigTop centered
around:

{{ HDFS, YARN core, Spark, ZooKeeper, HBase, Kafka, Solr, and Ignite }}

> On Dec 13, 2014, at 1:18 PM, Evans Ye <[email protected]> wrote:
> 
> Please allow me to chime in.
> Here's a real story that brings up another direction we could probably head
> toward. My company is about to upgrade our Hadoop version. We have several
> distributions on the table to choose from: CDH, HDP, BigTop. But the fact is
> that BigTop is unlikely to be the one the decision makers choose, because
> its version set is a little too old compared to the others.
> My point is that maybe one thing we can do is release often and release
> faster. There are things we can do to achieve this goal:
> 1) shrink the set of supported components and focus on the vital parts
>    (as this thread already mentioned)
> 2) establish a comprehensive automated testing system (like smoke tests)
>    that lets us release quickly
> 3) Instead of cutting components, maybe we can upgrade major components
>    more often than minor ones. For example, in 0.8.1 we could upgrade only
>    Hadoop, Spark, HBase, Kafka, and Solr. After all, I believe most
>    companies run their Hadoop clusters with a limited set of core
>    components, so the lag of the minor ones might not be a big problem.
> 
> A quick release cycle not only makes BigTop more attractive but also
> gives the community a lively image. I think that is the crucial part for
> the community to keep growing.
> 
> 2014-12-09 4:23 GMT+08:00 Konstantin Boudnik <[email protected]>:
>> 
>> First I want to address RJ's question:
>> 
>> The most prominent downstream Bigtop dependency would be any commercial
>> Hadoop distribution like HDP and CDH. The former is trying to disguise
>> its affiliation by pushing Ambari forward, and Cloudera is seemingly
>> shifting its focus to compressed tarball media (aka parcels), which
>> require a closed-source solution like Cloudera Manager to deploy and
>> control your cluster, effectively rendering it useless if you ever decide
>> to uninstall the control software. In the interest of full disclosure, I
>> don't think parcels have any chance of shifting the industry consensus
>> from Linux packaging towards something as obscure and proprietary as
>> parcels are.
>> 
>> 
>> And now to my actual points....:
>> 
>> I do strongly believe that Bigtop was and is the only way of building your
>> stack from the ground up, and of deploying and controlling it any way you
>> want to, that is completely transparent, vendor-friendly, and sticks 100%
>> to official ASF product releases. I agree with Roman's presentation of how
>> this project can move forward. However, I somewhat disagree with his view
>> of the prospects. It might be a hard road to drive the opinion of the
>> community. But it is a high road.
>> 
>> We are definitely small and mostly unsupported by the commercial groups
>> that are using the framework. Being a box of LEGO won't win us anything.
>> If anything, the empirical evidence is against it, as commercial distros
>> have decided to move towards their own means of "vendor lock-in" (yes, you
>> heard me right - that's exactly what I said: all so-called open-source
>> companies have invented a way to lock in their customers, either with
>> fancy "enterprise features" that don't add to but amend the underlying
>> stack, or with a custom set of patches that oftentimes renders clusters
>> incompatible between different vendors).
>> 
>> By all means, my money is on the second way, yet slightly modified (as
>> use-cases are coming from users, not developers):
>>  #2 start driving adoption of software stacks for the particular kind of
>>     data workloads
>> 
>> This community has enough day-to-day practitioners on board to
>> accumulate near-complete insight into where the technology is moving.
>> And instead of wobbling in the backwash, let's see if we can be smart and
>> define this landscape. After all, Bigtop adopted Spark well before any of
>> the commercial distros officially accepted it. We seemingly are moving
>> more and more into the in-memory realm of data processing: Apache Ignite
>> (GridGain), Tachyon, Spark. I don't know whether Hive still has legs, but
>> I am doubtful that it can walk for much longer... Maybe it's just me.
>> 
>> In this thread http://is.gd/MV2BH9 we already discussed some of the
>> aspects influencing the future of this project. And we are de facto
>> working on the implementation. In my opinion, Hadoop has been more or less
>> commoditized already. And it isn't a bad thing, but it means that the
>> innovations are elsewhere. E.g. Spark is moving beyond its ties with the
>> storage layer via the Tachyon abstraction; GridGain simply doesn't care
>> what the underlying storage is. However, data needs to be stored somewhere
>> before it can be processed. And HCFS seems to fit the bill OK. But, as I
>> said already, I see the real action elsewhere. If I were to define the
>> shape of our mid- to longish-term roadmap, it'd be something like this:
>> 
>>            ^   Dashboard/Visualization  ^
>>            |     OLTP/ML processing     |
>>            |    Caching/Acceleration    |
>>            |         Storage            |
>> 
>> And around this we can add/improve on deployment (R8???),
>> virtualization/containers/clouds.  In other words - let's focus on the
>> vertical part of the stack, instead of simply supporting the status quo.
>> 
>> Does Cassandra fit the Storage layer in that model? I don't know and, more
>> importantly, I don't care. If there's interest and manpower to have a
>> Cassandra-based stack - sure, but perhaps let's do it as a separate branch
>> or something, so we aren't over-complicating things. As Roman said
>> earlier, in this case it'd be great to engage Cassandra/DataStax people in
>> this project. But something tells me they won't be eager to jump on board.
>> 
>> And finally, all of the above leads to the "how": how can we start
>> reshaping the stack into its next incarnation? Perhaps the Ubuntu model
>> might be an answer to that, but we have discussed it elsewhere and dropped
>> the idea as it wasn't feasible back in the day. Perhaps its time has come?
>> 
>> Apologies for a long post.
>>  Cos
>> 
>> 
>>> On Sun, Dec 07, 2014 at 07:04PM, RJ Nowling wrote:
>>> Which other projects depend on BigTop?  How will the questions about the
>>> direction of BigTop affect those projects?
>>> 
>>> On Sun, Dec 7, 2014 at 6:10 PM, Roman Shaposhnik <[email protected]>
>>> wrote:
>>> 
>>>> Hi!
>>>> 
>>>> On Sat, Dec 6, 2014 at 3:23 PM, jay vyas <[email protected]>
>>>> wrote:
>>>>> hi bigtop !
>>>>> 
>>>>> I thought I'd start a thread on a few vaguely related thoughts I have
>>>>> around the next couple of iterations of bigtop.
>>>> 
>>>> I think in general I see two major ways for something like
>>>> Bigtop to evolve:
>>>>   #1 remain a 'box of LEGO bricks' with very little opinion on
>>>>        how these pieces need to be integrated
>>>>   #2 start driving opinionated use-cases for the particular kind of
>>>>        bigdata workloads
>>>> 
>>>> #1 is sort of what all of the Linux distros have been doing for
>>>> the majority of time they existed. #2 is close to what CentOS
>>>> is doing with SIGs.
>>>> 
>>>> Honestly, given the size of our community so far and a total
>>>> lack of corporate backing (with a small exception of Cloudera
>>>> still paying for our EC2 time) I think #1 is all we can do. I'd
>>>> love to be wrong, though.
>>>> 
>>>>> 1) Hive: How will bigtop evolve to support it, now that it is much
>>>>> more than a mapreduce query wrapper?
>>>> 
>>>> I think Hive will remain a big part of Hadoop workloads for the
>>>> foreseeable future. What I'd love to see more of is rationalizing things
>>>> like how HCatalog, etc. need to be deployed.
>>>> 
>>>>> 2) I wonder whether we should confirm cassandra interoperability of
>>>>> spark in bigtop distros,
>>>> 
>>>> Only if there's significant interest from the cassandra community, and
>>>> even then my biggest fear is that with cassandra we're totally changing
>>>> the requirements for the underlying storage subsystem (nothing wrong
>>>> with that, it's just that in the Hadoop ecosystem everything assumes
>>>> very HDFS'ish requirements for the scale-out storage).
>>>> 
>>>>> 4) in general, i think bigtop can move in one of 3 directions.
>>>>> 
>>>>>  EXPAND ? : Expanding to include new components, with just basic
>>>>> interop, and let folks evolve their own stacks on top of bigtop on
>>>>> their own.
>>>>> 
>>>>>  CONTRACT+FOCUS ? : Contracting to focus on a lean set of core
>>>>> components, with super high quality.
>>>>> 
>>>>>  STAY THE COURSE ? : Staying the same ~ a packaging platform for just
>>>>> hadoop's direct ecosystem.
>>>>> 
>>>>> I am intrigued by the idea that A and B both have clear benefits and
>>>>> costs... would like to see the opinions of folks --- do we lean in one
>>>>> direction or another? What are the criteria for adding a new feature,
>>>>> package, or stack to bigtop?
>>>>> 
>>>>> ... Or maybe I'm just overthinking it and should be spending this time
>>>>> testing spark for 0.9 release....
>>>> 
>>>> I'd love to know what others think, but for 0.9 I'd rather stay the
>>>> course.
>>>> 
>>>> Thanks,
>>>> Roman.
>>>> 
>>>> P.S. There are also market forces at play that may fundamentally change
>>>> the focus of what we're all working on in the next year or so.
>>
