I will be juggling several projects at the beginning of Jan, so I can only arrange hosting (and attend a meetup) starting the last week of Jan (1/25 and on). I also noticed that Jay suggested "a meetup after January". Anyone for the last week of Jan?
On Tue, Dec 9, 2014 at 11:39 PM, Konstantin Boudnik <[email protected]> wrote:
> On Mon, Dec 08, 2014 at 09:16PM, Konstantin Boudnik wrote:
> > On Mon, Dec 08, 2014 at 11:57PM, Jay Vyas wrote:
> > > "Let's see if we can be smart and define the landscape"
> > >
> > > Well put @cos...I think Romans point was that it would be hard, not that it
> > > would be bad. And I think you're both right : it's hard? Yes. But
> > > worthwhile... Possibly? Next step we will all have to get in a room and
> > > think about this face to face.
> > >
> > > Let's shoot for a meetup after january in California... Where we can plan
> > > the future direction of bigtop. In the meanwhile hope to hear more opinions
> > > on this.
> >
> > +1 I can host at WANdisco or perhaps there other options?
>
> Shall we start putting some arrangements/planning for the January
> meetup? Say, 2nd week or January? Or the following one?
>
> Andre, do you guys want to host it at A9? Anyone else? I am happy to do this
> at my office, but it might be a bit of travel, although against the traffic
> both ways.
>
> Cos
>
> > > > On Dec 8, 2014, at 3:23 PM, Konstantin Boudnik <[email protected]> wrote:
> > > >
> > > > First I want to address the RJ's question:
> > > >
> > > > The most prominent downstream Bigtop Dependency would be any commercial
> > > > Hadoop distribution like HDP and CDH. The former is trying to
> > > > disguise their affiliation by pushing Ambari forward, and Cloudera's seemingly
> > > > shifting her focus to compressed tarballs media (aka parcels) which requires
> > > > a closed-source solutions like Cloudera Manager to deploy and control your
> > > > cluster, effectively rendering it useless if you ever decide to uninstall the
> > > > control software. In the interest of full disclosure, I don't think parcels
> > > > have any chance to landslide the consensus in the industry from Linux
> > > > packaging towards something so obscure and proprietary as parcels are.
> > > >
> > > >
> > > > And now to my actual points....:
> > > >
> > > > I do strongly believe the Bigtop was and is the only completely transparent,
> > > > vendors' friendly, and 100% sticking to official ASF product releases way of
> > > > building your stack from ground up, deploying and controlling it anyway you
> > > > want to. I agree with Roman's presentation on how this project can move
> > > > forward. However, I somewhat disagree with his view on the perspectives. It
> > > > might be a hard road to drive the opinion of the community. But, it is a high
> > > > road.
> > > >
> > > > We are definitely small and mostly unsupported by commercial groups that are
> > > > using the framework. Being a box of LEGO won't win us anything. If anything,
> > > > the empirical evidences are against it as commercial distros have decided to
> > > > move towards their own means of "vendor lock-in" (yes, you hear me
> > > > right - that's exactly what I said: all so called open-source companies have
> > > > invented a way to lock-in their customers either with fancy "enterprise
> > > > features" that aren't adding but amending underlying stack; or with custom set
> > > > of patches oftentimes rendering the cluster to become incompatible between
> > > > different vendors).
> > > >
> > > > By all means, my money are on the second way, yet slightly modified (as
> > > > use-cases are coming from users, not developers):
> > > >   #2 start driving adoption of software stacks for the particular
> > > >      kind of data workloads
> > > >
> > > > This community has enough day-to-day practitioners on board to
> > > > accumulate a near-complete introspection of where the technology is moving.
> > > > And instead of wobbling in a backwash, let's see if we can be smart and define
> > > > this landscape. After all, Bigtop has adopted Spark well before any of the
> > > > commercials have officially accepted it. We seemingly are moving more and
> > > > more into in-memory realm of data processing: Apache Ignite (Gridgain),
> > > > Tachyon, Spark. I don't know how much legs Hive got in it, but I am doubtful,
> > > > that it can walk for much longer... May be it's just me.
> > > >
> > > > In this thread http://is.gd/MV2BH9 we already discussed some of the aspects
> > > > influencing the feature of this project. And we are de-facto working on the
> > > > implementation. In my opinion, Hadoop has been more or less commoditized
> > > > already. And it isn't a bad thing, but it means that the innovations are
> > > > elsewhere. E.g. Spark moving is moving beyond its ties with storage layer via
> > > > Tachyon abstraction; GridGain simply doesn't care what's underlying storage
> > > > is. However, data needs to be stored somewhere before it can be processed. And
> > > > HCFS seems to be fitting the bill ok. But, as I said already, I see the real
> > > > action elsewhere. If I were to define the shape of our mid- to long'ish term
> > > > roadmap it'd be something like that:
> > > >
> > > >     ^ Dashboard/Visualization ^
> > > >     |   OLTP/ML processing    |
> > > >     |  Caching/Acceleration   |
> > > >     |         Storage         |
> > > >
> > > > And around this we can add/improve on deployment (R8???),
> > > > virtualization/containers/clouds. In other words - let's focus on the
> > > > vertical part of the stack, instead of simply supporting the status quo.
> > > >
> > > > Does Cassandra fits the Storage layer in that model? I don't know and most
> > > > important - I don't care. If there's an interest and manpower to have
> > > > Cassandra-based stack - sure, but perhaps let's do as a separate branch or
> > > > something, so we aren't over-complicating things. As Roman said earlier, in
> > > > this case it'd be great to engage Cassandra/DataStax people into this project.
> > > > But something tells me they won't be eager to jump on board.
> > > >
> > > > And finally, all this above leads to "how": how we can start reshaping the
> > > > stack into its next incarnation? Perhaps, Ubuntu model might be an answer for
> > > > that, but we have discussed that elsewhere and dropped the idea as it wasn't
> > > > feasible back in the day. Perhaps its time just came?
> > > >
> > > > Apologies for a long post.
> > > >   Cos
> > > >
> > > >
> > > >> On Sun, Dec 07, 2014 at 07:04PM, RJ Nowling wrote:
> > > >> Which other projects depend on BigTop? How will the questions about the
> > > >> direction of BigTop affect those projects?
> > > >>
> > > >> On Sun, Dec 7, 2014 at 6:10 PM, Roman Shaposhnik <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> Hi!
> > > >>>
> > > >>> On Sat, Dec 6, 2014 at 3:23 PM, jay vyas <[email protected]>
> > > >>> wrote:
> > > >>>> hi bigtop !
> > > >>>>
> > > >>>> I thought id start a thread a few vaguely related thoughts i have around
> > > >>>> next couple iterations of bigtop.
> > > >>>
> > > >>> I think in general I see two major ways for something like
> > > >>> Bigtop to evolve:
> > > >>>    #1 remain a 'box of LEGO bricks' with very little opinion on
> > > >>>       how these pieces need to be integrated
> > > >>>    #2 start driving oppinioned use-cases for the particular kind of
> > > >>>       bigdata workloads
> > > >>>
> > > >>> #1 is sort of what all of the Linux distros have been doing for
> > > >>> the majority of time they existed. #2 is close to what CentOS
> > > >>> is doing with SIGs.
> > > >>>
> > > >>> Honestly, given the size of our community so far and a total
> > > >>> lack of corporate backing (with a small exception of Cloudera
> > > >>> still paying for our EC2 time) I think #1 is all we can do. I'd
> > > >>> love to be wrong, though.
> > > >>>
> > > >>>> 1) Hive: How will bigtop to evolve to support it, now that it is much
> > > >>>> more than a mapreduce query wrapper?
> > > >>>
> > > >>> I think Hive will remain a big part of Hadoop workloads for forseeable
> > > >>> future. What I'd love to see more of is rationalizing things like how
> > > >>> HCatalog, etc. need to be deployed.
> > > >>>
> > > >>>> 2) I wonder wether we should confirm cassandra interoperability of spark
> > > >>>> in bigtop distros,
> > > >>>
> > > >>> Only if there's a significant interest from cassandra community and even
> > > >>> then my biggest fear is that with cassandra we're totally changing the
> > > >>> requirements for the underlying storage subsystem (nothing wrong with
> > > >>> that, its just that in Hadoop ecosystem everything assumes very HDFS'ish
> > > >>> requirements for the scale-out storage).
> > > >>>
> > > >>>> 4) in general, i think bigtop can move in one of 3 directions.
> > > >>>>
> > > >>>> EXPAND ? : Expanding to include new components, with just basic
> > > >>>> interop, and let folks evolve their own stacks on top of bigtop on their own.
> > > >>>>
> > > >>>> CONTRACT+FOCUS ? Contracting to focus on a lean set of core
> > > >>>> components, with super high quality.
> > > >>>>
> > > >>>> STAY THE COURSE ? Staying the same ~ a packaging platform for just
> > > >>>> hadoop's direct ecosystem.
> > > >>>>
> > > >>>> I am intrigued by the idea of A and B both have clear benefits and
> > > >>>> costs... would like to see the opinions of folks --- do we lean in one
> > > >>>> direction or another? What is the criteria for adding a new feature,
> > > >>>> package, stack to bigtop?
> > > >>>>
> > > >>>> ... Or maybe im just overthinking it and should be spending this time
> > > >>>> testing spark for 0.9 release....
> > > >>>
> > > >>> I'd love to know what other think, but for 0.9 I'd rather stay the course.
> > > >>>
> > > >>> Thanks,
> > > >>> Roman.
> > > >>>
> > > >>> P.S. There are also market forces at play that may fundamentally change
> > > >>> the focus of what we're all working on in the year or so.
> > > >>>
> > >
