I agree the proposal sounds very interesting. I can also help with the HBase side of things.
On the general subject of data generators, you may want to reach out to the people behind the "BigBench" project (https://github.com/intel-hadoop/Big-Bench). These are ex-colleagues of mine from Intel. When I was there they were interested in contributing to Apache, but had significant problems in that the data generator itself was licensed under non-free terms incompatible with the ASL. I think they wanted to move past that but weren't sure exactly how (including having the bandwidth to do so). I see occasional updates to the repo so they are still active in some way.

On Fri, Mar 27, 2015 at 6:42 AM, jay vyas <[email protected]> wrote:

> Thanks for proposing rj.
>
> Im in favor, so long as it comes w/ a bigtop supported use case, and indeed
> BigTop bazaar is a lovely use case for hbase !
>
> I'm happy help you with the HBase side of things, maybe andrew can
> collaborate on a reference architecture with us for scale testing of hbase
> via bigtop bazaar's realtime IoT style of data generation.
>
> That will be a great blueprint compleiment to the mapreduce, spark,
> blueprints which we already have.
>
> On Thu, Mar 26, 2015 at 4:22 PM, RJ Nowling <[email protected]> wrote:
>
> > Hi all,
> >
> > Most of you are aware of my work with Jay on BigPetStore, particularly the
> > data generator and Spark pipelines. Data generators are a great way to
> > load test systems, as Jay has recently done for kubernetes using the BPS
> > data generator.
> >
> > We think they're generally useful to the big data community. Would BigTop
> > be interested in hosting these data generator / load testing tools as
> > released artifacts in their own right?
> >
> > For example, we'd like to set up a web page on the BigTop site with links
> > to:
> >
> > * BPS Data Generator
> > * BPS Spark
> > * BPS Transaction Queue for using the data generator to test streaming
> >   services
> >
> > and we'd like to release these as source tarballs, uber JARs, Maven-hosted
> > JARs, and Docker containers (as appropriate).
> >
> > Would this be okay or should everything be released as part of BigTop
> > itself?
> >
> > Secondly, I've been working on a model for simulating customer movements at
> > a conference. It's designed for development and testing for a real-time
> > streaming analytics application where we didn't have access to data ahead
> > of time. You can read about it here:
> >
> > http://rnowling.github.io/math/2015/03/24/bigtop-bazaar-model.html
> >
> > I'd like to call it "BigTop Bazaar" and release it through BigTop. Is the
> > BigTop community interested in having multiple data generators?
> >
> > Thanks,
> > RJ
>
> --
> jay vyas

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
