Sure, we can take this to the JIRA and discuss over a Hangout. Maybe we can
set something up one afternoon next week, Pacific time?

Implementing a data generator in Go will be an integration challenge with
most of the Apache ecosystem if the generator is supposed to interact
directly with the system(s) under test. Given this is "big data", I don't
think we want to generate into an intermediate place (i.e. really big
files) and then replay those with some other Java-native utility. Therefore
you'll have to use GoJVM or something similar to interface with the client
Java code via JNI. JNI interactions aren't efficient unless the client API
of whatever component you are generating data for can accept data via
direct mapped buffers, and most don't. (Maybe none of them?) This will
limit the peak throughput of the generator.

Of course it would be great if someone builds a Go client for Apache $FOO
and contributes it. To do it right (which, in my opinion, avoids JNI), this
would involve reverse engineering wire formats and going up from there,
like what asynchbase (https://github.com/OpenTSDB/asynchbase) did for HBase
RPC. I don't expect this will happen any time soon but who knows, this is
open source.
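For the sensor network use case I described further down, here is a rough
Java sketch of the compound row key idea: a location component in the
leading edge, followed by a time component. It is only an illustration
under assumptions, not an HBase or Phoenix API. The class name
SensorRowKey, the CELL_SIZE grid quantization, and the fixed-width
big-endian layout are all hypothetical choices made for the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical compound row key: grid cell (x, y) leading, timestamp following. */
public final class SensorRowKey {
    // Assumed width of one grid cell in the simulated 2-D space.
    static final double CELL_SIZE = 10.0;

    /** Returns a 16-byte key: cellX (4 bytes), cellY (4 bytes), timestamp (8 bytes). */
    static byte[] build(double x, double y, long timestampMillis) {
        int cellX = (int) Math.floor(x / CELL_SIZE);
        int cellY = (int) Math.floor(y / CELL_SIZE);
        ByteBuffer buf = ByteBuffer.allocate(16); // ByteBuffer is big-endian by default
        // Flip the sign bit so signed values sort numerically under unsigned byte comparison.
        buf.putInt(cellX ^ Integer.MIN_VALUE);
        buf.putInt(cellY ^ Integer.MIN_VALUE);
        buf.putLong(timestampMillis ^ Long.MIN_VALUE);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] a = build(15.0, 42.0, 1_000L);  // cell (1, 4), t = 1000
        byte[] b = build(15.0, 42.0, 2_000L);  // same cell, later time
        byte[] c = build(-5.0, 42.0, 3_000L);  // cell (-1, 4)
        // Arrays.compareUnsigned(byte[], byte[]) requires Java 9 or later.
        System.out.println(a.length);                          // prints 16
        System.out.println(Arrays.compareUnsigned(a, b) < 0);  // prints true
        System.out.println(Arrays.compareUnsigned(c, a) < 0);  // prints true
    }
}
```

HBase orders rows by unsigned byte-wise comparison of the key, so the
sketch flips the sign bit of each field to keep negative coordinates and
times in numeric order. Because location leads, a scan over one grid cell
returns that cell's readings in time order, which suits the heat map reads.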


On Mon, Mar 30, 2015 at 6:14 PM, jay vyas <[email protected]> wrote:

> Also guys, shall we carry this discussion on in the JIRA
> https://issues.apache.org/jira/browse/BIGTOP-1782 ?
> A hangout conversation is a great idea if Andy has time :)
>
> On Mon, Mar 30, 2015 at 8:05 PM, RJ Nowling <[email protected]> wrote:
>
> > My current model simulates the conference attendees as particles and
> > reports their X,Y positions at specified intervals. We could modify the
> > output to compute distances to scanners.
> >
> > I currently have half the model implemented in Golang. I could rewrite it
> > in Java or another JVM language, commit it to BigTop, and continue
> > development through BigTop so BigTop can track my progress.
> >
> > Andrew, would you be willing to talk on the phone or via Google Hangouts
> > to work out details for a plan on integration with HBase / Phoenix?
> >
> >
> >
> > > On Mar 30, 2015, at 5:07 PM, Andrew Purtell <[email protected]> wrote:
> > >
> > > For IoT, some low-hanging fruit is a sensor network use case. The
> > > particulars of the use case can vary, but I can see stressing HBase on
> > > the write side by deploying sensors over a simulated 2-dimensional
> > > space, keying in part by location, and then having telemetry timeseries
> > > data arrive by time and location in irregular patterns. (Sensors would
> > > only report changes. The generator could model duty cycles in addition
> > > to modeling the physical process under measurement.) We could scale up
> > > and down the data bulk and arrival rate by varying the size of the
> > > simulated space and the rate of measurement change notices produced by
> > > the model. On the read side, having compound keys with geolocation in
> > > the leading edge, followed by a time component, would be natural for
> > > interactive visualization of the data as heat maps. They could be
> > > animated or summarized over varying time ranges. This would produce
> > > short and long scanning access patterns with wide variation in
> > > selectivity of server-side filtering depending on the query. If using
> > > Phoenix, it would parallelize the scanning activity and put load
> > > through the roof.
> > >
> > >
> > > On Fri, Mar 27, 2015 at 11:58 AM, jay vyas <[email protected]> wrote:
> > >
> > >> It will definitely be awesome if Andrew can help us craft an idiomatic
> > >> and meaningful way to stress HBase at scale with IoT data.
> > >>
> > >>> On Fri, Mar 27, 2015 at 2:48 PM, RJ Nowling <[email protected]> wrote:
> > >>>
> > >>> Jay and Andrew, thanks for the feedback! I'd be happy to discuss ways
> > >>> to connect BigTop Bazaar to HBase.
> > >>>
> > >>> It would be great to work with the BigBench project to see if our data
> > >>> generators would be of interest.
> > >>>
> > >>> On Fri, Mar 27, 2015 at 1:17 PM, Andrew Purtell <[email protected]> wrote:
> > >>>
> > >>>> I agree the proposal sounds very interesting.
> > >>>>
> > >>>> I can also help with the HBase side of things.
> > >>>>
> > >>>> On the general subject of data generators, you may want to reach out
> > >>>> to the people behind the "BigBench" project
> > >>>> (https://github.com/intel-hadoop/Big-Bench). These are ex-colleagues
> > >>>> of mine from Intel. When I was there they were interested in
> > >>>> contributing to Apache, but had significant problems in that the data
> > >>>> generator itself was licensed under non-free terms incompatible with
> > >>>> the ASL. I think they wanted to move past that but weren't sure
> > >>>> exactly how (including having the bandwidth to do so). I see
> > >>>> occasional updates to the repo, so they are still active in some way.
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Fri, Mar 27, 2015 at 6:42 AM, jay vyas <[email protected]> wrote:
> > >>>>
> > >>>>> Thanks for proposing this, RJ.
> > >>>>>
> > >>>>> I'm in favor, so long as it comes with a BigTop-supported use case,
> > >>>>> and indeed BigTop Bazaar is a lovely use case for HBase!
> > >>>>>
> > >>>>> I'm happy to help you with the HBase side of things; maybe Andrew can
> > >>>>> collaborate on a reference architecture with us for scale testing of
> > >>>>> HBase via BigTop Bazaar's real-time, IoT-style data generation.
> > >>>>>
> > >>>>> That will be a great complement to the MapReduce and Spark blueprints
> > >>>>> we already have.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Thu, Mar 26, 2015 at 4:22 PM, RJ Nowling <[email protected]> wrote:
> > >>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> Most of you are aware of my work with Jay on BigPetStore, particularly
> > >>>>>> the data generator and Spark pipelines. Data generators are a great
> > >>>>>> way to load-test systems, as Jay has recently done for Kubernetes
> > >>>>>> using the BPS data generator.
> > >>>>>>
> > >>>>>> We think they're generally useful to the big data community. Would
> > >>>>>> BigTop be interested in hosting these data generator / load testing
> > >>>>>> tools as released artifacts in their own right?
> > >>>>>>
> > >>>>>> For example, we'd like to set up a web page on the BigTop site with
> > >>>>>> links to:
> > >>>>>>
> > >>>>>> * BPS Data Generator
> > >>>>>> * BPS Spark
> > >>>>>> * BPS Transaction Queue for using the data generator to test
> > >>>>>>   streaming services
> > >>>>>>
> > >>>>>> and we'd like to release these as source tarballs, uber JARs,
> > >>>>>> Maven-hosted JARs, and Docker containers (as appropriate).
> > >>>>>>
> > >>>>>> Would this be okay, or should everything be released as part of
> > >>>>>> BigTop itself?
> > >>>>>>
> > >>>>>> Secondly, I've been working on a model for simulating customer
> > >>>>>> movements at a conference. It's designed for developing and testing
> > >>>>>> a real-time streaming analytics application for which we didn't have
> > >>>>>> access to data ahead of time. You can read about it here:
> > >>>>>>
> > >>>>>> http://rnowling.github.io/math/2015/03/24/bigtop-bazaar-model.html
> > >>>>>>
> > >>>>>> I'd like to call it "BigTop Bazaar" and release it through BigTop. Is
> > >>>>>> the BigTop community interested in having multiple data generators?
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> RJ
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> jay vyas
> > >>>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best regards,
> > >>>>
> > >>>>   - Andy
> > >>>>
> > >>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > >>>> (via Tom White)
> > >>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> jay vyas
> > >>
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >   - Andy
> > >
> >
>
>
>
> --
> jay vyas
>



-- 
Best regards,

   - Andy

