Re: Hosting Data Generators in BigTop

RJ Nowling Mon, 30 Mar 2015 17:08:31 -0700

My current model simulates the conference attendees as particles and reports 
their X,Y positions at specified intervals. We could modify the output to 
compute distances to scanners.
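
As a rough illustration of what that output change could look like, here is a minimal Python sketch. It is not the actual model: a plain random walk stands in for the attendee dynamics, and the names (simulate, scanners, report_every, the square venue of side `size`) are made up for the example.

```python
import math
import random


def simulate(n_attendees, n_steps, report_every, scanners,
             size=100.0, step=1.0, seed=0):
    """Random-walk stand-in for the attendee model (an assumption, not the real one).

    Yields (time, attendee_id, x, y, distances), where distances[i] is the
    Euclidean distance from the attendee to scanners[i].
    """
    rng = random.Random(seed)
    # Start every attendee at a uniformly random point in the square venue.
    pos = [(rng.uniform(0, size), rng.uniform(0, size))
           for _ in range(n_attendees)]
    for t in range(1, n_steps + 1):
        moved = []
        for x, y in pos:
            angle = rng.uniform(0, 2 * math.pi)
            # Take one fixed-length step, clamped to the venue boundary.
            nx = min(max(x + step * math.cos(angle), 0.0), size)
            ny = min(max(y + step * math.sin(angle), 0.0), size)
            moved.append((nx, ny))
        pos = moved
        # Report only at the requested interval.
        if t % report_every == 0:
            for i, (x, y) in enumerate(pos):
                dists = [math.hypot(x - sx, y - sy) for sx, sy in scanners]
                yield t, i, x, y, dists


if __name__ == "__main__":
    demo_scanners = [(0.0, 0.0), (100.0, 100.0)]  # hypothetical scanner positions
    for t, i, x, y, dists in simulate(3, 20, 10, demo_scanners):
        print(t, i, round(x, 2), round(y, 2), [round(d, 2) for d in dists])
```

Each reported record carries one distance per scanner, so a downstream consumer could threshold on distance to decide which scanners would have "seen" an attendee.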


I currently have half the model implemented in Golang. I could rewrite it in
Java or another JVM language, commit it to BigTop, and continue development
through BigTop so BigTop can track my progress.

Andrew, would you be willing to talk on the phone or via Google Hangouts to 
work out details for a plan on integration with HBase / Phoenix?
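
As a starting point for that discussion, here is a minimal Python sketch of the compound key idea from Andrew's message quoted below (location in the leading edge, then time). It only builds byte strings and does not call any HBase or Phoenix API. The grid-cell encoding, the field widths, and the names row_key and scan_range are my assumptions for illustration.

```python
import struct


def row_key(cell_x, cell_y, ts_millis, sensor_id):
    """Pack (location, time, sensor) into bytes that sort by location, then time.

    Assumed layout: 2-byte cell x, 2-byte cell y, 8-byte timestamp,
    4-byte sensor id. Big-endian unsigned fields keep byte-wise order
    equal to numeric order.
    """
    return struct.pack(">HHQI", cell_x, cell_y, ts_millis, sensor_id)


def scan_range(cell_x, cell_y, start_millis, stop_millis):
    """Start (inclusive) and stop (exclusive) keys for one cell and time window."""
    start = struct.pack(">HHQ", cell_x, cell_y, start_millis)
    stop = struct.pack(">HHQ", cell_x, cell_y, stop_millis)
    return start, stop


if __name__ == "__main__":
    keys = sorted([
        row_key(1, 2, 1500, 7),
        row_key(1, 2, 2500, 7),
        row_key(1, 3, 1500, 8),
        row_key(0, 9, 1500, 9),
    ])
    start, stop = scan_range(1, 2, 1000, 2000)
    # Emulate a range scan over the sorted key space.
    hits = [k for k in keys if start <= k < stop]
    print([struct.unpack(">HHQI", k) for k in hits])
```

With this layout a heat map tile for one cell and time window is a single contiguous range, while a query across many cells becomes many short ranges, which is where the variation in scan length would come from.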



> On Mar 30, 2015, at 5:07 PM, Andrew Purtell <[email protected]> wrote:
> 
> For IoT, some low hanging fruit is a sensor network use case. The
> particulars of the use case can vary but I can see stressing HBase on the
> write side by deploying sensors over a simulated 2-dimensional space,
> keying in part by location, and then having telemetry timeseries data
> arrive by time and location in irregular patterns. (Sensors would only
> report changes. The generator could model duty cycles in addition to
> modeling the physical process under measurement.) We could scale up and
> down the data bulk and arrival rate by varying the size of the simulated
> space and the rate of measurement change notices produced by the model. On
> the read side having compound keys with geolocation in the leading edge,
> followed by a time component, would be natural for interactive
> visualization of the data as heat maps. They could be animated or
> summarized over varying time ranges. This would produce short and long
> scanning access patterns with wide variation in selectivity of server side
> filtering depending on query. If using Phoenix, it would parallelize the
> scanning activity and put load through the roof.
> 
> 
> On Fri, Mar 27, 2015 at 11:58 AM, jay vyas <[email protected]>
> wrote:
> 
>> It will definitely be awesome if Andrew can help us craft an idiomatic and
>> meaningful way to stress HBase at scale with IoT data.
>> 
>>> On Fri, Mar 27, 2015 at 2:48 PM, RJ Nowling <[email protected]> wrote:
>>> 
>>> Jay and Andrew, thanks for the feedback! I'd be happy to discuss ways to
>>> connect BigTop Bazaar to HBase.
>>> 
>>> It would be great to work with the BigBench project to see if our data
>>> generators would be of interest.
>>> 
>>> On Fri, Mar 27, 2015 at 1:17 PM, Andrew Purtell <[email protected]>
>>> wrote:
>>> 
>>>> I agree the proposal sounds very interesting.
>>>> 
>>>> I can also help with the HBase side of things.
>>>> 
>>>> On the general subject of data generators, you may want to reach out to
>>>> the people behind the "BigBench" project (
>>>> https://github.com/intel-hadoop/Big-Bench). These are ex-colleagues of
>>>> mine from Intel. When I was there they were interested in contributing to
>>>> Apache, but had significant problems in that the data generator itself
>>>> was licensed under non-free terms incompatible with the ASL. I think they
>>>> wanted to move past that but weren't sure exactly how (including having
>>>> the bandwidth to do so). I see occasional updates to the repo so they are
>>>> still active in some way.
>>>> 
>>>> 
>>>> 
>>>> On Fri, Mar 27, 2015 at 6:42 AM, jay vyas <[email protected]>
>>>> wrote:
>>>> 
>>>>> Thanks for proposing this, RJ.
>>>>> 
>>>>> I'm in favor, so long as it comes with a BigTop-supported use case, and
>>>>> indeed BigTop Bazaar is a lovely use case for HBase!
>>>>> 
>>>>> I'm happy to help you with the HBase side of things; maybe Andrew can
>>>>> collaborate on a reference architecture with us for scale testing of
>>>>> HBase via BigTop Bazaar's real-time IoT style of data generation.
>>>>> 
>>>>> That will be a great complement to the MapReduce and Spark blueprints
>>>>> which we already have.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Mar 26, 2015 at 4:22 PM, RJ Nowling <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> Most of you are aware of my work with Jay on BigPetStore, particularly
>>>>>> the data generator and Spark pipelines.  Data generators are a great
>>>>>> way to load test systems, as Jay has recently done for Kubernetes using
>>>>>> the BPS data generator.
>>>>>> 
>>>>>> We think they're generally useful to the big data community. Would
>>>>>> BigTop be interested in hosting these data generator / load testing
>>>>>> tools as released artifacts in their own right?
>>>>>> 
>>>>>> For example, we'd like to set up a web page on the BigTop site with
>>>>>> links to:
>>>>>> 
>>>>>> * BPS Data Generator
>>>>>> * BPS Spark
>>>>>> * BPS Transaction Queue for using the data generator to test
>>>>>>   streaming services
>>>>>> 
>>>>>> and we'd like to release these as source tarballs, uber JARs,
>>>>>> Maven-hosted JARs, and Docker containers (as appropriate).
>>>>>> 
>>>>>> Would this be okay or should everything be released as part of
>>>>>> BigTop itself?
>>>>>> 
>>>>>> Secondly, I've been working on a model for simulating customer
>>>>>> movements at a conference.  It's designed for development and testing
>>>>>> for a real-time streaming analytics application where we didn't have
>>>>>> access to data ahead of time.  You can read about it here:
>>>>>> 
>>>>>> http://rnowling.github.io/math/2015/03/24/bigtop-bazaar-model.html
>>>>>> 
>>>>>> I'd like to call it "BigTop Bazaar" and release it through BigTop.
>>>>>> Is the BigTop community interested in having multiple data generators?
>>>>>> 
>>>>>> Thanks,
>>>>>> RJ
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> jay vyas
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> 
>>>>   - Andy
>>>> 
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> jay vyas
>> 
> 
> 
> 
> -- 
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
