I created a JIRA (BIGTOP-1982) with various subtasks to track the overall
effort.  If someone would be willing to start reviewing my changes, please
start with the pull request for BIGTOP-1983:

https://github.com/apache/bigtop/pull/33

Thanks!

On Mon, Aug 24, 2015 at 8:27 AM, RJ Nowling <[email protected]> wrote:

> Agreed -- would be good to use data generators in the smoke tests.
>
> With Youngwoo's work on adding Spark smoke tests, I don't think adding BPS
> Spark will be too hard.  We just need to figure out how to make the BPS Spark
> jar available for the smoke tests.
>
>
>
> On Mon, Aug 24, 2015 at 6:33 AM, Evans Ye <[email protected]> wrote:
>
>> +1.
>> And definitely good to have at least one demo case in our smoke test for
>> each so that people can fully understand what data/format it's generating
>> and how to process it.
>>
>> 2015-08-24 12:38 GMT+08:00 김영우 (Youngwoo Kim) <[email protected]>:
>>
>> > +1
>> >
>> > I hope Bigtop DG will help across the Bigtop infra -- blueprints,
>> > smoke tests, benchmarks, etc.
>> >
>> > Thanks,
>> > Youngwoo
>> >
>> > On Mon, Aug 24, 2015 at 1:53 AM, RJ Nowling <[email protected]> wrote:
>> >
>> > > Hi BigTop,
>> > >
>> > > I had a discussion with Jay yesterday, and we'd like to propose a new
>> > > component for BigTop: BigTop Data Generators.
>> > >
>> > > BigTop Data Generators would consist of a common set of libraries for
>> > > building data generators and three example data generators:
>> > >
>> > >     * BigPetStore transaction generator (moved from BigPetStore)
>> > >     * BigTop Bazaar -- attendee movement and interactions with booths
>> > >       on a showroom floor, at a conference, or at a mall
>> > >     * BigTop Weatherman -- stochastic weather simulation (temperature,
>> > >       wind speed, wind chill, rainfall, etc.) per zip code (from a
>> > >       model trained on NOAA historical weather data)
>> > >
>> > > We believe that creating a common set of libraries will have several
>> > > benefits including:
>> > >
>> > >      * Make it easier for others to build their own data generators
>> > >      * Make data generators smaller and easier to maintain
>> > >      * Share improvements across the data generators
>> > >
>> > > More details on the libraries are below.
>> > >
>> > > BigPetStore will continue to focus on building and maintaining
>> > > blueprints, powered by the BigTop Data Generators.
>> > >
>> > > Our vision is that we get all of Apache coming to BigTop for tools for
>> > > building better, more comprehensive blueprints.  We want to support
>> > > these efforts through data generators and the initial set of blueprints
>> > > we've been building.
>> > >
>> > > If the community is generally in support of this, I can create a
>> > > top-level "bigtop-data-generators" directory and put the data
>> > > generators and libraries in there.
>> > >
>> > > Thanks!
>> > >
>> > > RJ
>> > >
>> > >
>> > > -------
>> > > Library details:
>> > >
>> > > So far, I've extracted the following common libraries:
>> > >
>> > >      * Samplers -- provides classes for PDFs and various samplers
>> > >      * Name generator -- data set and samplers for generating names
>> > >      * Location data set -- data set and classes for US zip codes,
>> > >        their GPS coordinates, median household incomes, and population
>> > >        sizes
>> > >      * Product generator -- library for enumerating products from a
>> > >        specification file.  Comes with default specifications for
>> > >        BigPetStore
>> > >
>> > > I also expect that I'll add libraries for:
>> > >
>> > >       * Particle simulation -- customer movement in a room
>> > >       * Latent factor model generation -- generate latent factors and
>> > >         customer weights to create something like MovieLens data.  Used
>> > >         in Bazaar for booth preferences and potentially in BigPetStore
>> > >         for customer item preferences
>> > >
>> > > Most of these libraries came out of the BigPetStore data generator, but
>> > > the other generators have been refactored to be based on the standard
>> > > set of libraries.
>> > >
>> >
>>
>
>
