Agreed -- would be good to use data generators in the smoke tests.

With Youngwoo's work on adding Spark smoke tests, I don't think adding BPS
Spark will be too hard.  We just need to figure out how to make the BPS Spark
jar available for the smoke tests.



On Mon, Aug 24, 2015 at 6:33 AM, Evans Ye <[email protected]> wrote:

> +1.
> And definitely good to have at least one demo case in our smoke tests for
> each, so that people can fully understand what data/format it's generating
> and how to process it.
>
> 2015-08-24 12:38 GMT+08:00 김영우 (Youngwoo Kim) <[email protected]>:
>
> > +1
> >
> > I hope Bigtop DG will help across the Bigtop infra -- blueprints,
> > smokes, benchmarks, etc.
> >
> > Thanks,
> > Youngwoo
> >
> > On Mon, Aug 24, 2015 at 1:53 AM, RJ Nowling <[email protected]> wrote:
> >
> > > Hi BigTop,
> > >
> > > I had a discussion with Jay yesterday, and we'd like to propose a new
> > > component for BigTop: BigTop Data Generators.
> > >
> > > BigTop Data Generators would consist of a common set of libraries for
> > > building data generators and three example data generators:
> > >
> > >     * BigPetStore transaction generator (moved from BigPetStore)
> > >     * BigTop Bazaar -- attendee movement and interactions with booths
> > >       on a showroom floor, at a conference, or at a mall
> > >     * BigTop Weatherman -- stochastic weather simulation (temperature,
> > >       wind speed, wind chill, rainfall, etc.) per zip code.  (From a
> > >       model trained on NOAA historical weather data)
> > >
> > > We believe that creating a common set of libraries will have several
> > > benefits including:
> > >
> > >      * Easier for others to build their own data generators
> > >      * Make data generators smaller and easier to maintain
> > >      * Share improvements across the data generators
> > >
> > > More details on the libraries are below.
> > >
> > > BigPetStore will continue to focus on building and maintaining
> > > blueprints, powered by the BigTop Data Generators.
> > >
> > > Our vision is that we get all of Apache coming to BigTop for tools for
> > > building better, more comprehensive blueprints.  We want to support
> > > these efforts through data generators and the initial set of blueprints
> > > we've been building.
> > >
> > > If the community is generally in support of this, I can create a
> > > top-level "bigtop-data-generators" directory and put the data generators
> > > and libraries in there.
> > >
> > > Thanks!
> > >
> > > RJ
> > >
> > >
> > > -------
> > > Library details:
> > >
> > > So far, I've extracted the following common libraries:
> > >
> > >      * Samplers -- provides classes for PDFs and various samplers
> > >      * Name generator -- data set and samplers for generating names
> > >      * Location data set -- data set and classes for US zip codes,
> > >        their GPS coordinates, median household incomes, and population
> > >        sizes
> > >      * Product generator -- library for enumerating products from a
> > >        specification file.  Comes with default specifications for
> > >        BigPetStore
> > >
> > > I also expect that I'll add libraries for:
> > >
> > >       * Particle simulation -- customer movement in a room
> > >       * Latent factor model generation -- generate latent factors and
> > >         customer weights to create something like MovieLens data.  Used
> > >         in Bazaar for booth preferences and potentially in BigPetStore
> > >         for customer item preferences
> > >
> > > Most of these libraries came out of the BigPetStore data generator, but
> > > the other generators have been refactored to be based off the standard
> > > set of libraries.
> > >
> >
>
