I created a JIRA (BIGTOP-1982) with various subtasks to track the overall effort. If someone would be willing to start reviewing my changes, please start with the pull request for BIGTOP-1983:
https://github.com/apache/bigtop/pull/33

Thanks!

On Mon, Aug 24, 2015 at 8:27 AM, RJ Nowling <[email protected]> wrote:

> Agreed -- would be good to use data generators in the smoke tests.
>
> With Youngwoo's work on adding Spark smoke tests, I don't think adding
> BPS Spark will be too hard. We just need to figure out how to make the
> BPS Spark jar available for the smoke tests.
>
> On Mon, Aug 24, 2015 at 6:33 AM, Evans Ye <[email protected]> wrote:
>
>> +1.
>> And definitely good to have at least one demo case in our smoke test
>> for each so that people can fully understand what data/format it's
>> generating and how to process it.
>>
>> 2015-08-24 12:38 GMT+08:00 김영우 (Youngwoo Kim) <[email protected]>:
>>
>> > +1
>> >
>> > I hope Bigtop DG would help all over the Bigtop infra -- blueprints,
>> > smokes, benchmarks, etc.
>> >
>> > Thanks,
>> > Youngwoo
>> >
>> > On Mon, Aug 24, 2015 at 1:53 AM, RJ Nowling <[email protected]> wrote:
>> >
>> > > Hi BigTop,
>> > >
>> > > I had a discussion with Jay yesterday, and we'd like to propose a
>> > > new component for BigTop: BigTop Data Generators.
>> > >
>> > > BigTop Data Generators would consist of a common set of libraries
>> > > for building data generators and three example data generators:
>> > >
>> > >    * BigPetStore transaction generator (moved from BigPetStore)
>> > >    * BigTop Bazaar -- attendee movement and interactions with
>> > >      booths on a showroom floor, at a conference, or at a mall
>> > >    * BigTop Weatherman -- stochastic weather simulation
>> > >      (temperature, wind speed, wind chill, rainfall, etc.) per zip
>> > >      code. (From a model trained on NOAA historical weather data)
>> > >
>> > > We believe that creating a common set of libraries will have
>> > > several benefits including:
>> > >
>> > >    * Easier for others to build their own data generators
>> > >    * Make data generators smaller and easier to maintain
>> > >    * Share improvements across the data generators
>> > >
>> > > More details on the libraries are below.
>> > >
>> > > BigPetStore will continue to focus on building and maintaining
>> > > blueprints, powered by the BigTop Data Generators.
>> > >
>> > > Our vision is that we get all of Apache coming to BigTop for tools
>> > > for building better, more comprehensive blueprints. We want to
>> > > support these efforts through data generators and the initial set
>> > > of blueprints we've been building.
>> > >
>> > > If the community is generally in support of this, I can create a
>> > > top-level "bigtop-data-generators" directory and put the data
>> > > generators and libraries in there.
>> > >
>> > > Thanks!
>> > >
>> > > RJ
>> > >
>> > > -------
>> > > Library details:
>> > >
>> > > So far, I've extracted the following common libraries:
>> > >
>> > >    * Samplers -- provides classes for PDFs and various samplers
>> > >    * Name generator -- data set and samplers for generating names
>> > >    * Location data set -- data set and classes for US zip codes,
>> > >      their GPS coordinates, median household incomes, and
>> > >      population sizes
>> > >    * Product generator -- library for enumerating products from a
>> > >      specification file. Comes with default specifications for
>> > >      BigPetStore
>> > >
>> > > I also expect that I'll add libraries for:
>> > >
>> > >    * Particle simulation -- customer movement in a room
>> > >    * Latent factor model generation -- generate latent factors and
>> > >      customer weights to create something like MovieLens data.
>> > >      Used in Bazaar for booth preferences and potentially in
>> > >      BigPetStore for customer item preferences
>> > >
>> > > Most of these libraries came out of the BigPetStore data generator,
>> > > but the other generators have been refactored to be based off the
>> > > standard set of libraries.
