It is pretty cool indeed! I wonder how it needs to be structured so that it:
  - is easy to access/use from other components wherever it is needed
  - doesn't interfere with the rest of the stack

I guess one possible way would be to implement the generator as a set of
Maven artifacts that could be installed/consumed transparently by just
declaring a dependency, e.g. as proposed via a top-level component. Another
way is to have a new package, like we do for bigtop-utils and such.

Perhaps this discussion should be moved to JIRA, or shall we continue on
the dev@?

Cos

On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> Hi BigTop,
>
> I had a discussion with Jay yesterday, and we'd like to propose a new
> component for BigTop: BigTop Data Generators.
>
> BigTop Data Generators would consist of a common set of libraries for
> building data generators and three example data generators:
>
> * BigPetStore transaction generator (moved from BigPetStore)
> * BigTop Bazaar -- attendee movement and interactions with booths on a
>   showroom floor, at a conference, or at a mall
> * BigTop Weatherman -- stochastic weather simulation (temperature, wind
>   speed, wind chill, rainfall, etc.) per zip code. (From a model trained on
>   NOAA historical weather data)
>
> We believe that creating a common set of libraries will have several
> benefits including:
>
> * Easier for others to build their own data generators
> * Make data generators smaller and easier to maintain
> * Share improvements across the data generators
>
> More details on the libraries are below.
>
> BigPetStore will continue to focus on building and maintaining
> blueprints, powered by the BigTop Data Generators.
>
> Our vision is that we get all of Apache coming to BigTop for tools for
> building better, more comprehensive blueprints. We want to support these
> efforts through data generators and the initial set of blueprints we've
> been building.
>
> If the community is generally in support of this, I can create a top-level
> "bigtop-data-generators" directory and put the data generators and
> libraries in there.
>
> Thanks!
>
> RJ
>
>
> -------
> Library details:
>
> So far, I've extracted the following common libraries:
>
> * Samplers -- provides classes for PDFs and various samplers
> * Name generator -- data set and samplers for generating names
> * Location data set -- data set and classes for US zip codes, their
>   GPS coordinates, median household incomes, and population sizes
> * Product generator -- library for enumerating products from a
>   specification file. Comes with default specifications for BigPetStore
>
> I also expect that I'll add libraries for:
>
> * Particle simulation -- customer movement in a room
> * Latent factor model generation -- generate latent factors and
>   customer weights to create something like MovieLens data. Used in Bazaar
>   for booth preferences and potentially in BigPetStore for customer item
>   preferences
>
> Most of these libraries came out of the BigPetStore data generator but the
> other generators have been refactored to be based off the standard set of
> libraries.
