On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> Hi,
>
> Nice to have data generators in Bigtop.
>
> But please do not include it in bigtop_utils, since this package is
> mandatory. Not everyone needs a data generator.

Yup. And let's move further design discussion to the JIRA!

> Olaf
>
>
> > On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> >
> > Publishing the jar to Bigtop's Maven is probably a good first step. Then
> > apps can just include it as needed...?
> >
> > I'm not against packaging if someone wants packages for this. Maybe even
> > include it in bigtop-utils?
> >
> > Let's move to JIRA.
> >
> >> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik <[email protected]> wrote:
> >>
> >> It is pretty cool indeed!
> >>
> >> I wonder how it needs to be structured so that it:
> >>  - is easy to access/use from other components wherever it is needed
> >>  - doesn't interfere with the rest of the stack
> >>
> >> I guess one possible way would be to implement the generator as a set of
> >> Maven artifacts that could be installed/consumed transparently by just
> >> declaring a dependency, e.g. as proposed via a top-level component.
> >>
> >> Another way is to have a new package like we do for bigtop-utils and such.
> >>
> >> Perhaps this discussion should be moved to JIRA, or shall we continue on
> >> dev@?
> >>
> >> Cos
> >>
> >>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> >>> Hi BigTop,
> >>>
> >>> I had a discussion with Jay yesterday, and we'd like to propose a new
> >>> component for BigTop: BigTop Data Generators.
> >>>
> >>> BigTop Data Generators would consist of a common set of libraries for
> >>> building data generators and three example data generators:
> >>>
> >>> * BigPetStore transaction generator (moved from BigPetStore)
> >>> * BigTop Bazaar -- attendee movement and interactions with booths on a
> >>>   showroom floor, at a conference, or at a mall
> >>> * BigTop Weatherman -- stochastic weather simulation (temperature, wind
> >>>   speed, wind chill, rainfall, etc.) per zip code. (From a model trained
> >>>   on NOAA historical weather data)
> >>>
> >>> We believe that creating a common set of libraries will have several
> >>> benefits, including:
> >>>
> >>> * Easier for others to build their own data generators
> >>> * Make data generators smaller and easier to maintain
> >>> * Share improvements across the data generators
> >>>
> >>> More details on the libraries are below.
> >>>
> >>> BigPetStore will continue to focus on building and maintaining
> >>> blueprints, powered by the BigTop Data Generators.
> >>>
> >>> Our vision is that we get all of Apache coming to BigTop for tools for
> >>> building better, more comprehensive blueprints. We want to support these
> >>> efforts through data generators and the initial set of blueprints we've
> >>> been building.
> >>>
> >>> If the community is generally in support of this, I can create a top-level
> >>> "bigtop-data-generators" directory and put the data generators and
> >>> libraries in there.
> >>>
> >>> Thanks!
> >>>
> >>> RJ
> >>>
> >>>
> >>> -------
> >>> Library details:
> >>>
> >>> So far, I've extracted the following common libraries:
> >>>
> >>> * Samplers -- provides classes for PDFs and various samplers
> >>> * Name generator -- data set and samplers for generating names
> >>> * Location data set -- data set and classes for US zip codes, their
> >>>   GPS coordinates, median household incomes, and population sizes
> >>> * Product generator -- library for enumerating products from a
> >>>   specification file. Comes with default specifications for BigPetStore
> >>>
> >>> I also expect that I'll add libraries for:
> >>>
> >>> * Particle simulation -- customer movement in a room
> >>> * Latent factor model generation -- generate latent factors and
> >>>   customer weights to create something like MovieLens data. Used in Bazaar
> >>>   for booth preferences and potentially in BigPetStore for customer item
> >>>   preferences
> >>>
> >>> Most of these libraries came out of the BigPetStore data generator, but the
> >>> other generators have been refactored to be based off the standard set of
> >>> libraries.
