Could picture at some point supporting something like this for non-jvm folk just looking for test/demo data:
apt-get install bigtop-data-gen
~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar

-----Original Message-----
From: jay vyas [mailto:[email protected]]
Sent: Sunday, August 30, 2015 5:11 PM
To: [email protected]
Subject: Re: Proposal for "BigTop Data Generators"

Hola nate. Well, here are the use cases I know of that I have used the
data generators for.

Dockerfile:
(1) for testing kubernetes. For this, I just use the transaction-queue
    docker file.
(2) for testing GlusterFS small file workloads, maybe with other
    analytics tools...

Maven repo:
(3) Java mapreduce/ignite/spark applications, which can just add a mvn
    repo when compiling. Java developers never add jars through RPM repos.

RPM/DEB packages:
I could see people using an RPM/DEB data generator, and I'm not against
it. But I simply don't know of any real world projects which *currently*
need RPM/Deb packages, which is why I haven't bothered to propose it as a
requirement. Nevertheless linux packages are always a welcome addition if
someone wants to create em !

On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:

> Would container be in addition to deb/rpm, or instead of? If latter
> can we do deb/rpm as base then have container either created from them
> or directly from artifacts?
>
> On test usage side, seems could probably break up tests into
> base/required and then optional/add-on tests/test-suites. Think
> remember seeing mention of certain tests that are failing at times on
> certain component(s) anyways in the core builds but don’t mean that
> the build is broken, so would make sense to have some clean up around
> those anyways.
>
> -----Original Message-----
> From: RJ Nowling [mailto:[email protected]]
> Sent: Sunday, August 30, 2015 1:11 PM
> To: [email protected]
> Subject: Re: Proposal for "BigTop Data Generators"
>
> I agree with the above. :)
>
> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas <[email protected]> wrote:
>
> > Hi RJ.
> >
> > Maven repositories and docker containers for the transaction queue
> > are good enough IMO. That will give people a way to compose them in
> > different idioms (one for Java folks, another for broader Linux
> > audience).
> >
> > I think the lib designs are fairly intuitive. I would say that we
> > should constrain them all to being written in Java or Groovy to keep
> > the bigtop theme of "JVM for everything" :).
> >
> > Any particular questions you have around technical design can be
> > followed in a JIRA or else maybe a Readme spec that goes in a top
> > level of the data-generators dir...
> >
> > > On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
> > >
> > > I'd like to keep this conversation going.
> > >
> > > So here are a few discussion points:
> > >
> > > 1. How do we want to make the data generators available? Maven?
> > > RPMs and Debs?
> > >
> > > For now, I'm using a gradle multi-project build to easily build and
> > > install the BPS data generators and their libraries into a local
> > > maven repo. This makes development easy. Eventually, I would like
> > > to post binaries through Maven for easy integration by users. RPMs /
> > > Debs could be interesting since I use a pattern where the data
> > > generators are libraries (to support application integration /
> > > parallelization by the host framework) but also provide CLI drivers
> > > for local testing.
> > >
> > > 2. The idea of using the data generators as part of the smoke tests
> > > came up. Since there is concern about making the data generators
> > > required, we could offer the blueprints (BigPetStore) as optional
> > > smoke tests. Would that be a good compromise?
> > >
> > > 3. How will they be maintained?
> > >
> > > I'll certainly add myself to the maintainers list and will be taking
> > > responsibility. I'm happy to have others help as well if anyone
> > > wants to -- if not, that's cool, too.
> > >
> > > 4. Is anyone interested at all in discussing library APIs and
> > > designs? What about internal interfaces and such?
> > >
> > >
> > > My plan was to add at least one more data generator (weather
> > > simulator) to bigtop-data-generators in the short term. However,
> > > given the concerns raised by Cos (more discussion needed) and Olaf
> > > (don't want to force data generators on unsuspecting users ;) ), I
> > > would like to reach some consensus on what people are concerned
> > > about and solutions.
> > >
> > > On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik <[email protected]> wrote:
> > >
> > >> Fine by me. I have linked this thread to the JIRA ticket that RJ
> > >> created, so we have a way to connect one to another ;)
> > >>
> > >>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> > >>> Hi,
> > >>>
> > >>> I am not confident that moving important design discussions with
> > >>> impact to the whole project to jira is a good idea.
> > >>>
> > >>> In the current JIRA Traffic storm it is not easy to identify and
> > >>> follow important tickets.
> > >>>
> > >>> Please keep discussions on the list or at least, please state on
> > >>> this list which Ticket to follow ...
> > >>>
> > >>> Olaf
> > >>>
> > >>>
> > >>>
> > >>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
> > >>>>
> > >>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> Nice to have data generators in Bigtop.
> > >>>>>
> > >>>>> But please do not include it in bigtop_utils, since this
> > >>>>> package is mandatory. Not everyone needs a data generator.
> > >>>>
> > >>>> Yup. And let's move further design discussion to the JIRA!
> > >>>>
> > >>>>> Olaf
> > >>>>>
> > >>>>>
> > >>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> > >>>>>>
> > >>>>>> Publishing the jar to Bigtop's maven is probably a good first
> > >>>>>> step, then apps can just include it as needed...?
> > >>>>>>
> > >>>>>> I'm not against packaging if someone wants packages for this.
> > >>>>>> Maybe even include it in bigtop util ?
> > >>>>>>
> > >>>>>> Let's move to jira,
> > >>>>>>
> > >>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>> It is pretty cool indeed!
> > >>>>>>>
> > >>>>>>> I wonder how it needs to be structured to be:
> > >>>>>>> - easy to access/use from other components wherever it is needed
> > >>>>>>> - doesn't interfere with the rest of the stack
> > >>>>>>>
> > >>>>>>> I guess one possible way would be to implement the generator as a
> > >>>>>>> set of maven artifacts, that could be installed/consumed
> > >>>>>>> transparently by just declaring a dependency e.g as proposed via
> > >>>>>>> top-level component.
> > >>>>>>>
> > >>>>>>> Another way is to have a new package like we do for bigtop-utils
> > >>>>>>> and such.
> > >>>>>>>
> > >>>>>>> Perhaps this discussion should be moved to JIRA or shall we
> > >>>>>>> continue on the dev@ ??
> > >>>>>>>
> > >>>>>>> Cos
> > >>>>>>>
> > >>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> > >>>>>>>> Hi BigTop,
> > >>>>>>>>
> > >>>>>>>> I had a discussion with Jay yesterday, we'd like to propose a new
> > >>>>>>>> component for BigTop: BigTop Data Generators.
> > >>>>>>>>
> > >>>>>>>> BigTop Data Generators would consist of a common set of libraries
> > >>>>>>>> for building data generators and three example data generators:
> > >>>>>>>>
> > >>>>>>>> * BigPetStore transaction generator (moved from BigPetStore)
> > >>>>>>>> * BigTop Bazaar -- attendee movement and interactions with booths
> > >>>>>>>>   on a showroom floor, at a conference, or at a mall
> > >>>>>>>> * BigTop Weatherman -- stochastic weather simulation (temperature,
> > >>>>>>>>   wind speed, wind chill, rainfall, etc.) per zip code. (From a
> > >>>>>>>>   model trained on NOAA historical weather data)
> > >>>>>>>>
> > >>>>>>>> We believe that creating a common set of libraries will have
> > >>>>>>>> several benefits including:
> > >>>>>>>>
> > >>>>>>>> * Easier for others to build their own data generators
> > >>>>>>>> * Make data generators smaller and easier to maintain
> > >>>>>>>> * Share improvements across the data generators
> > >>>>>>>>
> > >>>>>>>> More details on the libraries are below.
> > >>>>>>>>
> > >>>>>>>> BigPetStore will continue to focus on building and maintaining
> > >>>>>>>> blueprints, powered by the BigTop Data Generators.
> > >>>>>>>>
> > >>>>>>>> Our vision is that we get all of Apache coming to BigTop for tools
> > >>>>>>>> for building better, more comprehensive blueprints. We want to
> > >>>>>>>> support these efforts through data generators and the initial set
> > >>>>>>>> of blueprints we've been building.
> > >>>>>>>>
> > >>>>>>>> If the community is generally in support of this, I can create a
> > >>>>>>>> top-level "bigtop-data-generators" directory and put the data
> > >>>>>>>> generators and libraries in there.
> > >>>>>>>>
> > >>>>>>>> Thanks!
> > >>>>>>>>
> > >>>>>>>> RJ
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> -------
> > >>>>>>>> Library details:
> > >>>>>>>>
> > >>>>>>>> So far, I've extracted the following common libraries:
> > >>>>>>>>
> > >>>>>>>> * Samplers -- provides classes for PDFs and various samplers
> > >>>>>>>> * Name generator -- data set and samplers for generating names
> > >>>>>>>> * Location data set -- data set and classes for US zip codes,
> > >>>>>>>>   their GPS coordinates, median household incomes, and population
> > >>>>>>>>   sizes
> > >>>>>>>> * Product generator -- library for enumerating products from a
> > >>>>>>>>   specification file. Comes with default specifications for
> > >>>>>>>>   BigPetStore
> > >>>>>>>>
> > >>>>>>>> I also expect that I'll add libraries for:
> > >>>>>>>>
> > >>>>>>>> * Particle simulation -- customer movement in a room
> > >>>>>>>> * Latent factor model generation -- generate latent factors and
> > >>>>>>>>   customer weights to create something like MovieLens data. Used
> > >>>>>>>>   in Bazaar for booth preferences and potentially in BigPetStore
> > >>>>>>>>   for customer item preferences
> > >>>>>>>>
> > >>>>>>>> Most of these libraries came out of the BigPetStore data generator
> > >>>>>>>> but the other generators have been refactored to be based off the
> > >>>>>>>> standard set of libraries.
> > >>
> > >>
> > >>
> >
> >
--
jay vyas
