I really like the shell script wrapper and Docker image idea, since that way we can integrate it directly with the Bigtop provisioner, which yields a great UX for the whole thing. I don't think it's too hard to do both; we just need to add a parameter that switches the script into daemon mode. I've seen lots of images do it this way.
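A minimal sketch of such a wrapper, assuming the image's entrypoint is a POSIX shell script. The names `run_wrapper`, `GENERATOR_CMD`, `INTERVAL_SECONDS`, and `MAX_RUNS` are hypothetical and only illustrate the idea; none of them are existing Bigtop settings. The wrapper strips its own `--daemon` flag and passes every other argument through to the generator, running it once by default or repeatedly in daemon mode:

```shell
#!/bin/sh
# Hypothetical entrypoint sketch; all names below are assumptions for
# illustration, not part of any existing Bigtop image.

# Command that produces one batch of data (assumed; would be the real CLI driver).
GENERATOR_CMD="${GENERATOR_CMD:-echo generating}"
# Pause between batches in daemon mode, in seconds (assumed knob).
INTERVAL_SECONDS="${INTERVAL_SECONDS:-60}"
# Stop after this many batches in daemon mode; 0 means run until killed.
MAX_RUNS="${MAX_RUNS:-0}"

run_wrapper() {
    daemon=0
    # Rebuild the argument list without --daemon, keeping the other
    # arguments in their original order.
    for arg in "$@"; do
        shift
        if [ "$arg" = "--daemon" ]; then
            daemon=1
        else
            set -- "$@" "$arg"
        fi
    done

    if [ "$daemon" -eq 1 ]; then
        runs=0
        while :; do
            # GENERATOR_CMD is intentionally unquoted so it can carry arguments.
            $GENERATOR_CMD "$@"
            runs=$((runs + 1))
            if [ "$MAX_RUNS" -gt 0 ] && [ "$runs" -ge "$MAX_RUNS" ]; then
                break
            fi
            sleep "$INTERVAL_SECONDS"
        done
    else
        # One-off mode: generate a single batch and exit.
        $GENERATOR_CMD "$@"
    fi
}
```

A real entrypoint script would end with `run_wrapper "$@"`, so that whatever follows the image name in `docker run` reaches the wrapper, as in the invocation that follows.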
docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar --daemon

On Aug 31, 2015 at 9:06 PM, "RJ Nowling" <[email protected]> wrote:

> The BigPetStore, Bazaar, and weather data generators have single-threaded
> command-line interfaces. We could do the same with the smaller generators
> (names, locations, etc.) if there is interest.
>
> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <[email protected]> wrote:
>
> > Nate: Good idea to abstract the interface one level higher....
> >
> > How about a docker run command? That is probably the easiest way for
> > Linux folks to run one-off Java apps nowadays.
> >
> > docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
> >
> > I'm happy to curate such a docker image. I am already doing something
> > like this in kube for bigtop-transaction-queue, which continuously pumps
> > data generator outputs into a REST endpoint or file queue... So it could
> > be extended to support other generators.
> >
> > om> <[email protected]> wrote:
> > >
> > > Could picture at some point supporting something like this for non-jvm
> > > folk just looking for test/demo data:
> > >
> > > apt-get install bigtop-data-gen
> > > ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
> > >
> > > -----Original Message-----
> > > From: jay vyas [mailto:[email protected]]
> > > Sent: Sunday, August 30, 2015 5:11 PM
> > > To: [email protected]
> > > Subject: Re: Proposal for "BigTop Data Generators"
> > >
> > > Hola nate. Well, here are the use cases I know of that I have used the
> > > data generators for.
> > >
> > > Dockerfile:
> > >
> > > (1) for testing kubernetes. For this, I just use transaction-queue docker file.
> > > (2) for testing GlusterFS small file workloads, maybe with other analytics tools...
> > >
> > > Maven repo
> > >
> > > (3) Java mapreduce/ignite/spark applications, which can just add a mvn
> > > repo when compiling. Java developers never add jars through RPM repos.
> > >
> > > RPM/DEB packages:
> > >
> > > I could see people using an RPM/DEB data generator, and I'm not against
> > > it. But I simply don't know of any real-world projects which *currently*
> > > need RPM/DEB packages, which is why I haven't bothered to propose it as a
> > > requirement. Nevertheless, Linux packages are always a welcome addition
> > > if someone wants to create 'em!
> > >
> > >> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
> > >>
> > >> Would container be in addition to deb/rpm, or instead of? If latter
> > >> can we do deb/rpm as base then have container either created from them
> > >> or directly from artifacts?
> > >>
> > >> On test usage side, seems could probably break up tests into
> > >> base/required and then optional/add-on tests/test-suites. Think
> > >> remember seeing mention of certain tests that are failing at times on
> > >> certain component(s) anyways in the core builds but don’t mean that
> > >> the build is broken, so would make sense to have some clean up around
> > >> those anyways.
> > >>
> > >> -----Original Message-----
> > >> From: RJ Nowling [mailto:[email protected]]
> > >> Sent: Sunday, August 30, 2015 1:11 PM
> > >> To: [email protected]
> > >> Subject: Re: Proposal for "BigTop Data Generators"
> > >>
> > >> I agree with the above. :)
> > >>
> > >> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas <[email protected]> wrote:
> > >>
> > >>> Hi RJ.
> > >>>
> > >>> Maven repositories and docker containers for the transaction queue
> > >>> are good enough IMO. That will give people a way to compose them in
> > >>> different idioms (one for Java folks, another for broader Linux
> > >>> audience).
> > >>>
> > >>> I think the lib designs are fairly intuitive.
> > >>> I would say that we should constrain them all to being written in
> > >>> Java or Groovy to keep the bigtop theme of "JVM for everything" :).
> > >>>
> > >>> Any particular questions you have around technical design can be
> > >>> followed in a JIRA or else maybe a Readme spec that goes in a top
> > >>> level of the data-generators dir...
> > >>>
> > >>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
> > >>>>
> > >>>> I'd like to keep this conversation going.
> > >>>>
> > >>>> So here are a few discussion points:
> > >>>>
> > >>>> 1. How do we want to make the data generators available? Maven?
> > >>>> RPMs and Debs?
> > >>>>
> > >>>> For now, I'm using a gradle multi-project build to easily build
> > >>>> and install the BPS data generators and their libraries into a
> > >>>> local maven repo. This makes development easy. Eventually, I would
> > >>>> like to post binaries through Maven for easy integration by users.
> > >>>> RPMs / Debs could be interesting since I use a pattern where the
> > >>>> data generators are libraries (to support application integration /
> > >>>> parallelization by the host framework) but also provide CLI drivers
> > >>>> for local testing.
> > >>>>
> > >>>> 2. The idea of using the data generators as part of the smoke
> > >>>> tests came up. Since there is concern about making the data
> > >>>> generators required, we could offer the blueprints (BigPetStore)
> > >>>> as optional smoke tests. Would that be a good compromise?
> > >>>>
> > >>>> 3. How will they be maintained?
> > >>>>
> > >>>> I'll certainly add myself to the maintainers list and will be
> > >>>> taking responsibility. I'm happy to have others help as well if
> > >>>> anyone wants to -- if not, that's cool, too.
> > >>>>
> > >>>> 4. Is anyone interested at all in discussing library APIs and
> > >>>> designs? What about internal interfaces and such?
> > >>>>
> > >>>> My plan was to add at least one more data generator (weather
> > >>>> simulator) to bigtop-data-generators in the short term. However,
> > >>>> given the concerns raised by Cos (more discussion needed) and Olaf
> > >>>> (don't want to force data generators on unsuspecting users ;) ), I
> > >>>> would like to reach some consensus on what people are concerned
> > >>>> about and solutions.
> > >>>>
> > >>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik <[email protected]> wrote:
> > >>>>
> > >>>>> Fine by me. I have linked this thread to the JIRA ticket that RJ
> > >>>>> created, so we have a way to connect one to another ;)
> > >>>>>
> > >>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> I am not confident that moving important design discussions with
> > >>>>>> impact to the whole project to jira is a good idea.
> > >>>>>>
> > >>>>>> In the current JIRA traffic storm it is not easy to identify and
> > >>>>>> follow important tickets.
> > >>>>>>
> > >>>>>> Please keep discussions on the list or at least, please state on
> > >>>>>> this list which ticket to follow ...
> > >>>>>>
> > >>>>>> Olaf
> > >>>>>>
> > >>>>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Nice to have data generators in Bigtop.
> > >>>>>>>>
> > >>>>>>>> But please do not include it in bigtop_utils, since this
> > >>>>>>>> package is mandatory. Not everyone needs a data generator.
> > >>>>>>>
> > >>>>>>> Yup. And let's move further design discussion to the JIRA!
> > >>>>>>>
> > >>>>>>>> Olaf
> > >>>>>>>>
> > >>>>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Publishing the jar to bigtop's maven is probably a good first
> > >>>>>>>>> step, then apps can just include it as needed...?
> > >>>>>>>>>
> > >>>>>>>>> I'm not against packaging if someone wants packages for this.
> > >>>>>>>>> Maybe even include it in bigtop util?
> > >>>>>>>>>
> > >>>>>>>>> Let's move to jira,
> > >>>>>>>>>
> > >>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> It is pretty cool indeed!
> > >>>>>>>>>>
> > >>>>>>>>>> I wonder how it needs to be structured so that it:
> > >>>>>>>>>> - is easy to access/use from other components wherever it is needed
> > >>>>>>>>>> - doesn't interfere with the rest of the stack
> > >>>>>>>>>>
> > >>>>>>>>>> I guess one possible way would be to implement the generator
> > >>>>>>>>>> as a set of maven artifacts that could be installed/consumed
> > >>>>>>>>>> transparently by just declaring a dependency, e.g. as
> > >>>>>>>>>> proposed via top-level component.
> > >>>>>>>>>>
> > >>>>>>>>>> Another way is to have a new package like we do for
> > >>>>>>>>>> bigtop-utils and such.
> > >>>>>>>>>>
> > >>>>>>>>>> Perhaps this discussion should be moved to JIRA or shall we
> > >>>>>>>>>> continue on the dev@ ??
> > >>>>>>>>>>
> > >>>>>>>>>> Cos
> > >>>>>>>>>>
> > >>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> > >>>>>>>>>>> Hi BigTop,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I had a discussion with Jay yesterday, and we'd like to
> > >>>>>>>>>>> propose a new component for BigTop: BigTop Data Generators.
> > >>>>>>>>>>>
> > >>>>>>>>>>> BigTop Data Generators would consist of a common set of
> > >>>>>>>>>>> libraries for building data generators and three example
> > >>>>>>>>>>> data generators:
> > >>>>>>>>>>>
> > >>>>>>>>>>> * BigPetStore transaction generator (moved from BigPetStore)
> > >>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
> > >>>>>>>>>>>   booths on a showroom floor, at a conference, or at a mall
> > >>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
> > >>>>>>>>>>>   (temperature, wind speed, wind chill, rainfall, etc.) per
> > >>>>>>>>>>>   zip code. (From a model trained on NOAA historical
> > >>>>>>>>>>>   weather data)
> > >>>>>>>>>>>
> > >>>>>>>>>>> We believe that creating a common set of libraries will
> > >>>>>>>>>>> have several benefits including:
> > >>>>>>>>>>>
> > >>>>>>>>>>> * Easier for others to build their own data generators
> > >>>>>>>>>>> * Make data generators smaller and easier to maintain
> > >>>>>>>>>>> * Share improvements across the data generators
> > >>>>>>>>>>>
> > >>>>>>>>>>> More details on the libraries are below.
> > >>>>>>>>>>>
> > >>>>>>>>>>> BigPetStore will continue to focus on building and
> > >>>>>>>>>>> maintaining blueprints, powered by the BigTop Data
> > >>>>>>>>>>> Generators.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop
> > >>>>>>>>>>> for tools for building better, more comprehensive
> > >>>>>>>>>>> blueprints. We want to support these efforts through data
> > >>>>>>>>>>> generators and the initial set of blueprints we've been
> > >>>>>>>>>>> building.
> > >>>>>>>>>>>
> > >>>>>>>>>>> If the community is generally in support of this, I can
> > >>>>>>>>>>> create a top-level "bigtop-data-generators" directory and
> > >>>>>>>>>>> put the data generators and libraries in there.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks!
> > >>>>>>>>>>>
> > >>>>>>>>>>> RJ
> > >>>>>>>>>>>
> > >>>>>>>>>>> -------
> > >>>>>>>>>>> Library details:
> > >>>>>>>>>>>
> > >>>>>>>>>>> So far, I've extracted the following common libraries:
> > >>>>>>>>>>>
> > >>>>>>>>>>> * Samplers -- provides classes for PDFs and various samplers
> > >>>>>>>>>>> * Name generator -- data set and samplers for generating names
> > >>>>>>>>>>> * Location data set -- data set and classes for US zip
> > >>>>>>>>>>>   codes, their GPS coordinates, median household incomes,
> > >>>>>>>>>>>   and population sizes
> > >>>>>>>>>>> * Product generator -- library for enumerating products
> > >>>>>>>>>>>   from a specification file. Comes with default
> > >>>>>>>>>>>   specifications for BigPetStore
> > >>>>>>>>>>>
> > >>>>>>>>>>> I also expect that I'll add libraries for:
> > >>>>>>>>>>>
> > >>>>>>>>>>> * Particle simulation -- customer movement in a room
> > >>>>>>>>>>> * Latent factor model generation -- generate latent
> > >>>>>>>>>>>   factors and customer weights to create something like
> > >>>>>>>>>>>   MovieLens data. Used in Bazaar for booth preferences and
> > >>>>>>>>>>>   potentially in BigPetStore for customer item preferences
> > >>>>>>>>>>>
> > >>>>>>>>>>> Most of these libraries came out of the BigPetStore data
> > >>>>>>>>>>> generator but the other generators have been refactored to
> > >>>>>>>>>>> be based off the standard set of libraries.
> > >
> > > --
> > > jay vyas
