Or it's probably way simpler to just have one script, since the data generator runs once and then it can be gone.

On Aug 31, 2015, at 10:45 PM, "Jay Vyas" <[email protected]> wrote:
> RJ, can we abstract the command line into an interface, so that we have
> "one cli to rule them all"?
>
> > On Aug 31, 2015, at 10:40 AM, Evans Ye <[email protected]> wrote:
> >
> > I very much like the shell script wrapper and docker image idea, since
> > that way we can integrate it directly with the bigtop provisioner, which
> > yields a perfect UX for the whole thing. I think it's not too hard to do
> > both; we just need to add a parameter to turn the script into daemon mode.
> > I see lots of images doing it this way.
> >
> > docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar --daemon
> >
> > On Aug 31, 2015, at 9:06 PM, "RJ Nowling" <[email protected]> wrote:
> >
> >> The BigPetStore, Bazaar, and weather data generators have single-threaded
> >> command-line interfaces. We could do the same with the smaller generators
> >> (names, locations, etc.) if there is interest.
> >>
> >> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <[email protected]>
> >> wrote:
> >>
> >>> Nate: Good idea to abstract the interface one level higher....
> >>>
> >>> How about a docker run command? That is probably the easiest way for
> >>> Linux folks to run one-off Java apps nowadays.
> >>>
> >>> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
> >>>
> >>> I'm happy to curate such a docker image. I am already doing something
> >>> like this in kube for bigtop-transaction-queue, which continuously pumps
> >>> data generator outputs into a REST endpoint or file
> >>> queue... So it could be extended to support other generators.
> >>>
> >>>
> >>>> om> <[email protected]> wrote:
> >>>>
> >>>> Could picture at some point supporting something like this for non-JVM
> >>>> folks just looking for test/demo data:
> >>>>
> >>>> apt-get install bigtop-data-gen
> >>>> ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
> >>>>
> >>>> -----Original Message-----
> >>>> From: jay vyas [mailto:[email protected]]
> >>>> Sent: Sunday, August 30, 2015 5:11 PM
> >>>> To: [email protected]
> >>>> Subject: Re: Proposal for "BigTop Data Generators"
> >>>>
> >>>> Hola Nate. Well, here are the use cases I know of that I have used the
> >>>> data generators for.
> >>>>
> >>>> Dockerfile:
> >>>>
> >>>> (1) For testing Kubernetes. For this, I just use the transaction-queue
> >>>> Dockerfile.
> >>>> (2) For testing GlusterFS small-file workloads, maybe with other
> >>>> analytics tools...
> >>>>
> >>>> Maven repo:
> >>>>
> >>>> (3) Java MapReduce/Ignite/Spark applications, which can just add a mvn
> >>>> repo when compiling. Java developers never add jars through RPM repos.
> >>>>
> >>>> RPM/DEB packages:
> >>>>
> >>>> I could see people using an RPM/DEB data generator, and I'm not against
> >>>> it. But I simply don't know of any real-world projects which *currently*
> >>>> need RPM/DEB packages, which is why I haven't bothered to propose it as a
> >>>> requirement. Nevertheless, Linux packages are always a welcome addition
> >>>> if someone wants to create 'em!
> >>>>
> >>>>> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
> >>>>>
> >>>>> Would the container be in addition to deb/rpm, or instead of? If the
> >>>>> latter, can we do deb/rpm as the base and then have the container
> >>>>> created either from them or directly from artifacts?
> >>>>>
> >>>>> On the test usage side, it seems we could probably break up tests into
> >>>>> base/required and then optional/add-on tests/test suites.
> >>>>> I think I remember seeing mention of certain tests that fail at times
> >>>>> on certain component(s) in the core builds anyway, without meaning that
> >>>>> the build is broken, so it would make sense to have some cleanup around
> >>>>> those anyway.
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: RJ Nowling [mailto:[email protected]]
> >>>>> Sent: Sunday, August 30, 2015 1:11 PM
> >>>>> To: [email protected]
> >>>>> Subject: Re: Proposal for "BigTop Data Generators"
> >>>>>
> >>>>> I agree with the above. :)
> >>>>>
> >>>>> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas <[email protected]> wrote:
> >>>>>
> >>>>>> Hi RJ.
> >>>>>>
> >>>>>> Maven repositories and docker containers for the transaction queue
> >>>>>> are good enough IMO. That will give people a way to compose them in
> >>>>>> different idioms (one for Java folks, another for a broader Linux
> >>>>>> audience).
> >>>>>>
> >>>>>> I think the lib designs are fairly intuitive. I would say that we
> >>>>>> should constrain them all to being written in Java or Groovy to keep
> >>>>>> the bigtop theme of "JVM for everything" :).
> >>>>>>
> >>>>>> Any particular questions you have around technical design can be
> >>>>>> followed in a JIRA, or else maybe in a README spec that goes in the
> >>>>>> top level of the data-generators dir...
> >>>>>>
> >>>>>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
> >>>>>>>
> >>>>>>> I'd like to keep this conversation going.
> >>>>>>>
> >>>>>>> So here are a few discussion points:
> >>>>>>>
> >>>>>>> 1. How do we want to make the data generators available? Maven?
> >>>>>>> RPMs and Debs?
> >>>>>>>
> >>>>>>> For now, I'm using a gradle multi-project build to easily build and
> >>>>>>> install the BPS data generators and their libraries into a local
> >>>>>>> maven repo. This makes development easy.
> >>>>>>> Eventually, I would like to post binaries through Maven for easy
> >>>>>>> integration by users. RPMs / Debs could be interesting since I use a
> >>>>>>> pattern where the data generators are libraries (to support
> >>>>>>> application integration / parallelization by the host framework) but
> >>>>>>> also provide CLI drivers for local testing.
> >>>>>>>
> >>>>>>> 2. The idea of using the data generators as part of the smoke
> >>>>>>> tests came up. Since there is concern about making the data
> >>>>>>> generators required, we could offer the blueprints (BigPetStore)
> >>>>>>> as optional smoke tests. Would that be a good compromise?
> >>>>>>>
> >>>>>>> 3. How will they be maintained?
> >>>>>>>
> >>>>>>> I'll certainly add myself to the maintainers list and will be
> >>>>>>> taking responsibility. I'm happy to have others help as well if
> >>>>>>> anyone wants to -- if not, that's cool, too.
> >>>>>>>
> >>>>>>> 4. Is anyone interested at all in discussing library APIs and designs?
> >>>>>>> What about internal interfaces and such?
> >>>>>>>
> >>>>>>> My plan was to add at least one more data generator (weather
> >>>>>>> simulator) to bigtop-data-generators in the short term. However,
> >>>>>>> given the concerns raised by Cos (more discussion needed) and Olaf
> >>>>>>> (don't want to force data generators on unsuspecting users ;) ), I
> >>>>>>> would like to reach some consensus on what people are concerned
> >>>>>>> about and solutions.
> >>>>>>>
> >>>>>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik
> >>>>>>> <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Fine by me.
> >>>>>>>> I have linked this thread to the JIRA ticket that RJ created, so
> >>>>>>>> we have a way to connect one to the other ;)
> >>>>>>>>
> >>>>>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I am not confident that moving important design discussions with
> >>>>>>>>> impact on the whole project to JIRA is a good idea.
> >>>>>>>>>
> >>>>>>>>> In the current JIRA traffic storm it is not easy to identify and
> >>>>>>>>> follow important tickets.
> >>>>>>>>>
> >>>>>>>>> Please keep discussions on the list, or at least state on this
> >>>>>>>>> list which ticket to follow ...
> >>>>>>>>>
> >>>>>>>>> Olaf
> >>>>>>>>>
> >>>>>>>>>> On Aug 26, 2015, at 22:56, Konstantin Boudnik <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> Nice to have data generators in Bigtop.
> >>>>>>>>>>>
> >>>>>>>>>>> But please do not include it in bigtop_utils, since this
> >>>>>>>>>>> package is mandatory. Not everyone needs a data generator.
> >>>>>>>>>>
> >>>>>>>>>> Yup. And let's move further design discussion to the JIRA!
> >>>>>>>>>>
> >>>>>>>>>>> Olaf
> >>>>>>>>>>>
> >>>>>>>>>>>> On Aug 26, 2015, at 11:25, Jay Vyas <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Publishing the jar to Bigtop's maven is probably a good first
> >>>>>>>>>>>> step; then apps can just include it as needed...?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm not against packaging if someone wants packages for this.
> >>>>>>>>>>>> Maybe even include it in bigtop util?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Let's move to JIRA,
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik
> >>>>>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It is pretty cool indeed!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I wonder how it needs to be structured so that it:
> >>>>>>>>>>>>> - is easy to access/use from other components wherever it is needed
> >>>>>>>>>>>>> - doesn't interfere with the rest of the stack
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I guess one possible way would be to implement the generator
> >>>>>>>>>>>>> as a set of maven artifacts that could be installed/consumed
> >>>>>>>>>>>>> transparently by just declaring a dependency, e.g. as proposed
> >>>>>>>>>>>>> via a top-level component.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Another way is to have a new package like we do for
> >>>>>>>>>>>>> bigtop-utils and such.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Perhaps this discussion should be moved to JIRA, or shall we
> >>>>>>>>>>>>> continue on dev@?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cos
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> >>>>>>>>>>>>>> Hi BigTop,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I had a discussion with Jay yesterday, and we'd like to
> >>>>>>>>>>>>>> propose a new component for BigTop: BigTop Data Generators.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> BigTop Data Generators would consist of a common set of
> >>>>>>>>>>>>>> libraries for building data generators and three example
> >>>>>>>>>>>>>> data generators:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> * BigPetStore transaction generator (moved from BigPetStore)
> >>>>>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
> >>>>>>>>>>>>>> booths on a showroom floor, at a conference, or at a mall
> >>>>>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
> >>>>>>>>>>>>>> (temperature, wind speed, wind chill, rainfall, etc.) per
> >>>>>>>>>>>>>> zip code.
> >>>>>>>>>>>>>> (From a model trained on NOAA historical weather data)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We believe that creating a common set of libraries will have
> >>>>>>>>>>>>>> several benefits, including:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> * Easier for others to build their own data generators
> >>>>>>>>>>>>>> * Make data generators smaller and easier to maintain
> >>>>>>>>>>>>>> * Share improvements across the data generators
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> More details on the libraries are below.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> BigPetStore will continue to focus on building and
> >>>>>>>>>>>>>> maintaining blueprints, powered by the BigTop Data Generators.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop for
> >>>>>>>>>>>>>> tools for building better, more comprehensive blueprints. We
> >>>>>>>>>>>>>> want to support these efforts through data generators and the
> >>>>>>>>>>>>>> initial set of blueprints we've been building.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If the community is generally in support of this, I can
> >>>>>>>>>>>>>> create a top-level "bigtop-data-generators" directory and put
> >>>>>>>>>>>>>> the data generators and libraries in there.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> RJ
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -------
> >>>>>>>>>>>>>> Library details:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So far, I've extracted the following common libraries:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> * Samplers -- provides classes for PDFs and various samplers
> >>>>>>>>>>>>>> * Name generator -- data set and samplers for generating names
> >>>>>>>>>>>>>> * Location data set -- data set and classes for US zip codes,
> >>>>>>>>>>>>>> their GPS coordinates, median household incomes, and
> >>>>>>>>>>>>>> population sizes
> >>>>>>>>>>>>>> * Product generator -- library for enumerating products from
> >>>>>>>>>>>>>> a specification file. Comes with default specifications for
> >>>>>>>>>>>>>> BigPetStore
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I also expect that I'll add libraries for:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> * Particle simulation -- customer movement in a room
> >>>>>>>>>>>>>> * Latent factor model generation -- generate latent factors
> >>>>>>>>>>>>>> and customer weights to create something like MovieLens data.
> >>>>>>>>>>>>>> Used in Bazaar for booth preferences and potentially in
> >>>>>>>>>>>>>> BigPetStore for customer item preferences
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Most of these libraries came out of the BigPetStore data
> >>>>>>>>>>>>>> generator, but the other generators have been refactored to
> >>>>>>>>>>>>>> be based on the standard set of libraries.
> >>>>
> >>>>
> >>>> --
> >>>> jay vyas
> >>
>
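For what it's worth, the "one script" idea could look roughly like the sketch below. It is only an illustration under assumptions: the function names, the jar names, and the accepted --scheme values are hypothetical placeholders, not existing Bigtop artifacts. The flags are the ones from the docker run examples quoted above (the --etc placeholders are left out), and the script only prints the java command it would run rather than executing anything.

```shell
#!/usr/bin/env bash
# Hypothetical single entry point that dispatches on --scheme.
# All jar names below are placeholders, not real Bigtop artifacts.

# Map a scheme name to the (assumed) jar that implements that generator.
jar_for_scheme() {
  case "$1" in
    weather)     echo "bigtop-weatherman.jar" ;;
    bigpetstore) echo "bigpetstore-data-generator.jar" ;;
    bazaar)      echo "bigtop-bazaar.jar" ;;
    *)           return 1 ;;
  esac
}

# Parse the flags and print the command that would be run (dry run only).
bigtop_data_gen() {
  local scheme="" size="" output="" daemon="no"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --daemon)
        daemon="yes"; shift ;;
      --scheme|--size|--output)
        # Each of these flags needs a value; bail out if it is missing.
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
        case "$1" in
          --scheme) scheme="$2" ;;
          --size)   size="$2" ;;
          --output) output="$2" ;;
        esac
        shift 2 ;;
      *)
        echo "unknown option: $1" >&2; return 2 ;;
    esac
  done

  local jar
  jar="$(jar_for_scheme "$scheme")" || { echo "unknown scheme: $scheme" >&2; return 2; }

  local cmd="java -jar $jar --size $size --output $output"
  if [ "$daemon" = "yes" ]; then
    cmd="$cmd --daemon"
  fi
  # Echo instead of exec, since the jars above are placeholders.
  echo "$cmd"
}
```

Calling `bigtop_data_gen --scheme weather --size 5GB --output data-dir --daemon` prints the command for the (placeholder) weather jar with `--daemon` appended. A docker image could use the same script as its entrypoint, so the packaged and containerized variants would share one command line.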
