On Mon, Aug 31, 2015 at 08:01PM, jay vyas wrote:
> - I agree Gradle is better than "yet another script".
>
> - And docker container, even if suboptimal, as thin wrapper to gradle
> container is all you need to deliver the data-generator to the masses so
> they can try it w/ zero startup cost.

That's true. But it didn't sound this way originally, hence I've asked.

> On Mon, Aug 31, 2015 at 7:36 PM, Konstantin Boudnik <[email protected]> wrote:
>
> > Why would we need yet another script (and potentially an extra readme to
> > explain its command options) when we have the gradle?
> >
> > Cos
> >
> > On Mon, Aug 31, 2015 at 10:51PM, Olaf Flebbe wrote:
> > > +1 to the CLI / shell script interface.
> > >
> > > If I can choose, I'd like to have an apt-get install bigtop-datagenerator,
> > > running for instance
> > >
> > > bigtop-data-generator outputDir nStores nCustomers nPurchasingModels simulationLength seed
> > >
> > > I can help out with packaging if needed.
> > >
> > > Why should we use the docker indirection for a plain CLI file? Of
> > > course, we can provide a trivial Dockerfile to create a container supplying
> > > a JVM and running the CLI ... But I do not like to make our services depend
> > > on the docker registry more than we do now.
> > >
> > > Olaf
> > >
> > >
> > >
> > > > On 31.08.2015 at 16:40, Evans Ye <[email protected]> wrote:
> > > >
> > > > I very much like the shell script wrapper and docker image idea, since
> > > > that way we can integrate it directly with the bigtop provisioner, which
> > > > yields a perfect UX for the whole thing. I think it's not too hard to do
> > > > both; we just need to add a parameter to turn the script into daemon mode.
> > > > I see lots of images doing it this way.
> > > >
> > > > docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar --daemon
> > > > On Aug 31, 2015 at 9:06 PM, "RJ Nowling" <[email protected]> wrote:
> > > >
> > > >> The BigPetStore, Bazaar, and weather data generators have single-threaded
> > > >> command-line interfaces. We could do the same with the smaller generators
> > > >> (names, locations, etc.) if there is interest.
> > > >>
> > > >> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <[email protected]> wrote:
> > > >>
> > > >>> Nate: Good idea to abstract the interface one level higher....
> > > >>>
> > > >>> How about a docker run command? That is probably the easiest way for
> > > >>> Linux folks to run one-off Java apps nowadays.
> > > >>>
> > > >>> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
> > > >>>
> > > >>> I'm happy to curate such a docker image. I am already doing something like
> > > >>> this in kube for bigtop-transaction-queue, which continuously pumps data
> > > >>> generator outputs into a REST endpoint or file
> > > >>> queue... So it could be extended to support other generators.
> > > >>>
> > > >>>
> > > >>>> om> <[email protected]> wrote:
> > > >>>>
> > > >>>> Could picture at some point supporting something like this for non-jvm
> > > >>>> folk just looking for test/demo data:
> > > >>>>
> > > >>>> apt-get install bigtop-data-gen
> > > >>>> ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> -----Original Message-----
> > > >>>> From: jay vyas [mailto:[email protected]]
> > > >>>> Sent: Sunday, August 30, 2015 5:11 PM
> > > >>>> To: [email protected]
> > > >>>> Subject: Re: Proposal for "BigTop Data Generators"
> > > >>>>
> > > >>>> Hola nate. Well, here are the use cases I know of that I have used the
> > > >>>> data generators for.
> > > >>>>
> > > >>>> Dockerfile:
> > > >>>>
> > > >>>> (1) for testing kubernetes. For this, I just use transaction-queue
> > > >>>> docker file.
> > > >>>> (2) for testing GlusterFS small file workloads, maybe with other
> > > >>>> analytics tools...
> > > >>>>
> > > >>>> Maven repo
> > > >>>>
> > > >>>> (3) Java mapreduce/ignite/spark applications, which can just add a mvn
> > > >>>> repo when compiling.
> > > >>>> Java developers never add jars through RPM repos.
> > > >>>>
> > > >>>> RPM/DEB packages:
> > > >>>>
> > > >>>> I could see people using an RPM/DEB data generator, and I'm not against
> > > >>>> it. But I simply don't know of any real world projects which *currently*
> > > >>>> need RPM/Deb packages, which is why I haven't bothered to propose it as a
> > > >>>> requirement. Nevertheless linux packages are always a welcome addition if
> > > >>>> someone wants to create em !
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
> > > >>>>>
> > > >>>>> Would container be in addition to deb/rpm, or instead of? If latter
> > > >>>>> can we do deb/rpm as base then have container either created from them
> > > >>>>> or directly from artifacts?
> > > >>>>>
> > > >>>>> On test usage side, seems could probably break up tests into
> > > >>>>> base/required and then optional/add-on tests/test-suites. Think
> > > >>>>> remember seeing mention of certain tests that are failing at times on
> > > >>>>> certain component(s) anyways in the core builds but don’t mean that
> > > >>>>> the build is broken, so would make sense to have some clean up around
> > > >>>>> those anyways.
> > > >>>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: RJ Nowling [mailto:[email protected]]
> > > >>>>> Sent: Sunday, August 30, 2015 1:11 PM
> > > >>>>> To: [email protected]
> > > >>>>> Subject: Re: Proposal for "BigTop Data Generators"
> > > >>>>>
> > > >>>>> I agree with the above. :)
> > > >>>>>
> > > >>>>> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas <[email protected]> wrote:
> > > >>>>>
> > > >>>>>> Hi RJ.
> > > >>>>>>
> > > >>>>>> Maven repositories and docker containers for the transaction queue
> > > >>>>>> are good enough IMO.
> > > >>>>>> That will give people a way to compose them in
> > > >>>>>> different idioms (one for Java folks, another for broader Linux
> > > >>>>>> audience).
> > > >>>>>>
> > > >>>>>> I think the lib designs are fairly intuitive. I would say that we
> > > >>>>>> should constrain them all to being written in Java or Groovy to keep
> > > >>>>>> the bigtop theme of "JVM for everything" :).
> > > >>>>>>
> > > >>>>>> Any particular questions you have around technical design can be
> > > >>>>>> followed in a JIRA or else maybe a Readme spec that goes in a top
> > > >>>>>> level of the data-generators dir...
> > > >>>>>>
> > > >>>>>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
> > > >>>>>>>
> > > >>>>>>> I'd like to keep this conversation going.
> > > >>>>>>>
> > > >>>>>>> So here are a few discussion points:
> > > >>>>>>>
> > > >>>>>>> 1. How do we want to make the data generators available? Maven?
> > > >>>>>>> RPMs and Debs?
> > > >>>>>>>
> > > >>>>>>> For now, I'm using a gradle multi-project build to easily build
> > > >>>>>>> and install the BPS data generators and its libraries into a local
> > > >>>>>>> maven repo. This makes development easy. Eventually, I would like to
> > > >>>>>>> post binaries through Maven for easy integration by users. RPMs / Debs
> > > >>>>>>> could be interesting since I use a pattern where the data generators are
> > > >>>>>>> libraries (to support application integration / parallelization by
> > > >>>>>>> the host framework) but also provide CLI drivers for local testing.
> > > >>>>>>>
> > > >>>>>>> 2. The idea of using the data generators as part of the smoke
> > > >>>>>>> tests came up. Since there is concern about making the data
> > > >>>>>>> generators required, we could offer the blueprints (BigPetStore)
> > > >>>>>>> as optional smoke tests. Would that be a good compromise?
> > > >>>>>>>
> > > >>>>>>> 3. How will they be maintained?
> > > >>>>>>>
> > > >>>>>>> I'll certainly add myself to the maintainers list and will be
> > > >>>>>>> taking responsibility. I'm happy to have others help as well if
> > > >>>>>>> anyone wants to
> > > >>>>>>> -- if not, that's cool, too.
> > > >>>>>>>
> > > >>>>>>> 4. Is anyone interested at all in discussing library APIs and designs?
> > > >>>>>>> What about internal interfaces and such?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> My plan was to add at least one more data generator (weather
> > > >>>>>>> simulator) to bigtop-data-generators in the short term. However, given
> > > >>>>>>> the concerns raised by Cos (more discussion needed) and Olaf (don't
> > > >>>>>>> want to force data generators on unsuspecting users ;) ), I would
> > > >>>>>>> like to reach some consensus
> > > >>>>>>> on what people are concerned about and solutions.
> > > >>>>>>>
> > > >>>>>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik
> > > >>>>>>> <[email protected]> wrote:
> > > >>>>>>>
> > > >>>>>>>> Fine by me. I have linked this thread to the JIRA ticket that RJ
> > > >>>>>>>> created, so we have a way to connect one to another ;)
> > > >>>>>>>>
> > > >>>>>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> > > >>>>>>>>> Hi,
> > > >>>>>>>>>
> > > >>>>>>>>> I am not confident that moving important design discussions with
> > > >>>>>>>>> impact to the whole project to jira is a good idea.
> > > >>>>>>>>>
> > > >>>>>>>>> In the current JIRA Traffic storm it is not easy to identify and
> > > >>>>>>>>> follow important tickets.
> > > >>>>>>>>>
> > > >>>>>>>>> Please keep discussions on the list or at least, please state on
> > > >>>>>>>>> this list which Ticket to follow ...
> > > >>>>>>>>>
> > > >>>>>>>>> Olaf
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> > > >>>>>>>>>>> Hi,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Nice to have data generators in Bigtop.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> But please do not include it in bigtop_utils, since this
> > > >>>>>>>>>>> package is mandatory. Not everyone needs a data generator.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Yup. And let's move further design discussion to the JIRA!
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Olaf
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Publishing the jar to bigtop's maven is probably a good first
> > > >>>>>>>>>>>> step. Then apps can just include it as needed...?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I'm not against packaging if someone wants packages for this.
> > > >>>>>>>>>>>> Maybe even include it in bigtop util?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Let's move to jira,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik
> > > >>>>>>>>>>>>> <[email protected]> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> It is pretty cool indeed!
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I wonder how it needs to be structured to be:
> > > >>>>>>>>>>>>> - easy to access/use from other components wherever it is needed
> > > >>>>>>>>>>>>> - doesn't interfere with the rest of the stack
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I guess one possible way would be to implement the generator
> > > >>>>>>>>>>>>> as a set of maven artifacts, that could be installed/consumed
> > > >>>>>>>>>>>>> transparently by just declaring a dependency e.g as proposed
> > > >>>>>>>>>>>>> via top-level component.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Another way is to have a new package like we do for
> > > >>>>>>>>>>>>> bigtop-utils and such.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Perhaps this discussion should be moved to JIRA or shall we
> > > >>>>>>>>>>>>> continue on the dev@ ??
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Cos
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> > > >>>>>>>>>>>>>> Hi BigTop,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I had a discussion with Jay yesterday, we'd like to propose
> > > >>>>>>>>>>>>>> a new component for BigTop: BigTop Data Generators.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> BigTop Data Generators would consist of a common set of
> > > >>>>>>>>>>>>>> libraries for building data generators and three example data
> > > >>>>>>>>>>>>>> generators:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> * BigPetStore transaction generator (moved from
> > > >>>>>>>>>>>>>> BigPetStore)
> > > >>>>>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
> > > >>>>>>>>>>>>>> booths on a showroom floor, at a conference, or at a mall
> > > >>>>>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
> > > >>>>>>>>>>>>>> (temperature, wind speed, wind chill, rainfall, etc.) per zip
> > > >>>>>>>>>>>>>> code. (From a model trained on NOAA historical weather data)
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> We believe that creating a common set of libraries will
> > > >>>>>>>>>>>>>> have several benefits including:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> * Easier for others to build their own data generators
> > > >>>>>>>>>>>>>> * Make data generators smaller and easier to maintain
> > > >>>>>>>>>>>>>> * Share improvements across the data generators
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> More details on the libraries are below.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> BigPetStore will continue to focus on building and
> > > >>>>>>>>>>>>>> maintaining blueprints, powered by the BigTop Data Generators.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop
> > > >>>>>>>>>>>>>> for tools for building better, more comprehensive blueprints.
> > > >>>>>>>>>>>>>> We want to support these efforts through data generators and
> > > >>>>>>>>>>>>>> the initial set of blueprints we've been building.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> If the community is generally in support of this, I can
> > > >>>>>>>>>>>>>> create a top-level "bigtop-data-generators" directory and put
> > > >>>>>>>>>>>>>> the data generators and libraries in there.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Thanks!
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> RJ
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> -------
> > > >>>>>>>>>>>>>> Library details:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> So far, I've extracted the following common libraries:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> * Samplers -- provides classes for PDFs and various
> > > >>>>>>>>>>>>>> samplers
> > > >>>>>>>>>>>>>> * Name generator -- data set and samplers for generating
> > > >>>>>>>>>>>>>> names
> > > >>>>>>>>>>>>>> * Location data set -- data set and classes for US zip
> > > >>>>>>>>>>>>>> codes, their GPS coordinates, median household incomes, and
> > > >>>>>>>>>>>>>> population sizes
> > > >>>>>>>>>>>>>> * Product generator -- library for enumerating products
> > > >>>>>>>>>>>>>> from a specification file. Comes with default
> > > >>>>>>>>>>>>>> specifications for BigPetStore
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I also expect that I'll add libraries for:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> * Particle simulation -- customer movement in a room
> > > >>>>>>>>>>>>>> * Latent factor model generation -- generate latent
> > > >>>>>>>>>>>>>> factors and customer weights to create something like
> > > >>>>>>>>>>>>>> MovieLens data.
> > > >>>>>>>>>>>>>> Used in Bazaar
> > > >>>>>>>>>>>>>> for booth preferences and potentially in BigPetStore for
> > > >>>>>>>>>>>>>> customer item preferences
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Most of these libraries came out of the BigPetStore data
> > > >>>>>>>>>>>>>> generator but the other generators have been refactored to be
> > > >>>>>>>>>>>>>> based off the standard set of libraries.
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> jay vyas
> > > >>>>
> > > >>>
> > > >>
> > > >
> > >
> >
>
>
> --
> jay vyas
