+1 to the CLI/shell script interface. If I could choose, I would like to have an

    apt-get install bigtop-datagenerator

and then run, for instance,

    bigtop-data-generator outputDir nStores nCustomers nPurchasingModels simulationLength seed

I can help out with packaging if needed.

Why should we use the Docker indirection for a plain CLI file? Of course, we can provide a trivial Dockerfile to create a container supplying a JVM and running the CLI ... But I would not like our services to depend on the Docker registry more than they do now.

Olaf

> On 31.08.2015 at 16:40, Evans Ye <[email protected]> wrote:
>
> I very much like the shell script wrapper and docker image idea, since
> that way we can integrate it directly with the bigtop provisioner, which yields a
> perfect UX for the whole thing. I think it's not too hard to do both; we
> just need to add a parameter to turn the script into daemon mode. I see
> lots of images doing it this way.
>
> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar --daemon
>
> On Aug 31, 2015 at 9:06 PM, "RJ Nowling" <[email protected]> wrote:
>
>> The BigPetStore, Bazaar, and weather data generators have single-threaded
>> command-line interfaces. We could do the same with the smaller generators
>> (names, locations, etc.) if there is interest.
>>
>> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <[email protected]>
>> wrote:
>>
>>> Nate: Good idea to abstract the interface one level higher....
>>>
>>> How about a docker run command? That is probably the easiest way for
>>> Linux folks to run one-off Java apps nowadays.
>>>
>>> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
>>>
>>> I'm happy to curate such a docker image, I already am doing something like
>>> this in kube for bigtop-transaction-queue, which continuously pumps data
>>> generator outputs into a REST endpoint or file
>>> Queue... So it could be extended to support other generators.
>>>
>>>
>>>> om> <[email protected]> wrote:
>>>>
>>>> Could picture at some point supporting something like this for non-JVM
>>>> folk just looking for test/demo data:
>>>>
>>>> apt-get install bigtop-data-gen
>>>> ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
>>>>
>>>> -----Original Message-----
>>>> From: jay vyas [mailto:[email protected]]
>>>> Sent: Sunday, August 30, 2015 5:11 PM
>>>> To: [email protected]
>>>> Subject: Re: Proposal for "BigTop Data Generators"
>>>>
>>>> Hola Nate. Well, here are the use cases I know of that I have used the
>>>> data generators for.
>>>>
>>>> Dockerfile:
>>>>
>>>> (1) for testing Kubernetes. For this, I just use the transaction-queue
>>>> Dockerfile.
>>>> (2) for testing GlusterFS small file workloads, maybe with other
>>>> analytics tools...
>>>>
>>>> Maven repo:
>>>>
>>>> (3) Java MapReduce/Ignite/Spark applications, which can just add a mvn
>>>> repo when compiling. Java developers never add jars through RPM repos.
>>>>
>>>> RPM/DEB packages:
>>>>
>>>> I could see people using an RPM/DEB data generator, and I'm not against
>>>> it. But I simply don't know of any real-world projects which *currently*
>>>> need RPM/DEB packages, which is why I haven't bothered to propose it as a
>>>> requirement. Nevertheless, Linux packages are always a welcome addition if
>>>> someone wants to create 'em!
>>>>
>>>>> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
>>>>>
>>>>> Would container be in addition to deb/rpm, or instead of? If latter
>>>>> can we do deb/rpm as base then have container either created from them
>>>>> or directly from artifacts?
>>>>>
>>>>> On test usage side, seems could probably break up tests into
>>>>> base/required and then optional/add-on tests/test-suites.
>>>>> Think remember seeing mention of certain tests that are failing at times on
>>>>> certain component(s) anyways in the core builds but don’t mean that
>>>>> the build is broken, so would make sense to have some clean up around
>>>>> those anyways.
>>>>>
>>>>> -----Original Message-----
>>>>> From: RJ Nowling [mailto:[email protected]]
>>>>> Sent: Sunday, August 30, 2015 1:11 PM
>>>>> To: [email protected]
>>>>> Subject: Re: Proposal for "BigTop Data Generators"
>>>>>
>>>>> I agree with the above. :)
>>>>>
>>>>> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas
>>>>> <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi RJ.
>>>>>>
>>>>>> Maven repositories and docker containers for the transaction queue
>>>>>> are good enough IMO. That will give people a way to compose them in
>>>>>> different idioms (one for Java folks, another for broader Linux
>>>>>> audience).
>>>>>>
>>>>>> I think the lib designs are fairly intuitive. I would say that we
>>>>>> should constrain them all to being written in Java or Groovy to keep
>>>>>> the bigtop theme of "JVM for everything" :).
>>>>>>
>>>>>> Any particular questions you have around technical design can be
>>>>>> followed in a JIRA or else maybe a Readme spec that goes in a top
>>>>>> level of the data-generators dir...
>>>>>>
>>>>>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
>>>>>>>
>>>>>>> I'd like to keep this conversation going.
>>>>>>>
>>>>>>> So here are a few discussion points:
>>>>>>>
>>>>>>> 1. How do we want to make the data generators available? Maven?
>>>>>>> RPMs and Debs?
>>>>>>>
>>>>>>> For now, I'm using a gradle multi-project build to easily build and install
>>>>>>> the BPS data generators and its libraries into a local maven repo.
>>>>>>> This makes development easy. Eventually, I would like to post binaries through
>>>>>>> Maven for easy integration by users.
>>>>>>> RPMs / Debs could be
>>>>>>> interesting since I use a pattern where the data generators are
>>>>>>> libraries (to support application integration / parallelization by
>>>>>>> the host framework) but also provide CLI drivers for local testing.
>>>>>>>
>>>>>>> 2. The idea of using the data generators as part of the smoke
>>>>>>> tests came up. Since there is concern about making the data
>>>>>>> generators required, we could offer the blueprints (BigPetStore)
>>>>>>> as optional smoke tests. Would that be a good compromise?
>>>>>>>
>>>>>>> 3. How will they be maintained?
>>>>>>>
>>>>>>> I'll certainly add myself to the maintainers list and will be
>>>>>>> taking responsibility. I'm happy to have others help as well if
>>>>>>> anyone wants to
>>>>>>> -- if not, that's cool, too.
>>>>>>>
>>>>>>> 4. Is anyone interested at all in discussing library APIs and designs?
>>>>>>> What about internal interfaces and such?
>>>>>>>
>>>>>>>
>>>>>>> My plan was to add at least one more data generator (weather
>>>>>>> simulator) to
>>>>>>> bigtop-data-generators in the short term. However, given the
>>>>>>> concerns raised by Cos (more discussion needed) and Olaf (don't
>>>>>>> want to force data generators on unsuspecting users ;) ), I would
>>>>>>> like to reach some consensus
>>>>>>> on what people are concerned about and solutions.
>>>>>>>
>>>>>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Fine by me. I have linked this thread to the JIRA ticket that RJ created,
>>>>>>>> so
>>>>>>>> we have a way to connect one to another ;)
>>>>>>>>
>>>>>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am not confident that moving important design discussions with
>>>>>>>>> impact to
>>>>>>>>> the whole project to jira is a good idea.
>>>>>>>>>
>>>>>>>>> In the current JIRA traffic storm it is not easy to identify and
>>>>>>>>> follow important tickets.
>>>>>>>>>
>>>>>>>>> Please keep discussions on the list or, at least, please state on
>>>>>>>>> this list which ticket to follow ...
>>>>>>>>>
>>>>>>>>> Olaf
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Nice to have data generators in Bigtop.
>>>>>>>>>>>
>>>>>>>>>>> But please do not include it in bigtop_utils, since this
>>>>>>>>>>> package is mandatory. Not everyone needs a data generator.
>>>>>>>>>>
>>>>>>>>>> Yup. And let's move further design discussion to the JIRA!
>>>>>>>>>>
>>>>>>>>>>> Olaf
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Publishing the jar to bigtop's maven is probably a good first
>>>>>>>>>>>> step. Then apps can just include it as needed...?
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not against packaging if someone wants packages for this.
>>>>>>>>>>>> Maybe even include it in bigtop util?
>>>>>>>>>>>>
>>>>>>>>>>>> Let's move to jira,
>>>>>>>>>>>>
>>>>>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is pretty cool indeed!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I wonder how it needs to be structured to be:
>>>>>>>>>>>>> - easy to access/use from other components wherever it is
>>>>>>>>>>>>> needed
>>>>>>>>>>>>> - doesn't interfere with the rest of the stack
>>>>>>>>>>>>>
>>>>>>>>>>>>> I guess one possible way would be to implement the generator
>>>>>>>>>>>>> as a set of maven
>>>>>>>>>>>>> artifacts, that could be installed/consumed transparently by
>>>>>>>>>>>>> just declaring a
>>>>>>>>>>>>> dependency e.g as proposed via top-level component.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Another way is to have a new package like we do for
>>>>>>>>>>>>> bigtop-utils and such.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps this discussion should be moved to JIRA or shall we
>>>>>>>>>>>>> continue on the
>>>>>>>>>>>>> dev@ ??
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cos
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
>>>>>>>>>>>>>> Hi BigTop,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I had a discussion with Jay yesterday, we'd like to propose
>>>>>>>>>>>>>> a new component
>>>>>>>>>>>>>> for BigTop: BigTop Data Generators.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BigTop Data Generators would consist of a common set of
>>>>>>>>>>>>>> libraries for
>>>>>>>>>>>>>> building data generators and three example data generators:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * BigPetStore transaction generator (moved from
>>>>>>>>>>>>>> BigPetStore)
>>>>>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
>>>>>>>>>>>>>> booths on a
>>>>>>>>>>>>>> showroom floor, at a conference, or at a mall
>>>>>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
>>>>>>>>>>>>>> (temperature, wind
>>>>>>>>>>>>>> speed, wind chill, rainfall, etc.) per zip code.
>>>>>>>>>>>>>> (From a model trained on
>>>>>>>>>>>>>> NOAA historical weather data)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We believe that creating a common set of libraries will
>>>>>>>>>>>>>> have several
>>>>>>>>>>>>>> benefits including:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * Easier for others to build their own data generators
>>>>>>>>>>>>>> * Make data generators smaller and easier to maintain
>>>>>>>>>>>>>> * Share improvements across the data generators
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> More details on the libraries are below.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BigPetStore will continue to focus on building and
>>>>>>>>>>>>>> maintaining blueprints, powered by the BigTop Data Generators.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop
>>>>>>>>>>>>>> for tools for
>>>>>>>>>>>>>> building better, more comprehensive blueprints. We want to
>>>>>>>>>>>>>> support these
>>>>>>>>>>>>>> efforts through data generators and the initial set of
>>>>>>>>>>>>>> blueprints we've been
>>>>>>>>>>>>>> building.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If the community is generally in support of this, I can
>>>>>>>>>>>>>> create a top-level
>>>>>>>>>>>>>> "bigtop-data-generators" directory and put the data
>>>>>>>>>>>>>> generators and libraries in there.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> RJ
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -------
>>>>>>>>>>>>>> Library details:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So far, I've extracted the following common libraries:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * Samplers -- provides classes for PDFs and various
>>>>>>>>>>>>>> samplers
>>>>>>>>>>>>>> * Name generator -- data set and samplers for generating
>>>>>>>>>>>>>> names
>>>>>>>>>>>>>> * Location data set -- data set and classes for US zip
>>>>>>>>>>>>>> codes, their
>>>>>>>>>>>>>> GPS coordinates, median household incomes, and population
>>>>>>>>>>>>>> sizes
>>>>>>>>>>>>>> * Product generator -- library for enumerating products
>>>>>>>>>>>>>> from a specification file. Comes with default
>>>>>>>>>>>>>> specifications for BigPetStore
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also expect that I'll add libraries for:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * Particle simulation -- customer movement in a room
>>>>>>>>>>>>>> * Latent factor model generation -- generate latent
>>>>>>>>>>>>>> factors and customer weights to create something like MovieLens data.
>>>>>>>>>>>>>> Used in Bazaar
>>>>>>>>>>>>>> for booth preferences and potentially in BigPetStore for
>>>>>>>>>>>>>> customer item
>>>>>>>>>>>>>> preferences
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Most of these libraries came out of the BigPetStore data
>>>>>>>>>>>>>> generator but the
>>>>>>>>>>>>>> other generators have been refactored to be based off the
>>>>>>>>>>>>>> standard set of
>>>>>>>>>>>>>> libraries.
>>>>
>>>>
>>>> --
>>>> jay vyas
>>>>
>>>
>>
