+1 to the CLI/shell script interface.

If I could choose, I would like to be able to apt-get install
bigtop-datagenerator and then run, for instance:

bigtop-data-generator outputDir nStores nCustomers nPurchasingModels 
simulationLength seed

I can help out with packaging if needed.
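
As a rough illustration of the kind of wrapper a package could ship, here is a minimal sketch. It assumes the six positional arguments from the example above; the jar path and the BIGTOP_DATAGEN_JAR variable are hypothetical and not part of any existing package. It only builds and prints the java command line, so it can be tried without a JVM.

```shell
#!/bin/sh
# Sketch of a wrapper; the default jar path is a hypothetical install location.
JAR="${BIGTOP_DATAGEN_JAR:-/usr/lib/bigtop-data-generator/bigtop-data-generator.jar}"

usage() {
    echo "usage: bigtop-data-generator outputDir nStores nCustomers nPurchasingModels simulationLength seed" >&2
}

# Print the java command line for the six positional arguments,
# or print usage and return non-zero if the argument count is wrong.
build_command() {
    if [ "$#" -ne 6 ]; then
        usage
        return 1
    fi
    printf 'java -jar %s %s %s %s %s %s %s\n' "$JAR" "$@"
}
```

A real wrapper would presumably end with something like exec java -jar "$JAR" "$@" instead of printing the command, and would need to quote arguments that contain spaces.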

Why should we use the Docker indirection for a plain CLI? Of course, we
can provide a trivial Dockerfile that creates a container supplying a JVM and
running the CLI ... But I do not want our services to depend on the Docker
registry more than they do now.

Olaf



> On 31.08.2015 at 16:40, Evans Ye <[email protected]> wrote:
> 
> I very much like the shell script wrapper and docker image idea, since
> that way we can integrate it directly with the bigtop provisioner, which
> yields a perfect UX for the whole thing. I think it's not too hard to do
> both; we just need to add a parameter to turn the script into daemon mode.
> I see lots of images doing it this way.
> 
> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output
> data-dir --etc  foo --etc bar --daemon
> 
> On Aug 31, 2015, at 9:06 PM, "RJ Nowling" <[email protected]> wrote:
> 
>> The BigPetStore, Bazaar, and weather data generators have single-threaded
>> command-line interfaces.  We could do the same with the smaller generators
>> (names, locations, etc.) if there is interest.
>> 
>> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <[email protected]>
>> wrote:
>> 
>>> Nate: Good idea to abstract the interface one level higher....
>>> 
>>> How about a docker run command? That is probably the easiest way for
>>> Linux folks to run one-off Java apps nowadays.
>>> 
>>> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output
>>> data-dir --etc  foo --etc bar
>>> 
>>> I'm happy to curate such a docker image. I am already doing something like
>>> this in kube for bigtop-transaction-queue, which continuously pumps data
>>> generator outputs into a REST endpoint or file
>>> queue... So it could be extended to support other generators.
>>> 
>>> 
>>>> om> <[email protected]> wrote:
>>>> 
>>>> Could picture at some point supporting something like this for non-JVM
>>>> folk just looking for test/demo data:
>>>> 
>>>> apt-get install bigtop-data-gen
>>>> ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir
>>>> --etc  foo --etc bar
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: jay vyas [mailto:[email protected]]
>>>> Sent: Sunday, August 30, 2015 5:11 PM
>>>> To: [email protected]
>>>> Subject: Re: Proposal for "BigTop Data Generators"
>>>> 
>>>> Hola Nate.  Well, here are the use cases I know of that I have used the
>>>> data generators for.
>>>> 
>>>> Dockerfile:
>>>> 
>>>> (1) for testing Kubernetes.  For this, I just use the transaction-queue
>>>> Dockerfile.
>>>> (2) for testing GlusterFS small file workloads, maybe with other
>>>> analytics tools...
>>>> 
>>>> Maven repo
>>>> 
>>>> (3) Java MapReduce/Ignite/Spark applications, which can just add a mvn
>>>> repo when compiling.  Java developers never add jars through RPM repos.
>>>> 
>>>> RPM/DEB packages:
>>>> 
>>>> I could see people using an RPM/DEB data generator, and I'm not against
>>>> it.  But I simply don't know of any real-world projects which *currently*
>>>> need RPM/DEB packages, which is why I haven't bothered to propose it as a
>>>> requirement.  Nevertheless, Linux packages are always a welcome addition
>>>> if someone wants to create 'em!
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
>>>>> 
>>>>> Would the container be in addition to deb/rpm, or instead of?  If the
>>>>> latter, can we do deb/rpm as the base and then have the container either
>>>>> created from them or directly from artifacts?
>>>>> 
>>>>> On the test usage side, it seems we could break up tests into
>>>>> base/required and then optional/add-on tests/test suites.  I think I
>>>>> remember seeing mention of certain tests that fail at times on
>>>>> certain component(s) in the core builds anyway but don't mean that
>>>>> the build is broken, so it would make sense to have some cleanup around
>>>>> those anyway.
>>>>> 
>>>>> -----Original Message-----
>>>>> From: RJ Nowling [mailto:[email protected]]
>>>>> Sent: Sunday, August 30, 2015 1:11 PM
>>>>> To: [email protected]
>>>>> Subject: Re: Proposal for "BigTop Data Generators"
>>>>> 
>>>>> I agree with the above. :)
>>>>> 
>>>>> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas
>>>>> <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi RJ.
>>>>>> 
>>>>>> Maven repositories and docker containers for the transaction queue
>>>>>> are good enough IMO.  That will give people a way to compose them in
>>>>>> different idioms (one for Java folks, another for a broader Linux
>>>>>> audience).
>>>>>> 
>>>>>> I think the lib designs are fairly intuitive.  I would say that we
>>>>>> should constrain them all to being written in Java or Groovy to keep
>>>>>> the bigtop theme of "JVM for everything" :).
>>>>>> 
>>>>>> Any particular questions you have around technical design can be
>>>>>> followed up in a JIRA, or else maybe in a README spec that goes in the
>>>>>> top level of the data-generators dir...
>>>>>> 
>>>>>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
>>>>>>> 
>>>>>>> I'd like to keep this conversation going.
>>>>>>> 
>>>>>>> So here are a few discussion points:
>>>>>>> 
>>>>>>> 1. How do we want to make the data generators available?  Maven?
>>>>>>> RPMs and Debs?
>>>>>>> 
>>>>>>> For now, I'm using a gradle multi-project build to easily build and
>>>>>>> install the BPS data generators and their libraries into a local maven
>>>>>>> repo.  This makes development easy.  Eventually, I would like to post
>>>>>>> binaries through Maven for easy integration by users.  RPMs / Debs
>>>>>>> could be interesting since I use a pattern where the data generators
>>>>>>> are libraries (to support application integration / parallelization by
>>>>>>> the host framework) but also provide CLI drivers for local testing.
>>>>>>> 
>>>>>>> 2.  The idea of using the data generators as part of the smoke
>>>>>>> tests came up.  Since there is concern about making the data
>>>>>>> generators required, we could offer the blueprints (BigPetStore)
>>>>>>> as optional smoke tests.  Would that be a good compromise?
>>>>>>> 
>>>>>>> 3.  How will they be maintained?
>>>>>>> 
>>>>>>> I'll certainly add myself to the maintainers list and will be
>>>>>>> taking responsibility.  I'm happy to have others help as well if
>>>>>>> anyone wants to -- if not, that's cool, too.
>>>>>>> 
>>>>>>> 4. Is anyone interested at all in discussing library APIs and designs?
>>>>>>> What about internal interfaces and such?
>>>>>>> 
>>>>>>> 
>>>>>>> My plan was to add at least one more data generator (weather
>>>>>>> simulator) to bigtop-data-generators in the short term.  However,
>>>>>>> given the concerns raised by Cos (more discussion needed) and Olaf
>>>>>>> (don't want to force data generators on unsuspecting users ;) ), I
>>>>>>> would like to reach some consensus on what people are concerned about
>>>>>>> and on possible solutions.
>>>>>>> 
>>>>>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik
>>>>>>> <[email protected]>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Fine by me. I have linked this thread to the JIRA ticket that RJ
>>>>>>>> created, so we have a way to connect one to another ;)
>>>>>>>> 
>>>>>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I am not confident that moving important design discussions with
>>>>>>>>> impact on the whole project to JIRA is a good idea.
>>>>>>>>> 
>>>>>>>>> In the current JIRA traffic storm it is not easy to identify and
>>>>>>>>> follow important tickets.
>>>>>>>>> 
>>>>>>>>> Please keep discussions on the list or, at least, please state on
>>>>>>>>> this list which ticket to follow ...
>>>>>>>>> 
>>>>>>>>> Olaf
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> Nice to have data generators in Bigtop.
>>>>>>>>>>> 
>>>>>>>>>>> But please do not include it in bigtop_utils, since this
>>>>>>>>>>> package is mandatory. Not everyone needs a data generator .
>>>>>>>>>> 
>>>>>>>>>> Yup. And let's move further design discussion to the JIRA!
>>>>>>>>>> 
>>>>>>>>>>> Olaf
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Publishing the jar to Bigtop's maven is probably a good first
>>>>>>>>>>>> step; then apps can just include it as needed...?
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm not against packaging if someone wants packages for this.
>>>>>>>>>>>> Maybe even include it in bigtop util?
>>>>>>>>>>>> 
>>>>>>>>>>>> Let's move to jira,
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik
>>>>>>>>>>>>> <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It is pretty cool indeed!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I wonder how it needs to be structured to be:
>>>>>>>>>>>>> - easy to access/use from other components wherever it is
>>>>>>>>>>>>> needed
>>>>>>>>>>>>> - doesn't interfere with the rest of the stack
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I guess one possible way would be to implement the generator
>>>>>>>>>>>>> as a set of maven artifacts that could be installed/consumed
>>>>>>>>>>>>> transparently by just declaring a dependency, e.g. as proposed
>>>>>>>>>>>>> via a top-level component.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Another way is to have a new package like we do for
>>>>>>>>>>>>> bigtop-utils and such.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Perhaps this discussion should be moved to JIRA, or shall we
>>>>>>>>>>>>> continue on dev@?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cos
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
>>>>>>>>>>>>>> Hi BigTop,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I had a discussion with Jay yesterday; we'd like to propose
>>>>>>>>>>>>>> a new component for BigTop: BigTop Data Generators.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> BigTop Data Generators would consist of a common set of
>>>>>>>>>>>>>> libraries for building data generators and three example data
>>>>>>>>>>>>>> generators:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> * BigPetStore transaction generator (moved from
>>>>>>>>>>>>>> BigPetStore)
>>>>>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
>>>>>>>>>>>>>> booths on a showroom floor, at a conference, or at a mall
>>>>>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
>>>>>>>>>>>>>> (temperature, wind speed, wind chill, rainfall, etc.) per zip
>>>>>>>>>>>>>> code.  (From a model trained on NOAA historical weather data)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We believe that creating a common set of libraries will
>>>>>>>>>>>>>> have several benefits including:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> * Easier for others to build their own data generators
>>>>>>>>>>>>>> * Make data generators smaller and easier to maintain
>>>>>>>>>>>>>> * Share improvements across the data generators
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> More details on the libraries are below.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> BigPetStore will continue to focus on building and
>>>>>>>>>>>>>> maintaining blueprints, powered by the BigTop Data Generators.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop
>>>>>>>>>>>>>> for tools for building better, more comprehensive blueprints.
>>>>>>>>>>>>>> We want to support these efforts through data generators and
>>>>>>>>>>>>>> the initial set of blueprints we've been building.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If the community is generally in support of this, I can
>>>>>>>>>>>>>> create a top-level "bigtop-data-generators" directory and put
>>>>>>>>>>>>>> the data generators and libraries in there.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> RJ
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -------
>>>>>>>>>>>>>> Library details:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So far, I've extracted the following common libraries:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> * Samplers -- provides classes for PDFs and various
>>>>>>>>>>>>>> samplers
>>>>>>>>>>>>>> * Name generator -- data set and samplers for generating
>>>>>>>>>>>>>> names
>>>>>>>>>>>>>> * Location data set -- data set and classes for US zip
>>>>>>>>>>>>>> codes, their GPS coordinates, median household incomes, and
>>>>>>>>>>>>>> population sizes
>>>>>>>>>>>>>> * Product generator -- library for enumerating products
>>>>>>>>>>>>>> from a specification file.  Comes with default
>>>>>>>>>>>>>> specifications for BigPetStore
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I also expect that I'll add libraries for:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> * Particle simulation -- customer movement in a room
>>>>>>>>>>>>>> * Latent factor model generation -- generate latent
>>>>>>>>>>>>>> factors and customer weights to create something like
>>>>>>>>>>>>>> MovieLens data.  Used in Bazaar for booth preferences and
>>>>>>>>>>>>>> potentially in BigPetStore for customer item preferences
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Most of these libraries came out of the BigPetStore data
>>>>>>>>>>>>>> generator but the other generators have been refactored to be
>>>>>>>>>>>>>> based off the standard set of libraries.
>>>> 
>>>> 
>>>> --
>>>> jay vyas
>>>> 
>>> 
>> 
