Thanks for everyone's input!  I haven't added new code to the data
generators (just refactorings and bug fixes) while this discussion has been
ongoing.  I'd like to pick up the work again.

I think we can agree that we won't:
* Add the data generators to required packages (users shouldn't be forced to
install them)
* Make Docker the only delivery mechanism

and agree that the data generators will have CLIs.

The BigPetStore data generator already has CLI support built into the jar:
$ java -jar bigpetstore-data-generator-1.1.0-SNAPSHOT.jar outputDir nStores \
    nCustomers nPurchasingModels simulationLength seed
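
As a rough illustration of the "thin wrapper" idea discussed below, here is a
minimal shell sketch.  The function name bps_generate_cmd is hypothetical, the
jar is assumed to be in the current directory, and the function only prints the
command rather than running it, so it works without a JVM:

```shell
#!/bin/sh
# Hypothetical wrapper sketch; the jar location is an assumption.
BPS_JAR="bigpetstore-data-generator-1.1.0-SNAPSHOT.jar"

# Build the java command for the six positional arguments taken by the
# BigPetStore generator CLI. Prints the command instead of executing it.
bps_generate_cmd() {
    if [ "$#" -ne 6 ]; then
        echo "usage: bps_generate_cmd outputDir nStores nCustomers nPurchasingModels simulationLength seed" >&2
        return 1
    fi
    echo "java -jar $BPS_JAR $*"
}
```

Calling bps_generate_cmd /tmp/bps 10 1000 5 30 42 prints the full java
command; those argument values are placeholders, not recommended settings.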

I think packaging, Maven artifacts, Docker images, and integration with
smoke tests can all be handled in subsequent JIRAs.  If any of these
interest you, please file the appropriate JIRA.  :)

Does this sound good to everyone?

On Mon, Aug 31, 2015 at 7:08 PM, Konstantin Boudnik <[email protected]> wrote:

> On Mon, Aug 31, 2015 at 08:01PM, jay vyas wrote:
> > - I agree Gradle is better than "yet another script".
> >
> > - And docker container, even if suboptimal, as thin wrapper to gradle
> > container is all you need to deliver the data-generator on the masses so
> > they can try it w/ zero startup cost.
>
> That's true. But it didn't sound this way originally, hence I've asked.
>
> > On Mon, Aug 31, 2015 at 7:36 PM, Konstantin Boudnik <[email protected]>
> wrote:
> >
> > > Why would we need yet another script (and potentially an extra readme to
> > > explain its command options) when we have Gradle?
> > >
> > > Cos
> > >
> > > On Mon, Aug 31, 2015 at 10:51PM, Olaf Flebbe wrote:
> > > > +1 to the CLI /shell script interface.
> > > >
> > > > If I can choose, I would like to have an apt-get install bigtop-datagenerator,
> > > > running for instance
> > > >
> > > > bigtop-data-generator outputDir nStores nCustomers nPurchasingModels
> > > > simulationLength seed
> > > >
> > > > I can help out with packaging if needed.
> > > >
> > > > Why should we use the docker indirection for a plain CLI file? Of
> > > > course, we can provide a trivial Dockerfile to create a container
> > > > supplying a JVM and running the CLI ... But I would not like our services
> > > > to depend on the docker registry more than they do now.
> > > >
> > > > Olaf
> > > >
> > > >
> > > >
> > > > > On 31.08.2015 at 16:40, Evans Ye <[email protected]> wrote:
> > > > >
> > > > > I very much like the shell script wrapper and docker image idea, since
> > > > > that way we can integrate it directly with the bigtop provisioner, which
> > > > > yields a perfect UX for the whole thing. I think it's not too hard to do
> > > > > both; we just need to add a parameter to turn the script into daemon
> > > > > mode. I see lots of images doing it this way.
> > > > >
> > > > > docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output
> > > > > data-dir --etc  foo --etc bar --daemon
> > > > > On Aug 31, 2015, at 9:06 PM, "RJ Nowling" <[email protected]> wrote:
> > > > >
> > > > >> The BigPetStore, Bazaar, and weather data generators have
> > > single-threaded
> > > > >> command-line interfaces.  We could do the same with the smaller
> > > generators
> > > > >> (names, locations, etc.) if there is interest.
> > > > >>
> > > > >> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <
> > > [email protected]>
> > > > >> wrote:
> > > > >>
> > > > >>> Nate: Good idea to abstract the interface one level higher....
> > > > >>>
> > > > >>> How about a docker run command ? That is probably the easiest
> way for
> > > > >>> Linux folks to run one off Java apps nowadays.
> > > > >>>
> > > > >>> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB
> > > --output
> > > > >>> data-dir --etc  foo --etc bar
> > > > >>>
> > > > >>> I'm happy to curate such a docker image, I already am doing
> something
> > > > >> like
> > > > >>> this in kube for bigtop-transaction-queue, which continuously
> pumps
> > > data
> > > > >>> generator outputs into a REST endpoint or file
> > > > >>> Queue... So it could be extended to support other generators.
> > > > >>>
> > > > >>>
> > > > >>>> om> <[email protected]> wrote:
> > > > >>>>
> > > > >>>> Could picture at some point supporting something like this for
> > > non-jvm
> > > > >>> folk just looking for test/demo data:
> > > > >>>>
> > > > >>>> apt-get install bigtop-data-gen
> > > > >>>> ~/ $ bigtop-data-gen --scheme weather --size 5GB --output
> data-dir
> > > > >>> --etc  foo --etc bar
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> -----Original Message-----
> > > > >>>> From: jay vyas [mailto:[email protected]]
> > > > >>>> Sent: Sunday, August 30, 2015 5:11 PM
> > > > >>>> To: [email protected]
> > > > >>>> Subject: Re: Proposal for "BigTop Data Generators"
> > > > >>>>
> > > > >>>> Hola nate.  Well, here are the Use cases I know of that I have
> used
> > > the
> > > > >>> data generators for.
> > > > >>>>
> > > > >>>> Dockerfile:
> > > > >>>>
> > > > >>>> (1) for testing kubernetes.  For this, I just use
> transaction-queue
> > > > >>> docker file.
> > > > >>>> (2) for testing GlusterFS small file workloads, maybe with other
> > > > >>> analytics tools...
> > > > >>>>
> > > > >>>> Maven repo
> > > > >>>>
> > > > >>>> (3) Java mapreduce/ignite/spark applications, which can just add a mvn
> > > > >>>> repo when compiling.  Java developers never add jars through RPM repos.
> > > > >>>>
> > > > >>>> RPM/DEB packages:
> > > > >>>>
> > > > >>>> I could see people using an RPM/DEB data generator, and I'm not
> > > against
> > > > >>> it.  But I simply don't know of any real world projects which
> > > *currently*
> > > > >>> need RPM/Deb packages, which is why I haven't bothered to
> propose it
> > > as a
> > > > >>> requirement.  Nevertheless linux packages are always a welcome
> > > addition
> > > > >> if
> > > > >>> someone wants to create em !
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
> > > > >>>>>
> > > > >>>>> Would container be in addition to deb/rpm, or instead of?  If
> > > latter
> > > > >>>>> can we do deb/rpm as base then have container either created
> from
> > > them
> > > > >>>>> or directly from artifacts?
> > > > >>>>>
> > > > >>>>> On test usage side, seems could probably break up tests into
> > > > >>>>> base/required and then optional/add-on tests/test-suites.
> Think
> > > > >>>>> remember seeing mention of certain tests that are failing at
> times
> > > on
> > > > >>>>> certain component(s) anyways in the core builds but don’t mean
> that
> > > > >>>>> the build is broken, so would make sense to have some clean up
> > > around
> > > > >>> those anyways.
> > > > >>>>>
> > > > >>>>> -----Original Message-----
> > > > >>>>> From: RJ Nowling [mailto:[email protected]]
> > > > >>>>> Sent: Sunday, August 30, 2015 1:11 PM
> > > > >>>>> To: [email protected]
> > > > >>>>> Subject: Re: Proposal for "BigTop Data Generators"
> > > > >>>>>
> > > > >>>>> I agree with the above. :)
> > > > >>>>>
> > > > >>>>> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas
> > > > >>>>> <[email protected]>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> Hi RJ.
> > > > >>>>>>
> > > > >>>>>> Maven repositories and docker containers for the transaction
> queue
> > > > >>>>>> are good enough IMO.  That will give people a way to compose
> them
> > > in
> > > > >>>>>> different idioms (one for Java folks, another for broader
> Linux
> > > > >>>>>> audience
> > > > >>>>> ).
> > > > >>>>>>
> > > > >>>>>> I think the lib designs are fairly intuitive.  I would say
> that we
> > > > >>>>>> should constrain them all to being written in Java or Groovy
> to
> > > keep
> > > > >>>>>> the bigtop theme of "JVM for everything" :).
> > > > >>>>>>
> > > > >>>>>> Any particular questions you have around technical design can
> be
> > > > >>>>>> followed in a JIRA or else maybe a Readme spec that goes in
> a  top
> > > > >>>>>> level of the data-generators dir...
> > > > >>>>>>
> > > > >>>>>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]>
> > > wrote:
> > > > >>>>>>>
> > > > >>>>>>> I'd like to keep this conversation going.
> > > > >>>>>>>
> > > > >>>>>>> So here are a few discussion points:
> > > > >>>>>>>
> > > > >>>>>>> 1. How do we want to make the data generators available?
> Maven?
> > > > >>>>>>> RPMs
> > > > >>>>>> and
> > > > >>>>>>> Debs?
> > > > >>>>>>>
> > > > >>>>>>> For now, I'm using a gradle multi-project build to easily
> build
> > > > >>>>>>> and
> > > > >>>>>> install
> > > > >>>>>>> the BPS data generators and its libraries into a local maven
> > > repo.
> > > > >>>>>>> This makes development easy.  Eventually, I would like to
> post
> > > > >>>>>>> binaries
> > > > >>>>>> through
> > > > >>>>>>> Maven for easy integration by users.  RPMs / Debs could be
> > > > >>>>>>> interesting since I use a pattern where the data generators
> are
> > > > >>>>>>> libraries (to support application integration /
> parallelization
> > > by
> > > > >>>>>>> the host framework) but also provide CLI drivers for local
> > > testing.
> > > > >>>>>>>
> > > > >>>>>>> 2.  The idea of using the data generators as part of the
> smoke
> > > > >>>>>>> tests came up.  Since there is concern about making the data
> > > > >>>>>>> generators required, we could offer the blueprints
> (BigPetStore)
> > > > >>>>>>> as optional smoke tests.  Would that be a good compromise?
> > > > >>>>>>>
> > > > >>>>>>> 3.  How will they be maintained?
> > > > >>>>>>>
> > > > >>>>>>> I'll certainly add myself to the maintainers list and will be
> > > > >>>>>>> taking responsibility.  I'm happy to have others help as
> well if
> > > > >>>>>>> anyone wants to
> > > > >>>>>>> -- if not, that's cool, too.
> > > > >>>>>>>
> > > > >>>>>>> 4. Is anyone interested at all in discussing library APIs and
> > > > >> designs?
> > > > >>>>>>> What about internal interfaces and such?
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> My plan was to add at least one more data generator (weather
> > > > >>>>>>> simulator)
> > > > >>>>>> to
> > > > >>>>>>> bigtop-data-generators in the short term.  However, given the
> > > > >>>>>>> concerns raised by Cos (more discussion needed) and Olaf
> (don't
> > > > >>>>>>> want to force data generators on unsuspecting users ;) ), I
> would
> > > > >>>>>>> like to reach some
> > > > >>>>>> consensus
> > > > >>>>>>> on what people are concerned about and solutions.
> > > > >>>>>>>
> > > > >>>>>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik
> > > > >>>>>>> <[email protected]>
> > > > >>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Fine by me. I have linked this thread to the JIRA ticket
> that RJ
> > > > >>>>>> created,
> > > > >>>>>>>> so
> > > > >>>>>>>> we have a way to connect one to another ;)
> > > > >>>>>>>>
> > > > >>>>>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> > > > >>>>>>>>> Hi,
> > > > >>>>>>>>>
> > > > >>>>>>>>> I am not confident that moving important design discussions
> > > with
> > > > >>>>>>>>> impact
> > > > >>>>>>>> to
> > > > >>>>>>>>> the whole project to jira is a good idea.
> > > > >>>>>>>>>
> > > > >>>>>>>>> In the current JIRA Traffic storm it is not easy to
> identify
> > > and
> > > > >>>>>>>>> follow
> > > > >>>>>>>> important tickets.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Please keep discussions on the list or at least, please
> state
> > > on
> > > > >>>>>>>>> this
> > > > >>>>>>>> list which Ticket to follow ...
> > > > >>>>>>>>>
> > > > >>>>>>>>> Olaf
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> > > > >>>>>>>>>>> Hi,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Nice to have data generators in Bigtop.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> But please do not include it in bigtop_utils, since this
> > > > >>>>>>>>>>> package is mandatory. Not everyone needs a data
> generator .
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Yup. And let's move further design discussion to the JIRA!
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Olaf
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Publishing the jar to bigtops maven is probably a good
> first
> > > > >>>>>>>>>>>> step
> > > > >>>>>>>> ,Then apps can just include it as needed...?.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I'm not against packaging if someone wants packages for
> > > this.
> > > > >>>>>>>>>>>> Maybe
> > > > >>>>>>>> even include it in bigtop util ?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Let's move to jira,
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik
> > > > >>>>>>>>>>>>> <[email protected]>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> It is pretty cool indeed!
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> I wonder how it needs to be structured to be:
> > > > >>>>>>>>>>>>> - easy to access/use from other components wherever it
> is
> > > > >>>>>>>>>>>>> needed
> > > > >>>>>>>>>>>>> - doesn't interfere with the rest of the stack
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> I guess one possible way would be to implement the
> > > generator
> > > > >>>>>>>>>>>>> as a
> > > > >>>>>>>> set of maven
> > > > >>>>>>>>>>>>> artifacts, that could be installed/consumed
> transparently
> > > by
> > > > >>>>>>>>>>>>> just
> > > > >>>>>>>> declaring a
> > > > >>>>>>>>>>>>> dependency e.g as proposed via top-level component.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Another way is to have a new package like we do for
> > > > >>>>>>>>>>>>> bigtop-utils
> > > > >>>>>>>> and such.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Perhaps this discussion should be moved to JIRA or
> shall we
> > > > >>>>>>>> continue on the
> > > > >>>>>>>>>>>>> dev@ ??
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Cos
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> > > > >>>>>>>>>>>>>> Hi BigTop,
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I had a discussion with Jay yesterday, we'd like to
> > > propose
> > > > >>>>>>>>>>>>>> a new
> > > > >>>>>>>> component
> > > > >>>>>>>>>>>>>> for BigTop: BigTop Data Generators.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> BigTop Data Generators would consist of a common set
> of
> > > > >>>>>>>>>>>>>> libraries
> > > > >>>>>>>> for
> > > > >>>>>>>>>>>>>> building data generators and three example data
> > > generators:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> * BigPetStore transaction generator (moved from
> > > > >>>>>>>>>>>>>> BigPetStore)
> > > > >>>>>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions
> with
> > > > >>>>>>>>>>>>>> booths
> > > > >>>>>>>> on a
> > > > >>>>>>>>>>>>>> showroom floor, at a conference, or at a mall
> > > > >>>>>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
> > > > >>>>>>>> (temperature, wind
> > > > >>>>>>>>>>>>>> speed, wind chill, rainfall, etc.) per zip code.
> (From a
> > > > >>>>>>>>>>>>>> model
> > > > >>>>>>>> trained on
> > > > >>>>>>>>>>>>>> NOAA historical weather data)
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> We believe that creating a common set of libraries
> will
> > > > >>>>>>>>>>>>>> have
> > > > >>>>>>>> several
> > > > >>>>>>>>>>>>>> benefits including:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> * Easier for others to build their own data generators
> > > > >>>>>>>>>>>>>> * Make data generators smaller and easier to maintain
> > > > >>>>>>>>>>>>>> * Share improvements across the data generators
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> More details on the libraries are below.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> BigPetStore will continue to focus on building and
> > > > >>>>>>>>>>>>>> maintaining blueprints, powered by the BigTop Data
> > > > >> Generators.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Our vision is that we get all of Apache coming to
> BigTop
> > > > >>>>>>>>>>>>>> for tools
> > > > >>>>>>>> for
> > > > >>>>>>>>>>>>>> building better, more comprehensive blueprints.  We
> want
> > > to
> > > > >>>>>>>> support these
> > > > >>>>>>>>>>>>>> efforts through data generators and the initial set of
> > > > >>>>>>>>>>>>>> blueprint
> > > > >>>>>>>> we've been
> > > > >>>>>>>>>>>>>> building.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> If the community is generally in support of this, I
> can
> > > > >>>>>>>>>>>>>> create a
> > > > >>>>>>>> top-level
> > > > >>>>>>>>>>>>>> "bigtop-data-generators" directory and put the data
> > > > >>>>>>>>>>>>>> generators and libraries in there.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Thanks!
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> RJ
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> -------
> > > > >>>>>>>>>>>>>> Library details:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> So far, I've extracted the following common libraries:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> * Samplers -- provides classes for PDFs and various
> > > > >>>>>>>>>>>>>> samplers
> > > > >>>>>>>>>>>>>> * Name generator -- data set and samplers for
> generating
> > > > >>>>>>>>>>>>>> names
> > > > >>>>>>>>>>>>>> * Location data set -- data set and classes for US zip
> > > > >>>>>>>>>>>>>> codes,
> > > > >>>>>>>> their
> > > > >>>>>>>>>>>>>> GPS coordinates, median house hold incomes, and
> population
> > > > >>>>>>>>>>>>>> sizes
> > > > >>>>>>>>>>>>>> * Product generator -- library for enumerating
> products
> > > > >>>>>>>>>>>>>> from a specification file.  Comes with default
> > > > >>>>>>>>>>>>>> specifications for
> > > > >>>>>>>> BigPetStore
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I also expect that I'll add libraries for:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> * Particle simulation -- customer movement in a room
> > > > >>>>>>>>>>>>>> * Latent factor model generation -- generate latent
> > > > >>>>>>>>>>>>>> factors and customer weights to create something like
> > > > >>> MovieLens data.
> > > > >>>>>>>>>>>>>> Used in
> > > > >>>>>>>> Bazaar
> > > > >>>>>>>>>>>>>> for booth preferences and potentially in BigPetStore
> for
> > > > >>>>>>>>>>>>>> customer
> > > > >>>>>>>> item
> > > > >>>>>>>>>>>>>> preferences
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Most of these libraries came out of the BigPetStore
> data
> > > > >>>>>>>>>>>>>> generator
> > > > >>>>>>>> but the
> > > > >>>>>>>>>>>>>> other generators have been refactored to be based off
> the
> > > > >>>>>>>>>>>>>> standard
> > > > >>>>>>>> set of
> > > > >>>>>>>>>>>>>> libraries.
> > > > >>>>
> > > > >>>>
> > > > >>>> --
> > > > >>>> jay vyas
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> > >
> > >
> >
> >
> > --
> > jay vyas
>
