On Mon, Aug 31, 2015 at 08:01PM, jay vyas wrote:
> - I agree Gradle is better than "yet another script".
> 
> - And docker container, even if suboptimal, as thin wrapper to gradle
> container is all you need to deliver the data-generator to the masses so
> they can try it w/ zero startup cost.

That's true. But it didn't sound this way originally, hence I've asked.

> On Mon, Aug 31, 2015 at 7:36 PM, Konstantin Boudnik <[email protected]> wrote:
> 
> > Why would we need yet another script (and potentially an extra readme to
> > explain its command options) when we have the gradle?
> >
> > Cos
> >
> > On Mon, Aug 31, 2015 at 10:51PM, Olaf Flebbe wrote:
> > > +1 to the CLI /shell script interface.
> > >
> > > If I can choose, I'd like to have an apt-get install bigtop-datagenerator,
> > > running for instance
> > >
> > > bigtop-data-generator outputDir nStores nCustomers nPurchasingModels simulationLength seed
> > >
> > > I can help out with packaging if needed.
> > >
> > > Why should we use the docker indirection for a plain CLI file? Of
> > > course, we can provide a trivial Dockerfile to create a container
> > > supplying a JVM and running the CLI ... But I do not like to make our
> > > services depend on the docker registry more than we do now.
> > >
> > > Olaf
> > >
> > >
> > >
> > > > > On 31.08.2015 at 16:40, Evans Ye <[email protected]> wrote:
> > > >
> > > > > I very much like the shell script wrapper and docker image idea,
> > > > > since that way we can integrate it directly with the bigtop
> > > > > provisioner, which yields a perfect UX for the whole thing. I think
> > > > > it's not too hard to do both; we just need to add a parameter to
> > > > > turn the script into daemon mode. I see lots of images doing it
> > > > > this way.
> > > >
> > > > docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output
> > > > data-dir --etc  foo --etc bar --daemon
> > > > 2015年8月31日 下午9:06於 "RJ Nowling" <[email protected]>寫道:
> > > >
> > > >> The BigPetStore, Bazaar, and weather data generators have
> > single-threaded
> > > >> command-line interfaces.  We could do the same with the smaller
> > generators
> > > >> (names, locations, etc.) if there is interest.
> > > >>
> > > >> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <
> > [email protected]>
> > > >> wrote:
> > > >>
> > > >>> Nate: Good idea to abstract the interface one level higher....
> > > >>>
> > > >>> How about a docker run command ? That is probably the easiest way for
> > > >>> Linux folks to run one off Java apps nowadays.
> > > >>>
> > > >>> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB
> > --output
> > > >>> data-dir --etc  foo --etc bar
> > > >>>
> > > >>> I'm happy to curate such a docker image, I already am doing something
> > > >> like
> > > >>> this in kube for bigtop-transaction-queue, which continuously pumps
> > data
> > > >>> generator outputs into a REST endpoint or file
> > > >>> Queue... So it could be extended to support other generators.
> > > >>>
> > > >>>
> > > >>>> om> <[email protected]> wrote:
> > > >>>>
> > > >>>> Could picture at some point supporting something like this for
> > non-jvm
> > > >>> folk just looking for test/demo data:
> > > >>>>
> > > >>>> apt-get install bigtop-data-gen
> > > >>>> ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir
> > > >>> --etc  foo --etc bar
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> -----Original Message-----
> > > >>>> From: jay vyas [mailto:[email protected]]
> > > >>>> Sent: Sunday, August 30, 2015 5:11 PM
> > > >>>> To: [email protected]
> > > >>>> Subject: Re: Proposal for "BigTop Data Generators"
> > > >>>>
> > > >>>> Hola nate.  Well, here are the Use cases I know of that I have used
> > the
> > > >>> data generators for.
> > > >>>>
> > > >>>> Dockerfile:
> > > >>>>
> > > >>>> (1) for testing kubernetes.  For this, I just use transaction-queue
> > > >>> docker file.
> > > >>>> (2) for testing GlusterFS small file workloads, maybe with other
> > > >>> analytics tools...
> > > >>>>
> > > >>>> Maven repo
> > > >>>>
> > > >>>> (3) Java mapreduce/ignite/spark applications, which can just add a
> > mvn
> > > >>> repo when compiling.  Java developers never add jars through RPM
> > repos.
> > > >>>>
> > > >>>> RPM/DEB packages:
> > > >>>>
> > > >>>> I could see people using an RPM/DEB data generator, and I'm not
> > against
> > > >>> it.  But I simply don't know of any real world projects which
> > *currently*
> > > >>> need RPM/Deb packages, which is why I haven't bothered to propose it
> > as a
> > > >>> requirement.  Nevertheless linux packages are always a welcome
> > addition
> > > >> if
> > > >>> someone wants to create em !
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
> > > >>>>>
> > > >>>>> Would container be in addition to deb/rpm, or instead of?  If
> > latter
> > > >>>>> can we do deb/rpm as base then have container either created from
> > them
> > > >>>>> or directly from artifacts?
> > > >>>>>
> > > >>>>> On test usage side, seems could probably break up tests into
> > > >>>>> base/required and then optional/add-on tests/test-suites.  Think
> > > >>>>> remember seeing mention of certain tests that are failing at times
> > on
> > > >>>>> certain component(s) anyways in the core builds but don’t mean that
> > > >>>>> the build is broken, so would make sense to have some clean up
> > around
> > > >>> those anyways.
> > > >>>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: RJ Nowling [mailto:[email protected]]
> > > >>>>> Sent: Sunday, August 30, 2015 1:11 PM
> > > >>>>> To: [email protected]
> > > >>>>> Subject: Re: Proposal for "BigTop Data Generators"
> > > >>>>>
> > > >>>>> I agree with the above. :)
> > > >>>>>
> > > >>>>> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas
> > > >>>>> <[email protected]>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi RJ.
> > > >>>>>>
> > > >>>>>> Maven repositories and docker containers for the transaction queue
> > > >>>>>> are good enough IMO.  That will give people a way to compose them
> > in
> > > >>>>>> different idioms (one for Java folks, another for broader Linux
> > > >>>>>> audience
> > > >>>>> ).
> > > >>>>>>
> > > >>>>>> I think the lib designs are fairly intuitive.  I would say that we
> > > >>>>>> should constrain them all to being written in Java or Groovy to
> > keep
> > > >>>>>> the bigtop theme of "JVM for everything" :).
> > > >>>>>>
> > > >>>>>> Any particular questions you have around technical design can be
> > > >>>>>> followed in a JIRA or else maybe a Readme spec that goes in a  top
> > > >>>>>> level of the data-generators dir...
> > > >>>>>>
> > > >>>>>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]>
> > wrote:
> > > >>>>>>>
> > > >>>>>>> I'd like to keep this conversation going.
> > > >>>>>>>
> > > >>>>>>> So here are a few discussion points:
> > > >>>>>>>
> > > >>>>>>> 1. How do we want to make the data generators available?  Maven?
> > > >>>>>>> RPMs
> > > >>>>>> and
> > > >>>>>>> Debs?
> > > >>>>>>>
> > > >>>>>>> For now, I'm using a gradle multi-project build to easily build
> > > >>>>>>> and
> > > >>>>>> install
> > > >>>>>>> the BPS data generators and its libraries into a local maven
> > repo.
> > > >>>>>>> This makes development easy.  Eventually, I would like to post
> > > >>>>>>> binaries
> > > >>>>>> through
> > > >>>>>>> Maven for easy integration by users.  RPMs / Debs could be
> > > >>>>>>> interesting since I use a pattern where the data generators are
> > > >>>>>>> libraries (to support application integration / parallelization
> > by
> > > >>>>>>> the host framework) but also provide CLI drivers for local
> > testing.
> > > >>>>>>>
> > > >>>>>>> 2.  The idea of using the data generators as part of the smoke
> > > >>>>>>> tests came up.  Since there is concern about making the data
> > > >>>>>>> generators required, we could offer the blueprints (BigPetStore)
> > > >>>>>>> as optional smoke tests.  Would that be a good compromise?
> > > >>>>>>>
> > > >>>>>>> 3.  How will they be maintained?
> > > >>>>>>>
> > > >>>>>>> I'll certainly add myself to the maintainers list and will be
> > > >>>>>>> taking responsibility.  I'm happy to have others help as well if
> > > >>>>>>> anyone wants to
> > > >>>>>>> -- if not, that's cool, too.
> > > >>>>>>>
> > > >>>>>>> 4. Is anyone interested at all in discussing library APIs and
> > > >> designs?
> > > >>>>>>> What about internal interfaces and such?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> My plan was to add at least one more data generator (weather
> > > >>>>>>> simulator)
> > > >>>>>> to
> > > >>>>>>> bigtop-data-generators in the short term.  However, given the
> > > >>>>>>> concerns raised by Cos (more discussion needed) and Olaf (don't
> > > >>>>>>> want to force data generators on unsuspecting users ;) ), I would
> > > >>>>>>> like to reach some
> > > >>>>>> consensus
> > > >>>>>>> on what people are concerned about and solutions.
> > > >>>>>>>
> > > >>>>>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik
> > > >>>>>>> <[email protected]>
> > > >>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Fine by me. I have linked this thread to the JIRA ticket that RJ
> > > >>>>>> created,
> > > >>>>>>>> so
> > > >>>>>>>> we have a way to connect one to another ;)
> > > >>>>>>>>
> > > >>>>>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> > > >>>>>>>>> Hi,
> > > >>>>>>>>>
> > > >>>>>>>>> I am not confident that moving important design discussions
> > with
> > > >>>>>>>>> impact
> > > >>>>>>>> to
> > > >>>>>>>>> the whole project to jira is a good idea.
> > > >>>>>>>>>
> > > >>>>>>>>> In the current JIRA Traffic storm it is not easy to identify
> > and
> > > >>>>>>>>> follow
> > > >>>>>>>> important tickets.
> > > >>>>>>>>>
> > > >>>>>>>>> Please keep discussions on the list or at least, please state
> > on
> > > >>>>>>>>> this
> > > >>>>>>>> list which Ticket to follow ...
> > > >>>>>>>>>
> > > >>>>>>>>> Olaf
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> > > >>>>>>>>>>> Hi,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Nice to have data generators in Bigtop.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> But please do not include it in bigtop_utils, since this
> > > >>>>>>>>>>> package is mandatory. Not everyone needs a data generator .
> > > >>>>>>>>>>
> > > >>>>>>>>>> Yup. And let's move further design discussion to the JIRA!
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Olaf
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Publishing the jar to Bigtop's maven is probably a good first
> > > >>>>>>>>>>>> step, then apps can just include it as needed...?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I'm not against packaging if someone wants packages for
> > this.
> > > >>>>>>>>>>>> Maybe
> > > >>>>>>>> even include it in bigtop util ?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Let's move to jira,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik
> > > >>>>>>>>>>>>> <[email protected]>
> > > >>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> It is pretty cool indeed!
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I wonder how it needs to be structured to be:
> > > >>>>>>>>>>>>> - easy to access/use from other components wherever it is
> > > >>>>>>>>>>>>> needed
> > > >>>>>>>>>>>>> - doesn't interfere with the rest of the stack
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I guess one possible way would be to implement the
> > generator
> > > >>>>>>>>>>>>> as a
> > > >>>>>>>> set of maven
> > > >>>>>>>>>>>>> artifacts, that could be installed/consumed transparently
> > by
> > > >>>>>>>>>>>>> just
> > > >>>>>>>> declaring a
> > > >>>>>>>>>>>>> dependency e.g as proposed via top-level component.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Another way is to have a new package like we do for
> > > >>>>>>>>>>>>> bigtop-utils
> > > >>>>>>>> and such.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Perhaps this discussion should be moved to JIRA or shall we
> > > >>>>>>>> continue on the
> > > >>>>>>>>>>>>> dev@ ??
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Cos
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> > > >>>>>>>>>>>>>> Hi BigTop,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I had a discussion with Jay yesterday, we'd like to
> > propose
> > > >>>>>>>>>>>>>> a new
> > > >>>>>>>> component
> > > >>>>>>>>>>>>>> for BigTop: BigTop Data Generators.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> BigTop Data Generators would consist of a common set of
> > > >>>>>>>>>>>>>> libraries
> > > >>>>>>>> for
> > > >>>>>>>>>>>>>> building data generators and three example data
> > generators:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> * BigPetStore transaction generator (moved from
> > > >>>>>>>>>>>>>> BigPetStore)
> > > >>>>>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
> > > >>>>>>>>>>>>>> booths
> > > >>>>>>>> on a
> > > >>>>>>>>>>>>>> showroom floor, at a conference, or at a mall
> > > >>>>>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
> > > >>>>>>>> (temperature, wind
> > > >>>>>>>>>>>>>> speed, wind chill, rainfall, etc.) per zip code.  (From a
> > > >>>>>>>>>>>>>> model
> > > >>>>>>>> trained on
> > > >>>>>>>>>>>>>> NOAA historical weather data)
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> We believe that creating a common set of libraries will
> > > >>>>>>>>>>>>>> have
> > > >>>>>>>> several
> > > >>>>>>>>>>>>>> benefits including:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> * Easier for others to build their own data generators
> > > >>>>>>>>>>>>>> * Make data generators smaller and easier to maintain
> > > >>>>>>>>>>>>>> * Share improvements across the data generators
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> More details on the libraries are below.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> BigPetStore will continue to focus on building and
> > > >>>>>>>>>>>>>> maintaining blueprints, powered by the BigTop Data
> > > >> Generators.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop
> > > >>>>>>>>>>>>>> for tools
> > > >>>>>>>> for
> > > >>>>>>>>>>>>>> building better, more comprehensive blueprints.  We want
> > to
> > > >>>>>>>> support these
> > > >>>>>>>>>>>>>> efforts through data generators and the initial set of
> > > >>>>>>>>>>>>>> blueprints we've been building.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> If the community is generally in support of this, I can
> > > >>>>>>>>>>>>>> create a
> > > >>>>>>>> top-level
> > > >>>>>>>>>>>>>> "bigtop-data-generators" directory and put the data
> > > >>>>>>>>>>>>>> generators and libraries in there.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Thanks!
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> RJ
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> -------
> > > >>>>>>>>>>>>>> Library details:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> So far, I've extracted the following common libraries:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> * Samplers -- provides classes for PDFs and various
> > > >>>>>>>>>>>>>> samplers
> > > >>>>>>>>>>>>>> * Name generator -- data set and samplers for generating
> > > >>>>>>>>>>>>>> names
> > > >>>>>>>>>>>>>> * Location data set -- data set and classes for US zip
> > > >>>>>>>>>>>>>> codes,
> > > >>>>>>>> their
> > > >>>>>>>>>>>>>> GPS coordinates, median household incomes, and population
> > > >>>>>>>>>>>>>> sizes
> > > >>>>>>>>>>>>>> * Product generator -- library for enumerating products
> > > >>>>>>>>>>>>>> from a specification file.  Comes with default
> > > >>>>>>>>>>>>>> specifications for
> > > >>>>>>>> BigPetStore
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I also expect that I'll add libraries for:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> * Particle simulation -- customer movement in a room
> > > >>>>>>>>>>>>>> * Latent factor model generation -- generate latent
> > > >>>>>>>>>>>>>> factors and customer weights to create something like
> > > >>> MovieLens data.
> > > >>>>>>>>>>>>>> Used in
> > > >>>>>>>> Bazaar
> > > >>>>>>>>>>>>>> for booth preferences and potentially in BigPetStore for
> > > >>>>>>>>>>>>>> customer
> > > >>>>>>>> item
> > > >>>>>>>>>>>>>> preferences
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Most of these libraries came out of the BigPetStore data
> > > >>>>>>>>>>>>>> generator
> > > >>>>>>>> but the
> > > >>>>>>>>>>>>>> other generators have been refactored to be based off the
> > > >>>>>>>>>>>>>> standard
> > > >>>>>>>> set of
> > > >>>>>>>>>>>>>> libraries.
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> jay vyas
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> >
> >
> 
> 
> -- 
> jay vyas
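
For reference, a minimal sketch of the kind of thin shell wrapper debated
above, using the six positional arguments from Olaf's example invocation.
The jar path and the function name datagen_command are assumptions made up
for illustration; nothing in the thread says where a package would install
the jar or what the wrapper would be called internally. The function only
prints the java command line instead of running it.

```shell
#!/bin/sh
# Hypothetical install location; NOT taken from any real Bigtop package.
DATAGEN_JAR=/usr/lib/bigtop-data-generator/bigtop-data-generator.jar

usage() {
    echo "usage: bigtop-data-generator outputDir nStores nCustomers nPurchasingModels simulationLength seed" >&2
}

# Print the java command line for the six positional arguments.
# Returns 2 and prints the usage message if the argument count is wrong.
datagen_command() {
    if [ "$#" -ne 6 ]; then
        usage
        return 2
    fi
    # Arguments are passed through unchanged; the generator's own CLI
    # driver is assumed to validate their values.
    echo "java -jar $DATAGEN_JAR $*"
}
```

A packaged wrapper could end with something like
eval "exec $(datagen_command "$@")", or build the exec line directly; the
same function would also serve as the entry point of a trivial Dockerfile.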
