I very much like the shell script wrapper and Docker image idea, since
that way we can integrate it directly with the Bigtop provisioner, which
would yield a great UX for the whole thing. I don't think it's too hard to
do both; we just need to add a parameter that turns the script into daemon
mode. I have seen lots of images do it this way.

docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output
data-dir --etc  foo --etc bar --daemon
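
A minimal sketch of how such a wrapper could handle the flag, assuming a
hypothetical entrypoint script. The names parse_args, run_generator,
GENERATOR_CMD and INTERVAL_SECONDS are illustrative only and not part of
any existing Bigtop script; "daemon mode" is assumed here to mean
re-running the generator in a loop.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: split --daemon off from the generator args.

# Sets DAEMON to 1 if --daemon is present; collects everything else in
# GEN_ARGS. Note: this simple version does not preserve arguments that
# contain spaces.
parse_args() {
  DAEMON=0
  GEN_ARGS=""
  for arg in "$@"; do
    case "$arg" in
      --daemon) DAEMON=1 ;;
      *) GEN_ARGS="${GEN_ARGS:+$GEN_ARGS }$arg" ;;
    esac
  done
}

# Runs the generator once, or repeatedly when --daemon was given.
# GENERATOR_CMD is an assumed placeholder for the real CLI driver.
run_generator() {
  cmd="${GENERATOR_CMD:-echo}"
  if [ "$DAEMON" -eq 1 ]; then
    while true; do
      # GEN_ARGS is intentionally unquoted so it splits back into words.
      $cmd $GEN_ARGS
      sleep "${INTERVAL_SECONDS:-60}"
    done
  else
    $cmd $GEN_ARGS
  fi
}
```

An image entrypoint would then call parse_args "$@" followed by
run_generator, so the same script serves both one-off and long-running use.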
On Aug 31, 2015, at 9:06 PM, "RJ Nowling" <[email protected]> wrote:

> The BigPetStore, Bazaar, and weather data generators have single-threaded
> command-line interfaces.  We could do the same with the smaller generators
> (names, locations, etc.) if there is interest.
>
> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <[email protected]>
> wrote:
>
> > Nate: Good idea to abstract the interface one level higher....
> >
> > How about a docker run command ? That is probably the easiest way for
> > Linux folks to run one off Java apps nowadays.
> >
> > docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output
> > data-dir --etc  foo --etc bar
> >
> > I'm happy to curate such a Docker image. I'm already doing something
> > like this in Kubernetes for bigtop-transaction-queue, which
> > continuously pumps data generator outputs into a REST endpoint or file
> > queue, so it could be extended to support other generators.
> >
> >
> > > <[email protected]> wrote:
> > >
> > > Could picture at some point supporting something like this for
> > > non-JVM folk just looking for test/demo data:
> > >
> > > apt-get install bigtop-data-gen
> > > ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc  foo --etc bar
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: jay vyas [mailto:[email protected]]
> > > Sent: Sunday, August 30, 2015 5:11 PM
> > > To: [email protected]
> > > Subject: Re: Proposal for "BigTop Data Generators"
> > >
> > > Hola Nate.  Well, here are the use cases I know of that I have used
> > > the data generators for.
> > >
> > > Dockerfile:
> > >
> > > (1) for testing Kubernetes.  For this, I just use the
> > > transaction-queue Dockerfile.
> > > (2) for testing GlusterFS small-file workloads, maybe with other
> > > analytics tools...
> > >
> > > Maven repo
> > >
> > > (3) Java MapReduce/Ignite/Spark applications, which can just add a
> > > Maven repo when compiling.  Java developers never add jars through
> > > RPM repos.
> > >
> > > RPM/DEB packages:
> > >
> > > I could see people using an RPM/DEB data generator, and I'm not
> > > against it.  But I simply don't know of any real-world projects which
> > > *currently* need RPM/DEB packages, which is why I haven't bothered to
> > > propose it as a requirement.  Nevertheless, Linux packages are always
> > > a welcome addition if someone wants to create 'em!
> > >
> > >
> > >
> > >
> > >> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
> > >>
> > >> Would the container be in addition to deb/rpm, or instead of?  If the
> > >> latter, can we do deb/rpm as the base and then have the container
> > >> created either from them or directly from artifacts?
> > >>
> > >> On the test usage side, it seems we could probably break up tests
> > >> into base/required and then optional/add-on tests/test suites.  I
> > >> think I remember seeing mention of certain tests that fail at times
> > >> on certain component(s) in the core builds anyway, without meaning
> > >> that the build is broken, so it would make sense to have some cleanup
> > >> around those anyway.
> > >>
> > >> -----Original Message-----
> > >> From: RJ Nowling [mailto:[email protected]]
> > >> Sent: Sunday, August 30, 2015 1:11 PM
> > >> To: [email protected]
> > >> Subject: Re: Proposal for "BigTop Data Generators"
> > >>
> > >> I agree with the above. :)
> > >>
> > >> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas
> > >> <[email protected]>
> > >> wrote:
> > >>
> > >>> Hi RJ.
> > >>>
> > >>> Maven repositories and Docker containers for the transaction queue
> > >>> are good enough IMO.  That will give people a way to compose them in
> > >>> different idioms (one for Java folks, another for a broader Linux
> > >>> audience).
> > >>>
> > >>> I think the lib designs are fairly intuitive.  I would say that we
> > >>> should constrain them all to being written in Java or Groovy to keep
> > >>> the bigtop theme of "JVM for everything" :).
> > >>>
> > >>> Any particular questions you have around technical design can be
> > >>> followed in a JIRA, or else maybe a README spec that goes in the top
> > >>> level of the data-generators dir...
> > >>>
> > >>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
> > >>>>
> > >>>> I'd like to keep this conversation going.
> > >>>>
> > >>>> So here are a few discussion points:
> > >>>>
> > >>>> 1. How do we want to make the data generators available?  Maven?
> > >>>> RPMs and Debs?
> > >>>>
> > >>>> For now, I'm using a Gradle multi-project build to easily build and
> > >>>> install the BPS data generator and its libraries into a local Maven
> > >>>> repo.  This makes development easy.  Eventually, I would like to
> > >>>> post binaries through Maven for easy integration by users.  RPMs /
> > >>>> Debs could be interesting since I use a pattern where the data
> > >>>> generators are libraries (to support application integration /
> > >>>> parallelization by the host framework) but also provide CLI drivers
> > >>>> for local testing.
> > >>>>
> > >>>> 2.  The idea of using the data generators as part of the smoke
> > >>>> tests came up.  Since there is concern about making the data
> > >>>> generators required, we could offer the blueprints (BigPetStore)
> > >>>> as optional smoke tests.  Would that be a good compromise?
> > >>>>
> > >>>> 3.  How will they be maintained?
> > >>>>
> > >>>> I'll certainly add myself to the maintainers list and will be
> > >>>> taking responsibility.  I'm happy to have others help as well if
> > >>>> anyone wants to -- if not, that's cool, too.
> > >>>>
> > >>>> 4. Is anyone interested at all in discussing library APIs and
> > >>>> designs?  What about internal interfaces and such?
> > >>>>
> > >>>>
> > >>>> My plan was to add at least one more data generator (weather
> > >>>> simulator) to bigtop-data-generators in the short term.  However,
> > >>>> given the concerns raised by Cos (more discussion needed) and Olaf
> > >>>> (don't want to force data generators on unsuspecting users ;) ), I
> > >>>> would like to reach some consensus on what people are concerned
> > >>>> about and on solutions.
> > >>>>
> > >>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik
> > >>>> <[email protected]> wrote:
> > >>>>
> > >>>>> Fine by me. I have linked this thread to the JIRA ticket that RJ
> > >>>>> created, so we have a way to connect one to another ;)
> > >>>>>
> > >>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> I am not confident that moving important design discussions with
> > >>>>>> impact on the whole project to JIRA is a good idea.
> > >>>>>>
> > >>>>>> In the current JIRA traffic storm it is not easy to identify and
> > >>>>>> follow important tickets.
> > >>>>>>
> > >>>>>> Please keep discussions on the list or at least, please state on
> > >>>>>> this list which ticket to follow ...
> > >>>>>>
> > >>>>>> Olaf
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Nice to have data generators in Bigtop.
> > >>>>>>>>
> > >>>>>>>> But please do not include it in bigtop_utils, since this
> > >>>>>>>> package is mandatory. Not everyone needs a data generator.
> > >>>>>>>
> > >>>>>>> Yup. And let's move further design discussion to the JIRA!
> > >>>>>>>
> > >>>>>>>> Olaf
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Publishing the jar to Bigtop's Maven is probably a good first
> > >>>>>>>>> step; then apps can just include it as needed...?
> > >>>>>>>>>
> > >>>>>>>>> I'm not against packaging if someone wants packages for this.
> > >>>>>>>>> Maybe even include it in bigtop util?
> > >>>>>>>>>
> > >>>>>>>>> Let's move to JIRA.
> > >>>>>>>>>
> > >>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik
> > >>>>>>>>>> <[email protected]> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> It is pretty cool indeed!
> > >>>>>>>>>>
> > >>>>>>>>>> I wonder how it needs to be structured to be:
> > >>>>>>>>>> - easy to access/use from other components wherever it is
> > >>>>>>>>>> needed
> > >>>>>>>>>> - doesn't interfere with the rest of the stack
> > >>>>>>>>>>
> > >>>>>>>>>> I guess one possible way would be to implement the generator
> > >>>>>>>>>> as a set of Maven artifacts that could be installed/consumed
> > >>>>>>>>>> transparently by just declaring a dependency, e.g. as proposed
> > >>>>>>>>>> via a top-level component.
> > >>>>>>>>>>
> > >>>>>>>>>> Another way is to have a new package like we do for
> > >>>>>>>>>> bigtop-utils and such.
> > >>>>>>>>>>
> > >>>>>>>>>> Perhaps this discussion should be moved to JIRA, or shall we
> > >>>>>>>>>> continue on the dev@ ??
> > >>>>>>>>>>
> > >>>>>>>>>> Cos
> > >>>>>>>>>>
> > >>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> > >>>>>>>>>>> Hi BigTop,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I had a discussion with Jay yesterday, and we'd like to
> > >>>>>>>>>>> propose a new component for BigTop: BigTop Data Generators.
> > >>>>>>>>>>>
> > >>>>>>>>>>> BigTop Data Generators would consist of a common set of
> > >>>>>>>>>>> libraries for building data generators and three example
> > >>>>>>>>>>> data generators:
> > >>>>>>>>>>>
> > >>>>>>>>>>> * BigPetStore transaction generator (moved from
> > >>>>>>>>>>>   BigPetStore)
> > >>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
> > >>>>>>>>>>>   booths on a showroom floor, at a conference, or at a mall
> > >>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
> > >>>>>>>>>>>   (temperature, wind speed, wind chill, rainfall, etc.) per
> > >>>>>>>>>>>   zip code.  (From a model trained on NOAA historical
> > >>>>>>>>>>>   weather data)
> > >>>>>>>>>>>
> > >>>>>>>>>>> We believe that creating a common set of libraries will
> > >>>>>>>>>>> have several benefits, including:
> > >>>>>>>>>>>
> > >>>>>>>>>>> * Easier for others to build their own data generators
> > >>>>>>>>>>> * Make data generators smaller and easier to maintain
> > >>>>>>>>>>> * Share improvements across the data generators
> > >>>>>>>>>>>
> > >>>>>>>>>>> More details on the libraries are below.
> > >>>>>>>>>>>
> > >>>>>>>>>>> BigPetStore will continue to focus on building and
> > >>>>>>>>>>> maintaining blueprints, powered by the BigTop Data
> > >>>>>>>>>>> Generators.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop
> > >>>>>>>>>>> for tools for building better, more comprehensive
> > >>>>>>>>>>> blueprints.  We want to support these efforts through data
> > >>>>>>>>>>> generators and the initial set of blueprints we've been
> > >>>>>>>>>>> building.
> > >>>>>>>>>>>
> > >>>>>>>>>>> If the community is generally in support of this, I can
> > >>>>>>>>>>> create a top-level "bigtop-data-generators" directory and
> > >>>>>>>>>>> put the data generators and libraries in there.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks!
> > >>>>>>>>>>>
> > >>>>>>>>>>> RJ
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> -------
> > >>>>>>>>>>> Library details:
> > >>>>>>>>>>>
> > >>>>>>>>>>> So far, I've extracted the following common libraries:
> > >>>>>>>>>>>
> > >>>>>>>>>>> * Samplers -- provides classes for PDFs (probability
> > >>>>>>>>>>>   density functions) and various samplers
> > >>>>>>>>>>> * Name generator -- data set and samplers for generating
> > >>>>>>>>>>>   names
> > >>>>>>>>>>> * Location data set -- data set and classes for US zip
> > >>>>>>>>>>>   codes, their GPS coordinates, median household incomes,
> > >>>>>>>>>>>   and population sizes
> > >>>>>>>>>>> * Product generator -- library for enumerating products
> > >>>>>>>>>>>   from a specification file.  Comes with default
> > >>>>>>>>>>>   specifications for BigPetStore
> > >>>>>>>>>>>
> > >>>>>>>>>>> I also expect that I'll add libraries for:
> > >>>>>>>>>>>
> > >>>>>>>>>>>  * Particle simulation -- customer movement in a room
> > >>>>>>>>>>>  * Latent factor model generation -- generate latent
> > >>>>>>>>>>>    factors and customer weights to create something like
> > >>>>>>>>>>>    MovieLens data.  Used in Bazaar for booth preferences
> > >>>>>>>>>>>    and potentially in BigPetStore for customer item
> > >>>>>>>>>>>    preferences
> > >>>>>>>>>>>
> > >>>>>>>>>>> Most of these libraries came out of the BigPetStore data
> > >>>>>>>>>>> generator, but the other generators have been refactored to
> > >>>>>>>>>>> be based on the standard set of libraries.
> > >
> > >
> > > --
> > > jay vyas
> > >
> >
>
