The BigPetStore, Bazaar, and weather data generators have single-threaded
command-line interfaces.  We could do the same with the smaller generators
(names, locations, etc.) if there is interest.

On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <[email protected]>
wrote:

> Nate: Good idea to abstract the interface one level higher....
>
> How about a docker run command? That is probably the easiest way for
> Linux folks to run one-off Java apps nowadays.
>
> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output
> data-dir --etc  foo --etc bar
>
> I'm happy to curate such a docker image; I am already doing something like
> this in kube for bigtop-transaction-queue, which continuously pumps data
> generator outputs into a REST endpoint or file
> queue... So it could be extended to support other generators.
>
>
> > om> <[email protected]> wrote:
> >
> > Could picture at some point supporting something like this for non-JVM
> > folk just looking for test/demo data:
> >
> > apt-get install bigtop-data-gen
> > ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir
> > --etc  foo --etc bar
> >
> >
> >
> > -----Original Message-----
> > From: jay vyas [mailto:[email protected]]
> > Sent: Sunday, August 30, 2015 5:11 PM
> > To: [email protected]
> > Subject: Re: Proposal for "BigTop Data Generators"
> >
> > Hi Nate.  Well, here are the use cases I know of that I have used the
> > data generators for.
> >
> > Dockerfile:
> >
> > (1) for testing Kubernetes.  For this, I just use the transaction-queue
> > Dockerfile.
> > (2) for testing GlusterFS small file workloads, maybe with other
> > analytics tools...
> >
> > Maven repo
> >
> > (3) Java MapReduce/Ignite/Spark applications, which can just add a mvn
> > repo when compiling.  Java developers never add jars through RPM repos.
> >
> > RPM/DEB packages:
> >
> > I could see people using an RPM/DEB data generator, and I'm not against
> > it.  But I simply don't know of any real-world projects which *currently*
> > need RPM/DEB packages, which is why I haven't bothered to propose it as a
> > requirement.  Nevertheless, Linux packages are always a welcome addition if
> > someone wants to create 'em!
> >
> >
> >
> >
> >> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
> >>
> >> Would container be in addition to deb/rpm, or instead of?  If latter
> >> can we do deb/rpm as base then have container either created from them
> >> or directly from artifacts?
> >>
> >> On test usage side, seems could probably break up tests into
> >> base/required and then optional/add-on tests/test-suites.  Think
> >> remember seeing mention of certain tests that are failing at times on
> >> certain component(s) anyways in the core builds but don’t mean that
> >> the build is broken, so would make sense to have some clean up around
> >> those anyways.
> >>
> >> -----Original Message-----
> >> From: RJ Nowling [mailto:[email protected]]
> >> Sent: Sunday, August 30, 2015 1:11 PM
> >> To: [email protected]
> >> Subject: Re: Proposal for "BigTop Data Generators"
> >>
> >> I agree with the above. :)
> >>
> >> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas
> >> <[email protected]>
> >> wrote:
> >>
> >>> Hi RJ.
> >>>
> >>> Maven repositories and docker containers for the transaction queue
> >>> are good enough IMO.  That will give people a way to compose them in
> >>> different idioms (one for Java folks, another for a broader Linux
> >>> audience).
> >>>
> >>> I think the lib designs are fairly intuitive.  I would say that we
> >>> should constrain them all to being written in Java or Groovy to keep
> >>> the bigtop theme of "JVM for everything" :).
> >>>
> >>> Any particular questions you have around technical design can be
> >>> followed in a JIRA or else maybe a Readme spec that goes in the top
> >>> level of the data-generators dir...
> >>>
> >>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
> >>>>
> >>>> I'd like to keep this conversation going.
> >>>>
> >>>> So here are a few discussion points:
> >>>>
> >>>> 1. How do we want to make the data generators available?  Maven?
> >>>> RPMs and Debs?
> >>>>
> >>>> For now, I'm using a gradle multi-project build to easily build and
> >>>> install the BPS data generators and their libraries into a local maven
> >>>> repo.  This makes development easy.  Eventually, I would like to post
> >>>> binaries through Maven for easy integration by users.  RPMs / Debs
> >>>> could be interesting since I use a pattern where the data generators
> >>>> are libraries (to support application integration / parallelization by
> >>>> the host framework) but also provide CLI drivers for local testing.
> >>>>
> >>>> 2.  The idea of using the data generators as part of the smoke
> >>>> tests came up.  Since there is concern about making the data
> >>>> generators required, we could offer the blueprints (BigPetStore)
> >>>> as optional smoke tests.  Would that be a good compromise?
> >>>>
> >>>> 3.  How will they be maintained?
> >>>>
> >>>> I'll certainly add myself to the maintainers list and will be
> >>>> taking responsibility.  I'm happy to have others help as well if
> >>>> anyone wants to
> >>>> -- if not, that's cool, too.
> >>>>
> >>>> 4. Is anyone interested at all in discussing library APIs and designs?
> >>>> What about internal interfaces and such?
> >>>>
> >>>>
> >>>> My plan was to add at least one more data generator (weather
> >>>> simulator) to bigtop-data-generators in the short term.  However,
> >>>> given the concerns raised by Cos (more discussion needed) and Olaf
> >>>> (don't want to force data generators on unsuspecting users ;) ), I
> >>>> would like to reach some consensus on what people are concerned about
> >>>> and solutions.
> >>>>
> >>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik
> >>>> <[email protected]> wrote:
> >>>>
> >>>>> Fine by me. I have linked this thread to the JIRA ticket that RJ
> >>>>> created, so we have a way to connect one to another ;)
> >>>>>
> >>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I am not confident that moving important design discussions with
> >>>>>> impact on the whole project to JIRA is a good idea.
> >>>>>>
> >>>>>> In the current JIRA traffic storm it is not easy to identify and
> >>>>>> follow important tickets.
> >>>>>>
> >>>>>> Please keep discussions on the list or at least, please state on
> >>>>>> this list which ticket to follow ...
> >>>>>>
> >>>>>> Olaf
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Aug 26, 2015, at 22:56, Konstantin Boudnik <[email protected]> wrote:
> >>>>>>>
> >>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Nice to have data generators in Bigtop.
> >>>>>>>>
> >>>>>>>> But please do not include it in bigtop_utils, since this
> >>>>>>>> package is mandatory. Not everyone needs a data generator.
> >>>>>>>
> >>>>>>> Yup. And let's move further design discussion to the JIRA!
> >>>>>>>
> >>>>>>>> Olaf
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Aug 26, 2015, at 11:25, Jay Vyas <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Publishing the jar to Bigtop's maven is probably a good first
> >>>>>>>>> step.  Then apps can just include it as needed...?
> >>>>>>>>>
> >>>>>>>>> I'm not against packaging if someone wants packages for this.
> >>>>>>>>> Maybe even include it in bigtop util?
> >>>>>>>>>
> >>>>>>>>> Let's move to JIRA.
> >>>>>>>>>
> >>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik
> >>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> It is pretty cool indeed!
> >>>>>>>>>>
> >>>>>>>>>> I wonder how it needs to be structured so that it:
> >>>>>>>>>> - is easy to access/use from other components wherever it is
> >>>>>>>>>> needed
> >>>>>>>>>> - doesn't interfere with the rest of the stack
> >>>>>>>>>>
> >>>>>>>>>> I guess one possible way would be to implement the generator
> >>>>>>>>>> as a set of maven artifacts that could be installed/consumed
> >>>>>>>>>> transparently by just declaring a dependency, e.g. as proposed
> >>>>>>>>>> via a top-level component.
> >>>>>>>>>>
> >>>>>>>>>> Another way is to have a new package like we do for
> >>>>>>>>>> bigtop-utils and such.
> >>>>>>>>>>
> >>>>>>>>>> Perhaps this discussion should be moved to JIRA, or shall we
> >>>>>>>>>> continue on dev@?
> >>>>>>>>>>
> >>>>>>>>>> Cos
> >>>>>>>>>>
> >>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> >>>>>>>>>>> Hi BigTop,
> >>>>>>>>>>>
> >>>>>>>>>>> I had a discussion with Jay yesterday, and we'd like to
> >>>>>>>>>>> propose a new component for BigTop: BigTop Data Generators.
> >>>>>>>>>>>
> >>>>>>>>>>> BigTop Data Generators would consist of a common set of
> >>>>>>>>>>> libraries for building data generators and three example
> >>>>>>>>>>> data generators:
> >>>>>>>>>>>
> >>>>>>>>>>> * BigPetStore transaction generator (moved from
> >>>>>>>>>>> BigPetStore)
> >>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
> >>>>>>>>>>> booths on a showroom floor, at a conference, or at a mall
> >>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
> >>>>>>>>>>> (temperature, wind speed, wind chill, rainfall, etc.) per
> >>>>>>>>>>> zip code.  (From a model trained on NOAA historical weather
> >>>>>>>>>>> data)
> >>>>>>>>>>>
> >>>>>>>>>>> We believe that creating a common set of libraries will
> >>>>>>>>>>> have several benefits including:
> >>>>>>>>>>>
> >>>>>>>>>>> * Easier for others to build their own data generators
> >>>>>>>>>>> * Make data generators smaller and easier to maintain
> >>>>>>>>>>> * Share improvements across the data generators
> >>>>>>>>>>>
> >>>>>>>>>>> More details on the libraries are below.
> >>>>>>>>>>>
> >>>>>>>>>>> BigPetStore will continue to focus on building and
> >>>>>>>>>>> maintaining blueprints, powered by the BigTop Data Generators.
> >>>>>>>>>>>
> >>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop
> >>>>>>>>>>> for tools for building better, more comprehensive
> >>>>>>>>>>> blueprints.  We want to support these efforts through data
> >>>>>>>>>>> generators and the initial set of blueprints we've been
> >>>>>>>>>>> building.
> >>>>>>>>>>>
> >>>>>>>>>>> If the community is generally in support of this, I can
> >>>>>>>>>>> create a top-level "bigtop-data-generators" directory and
> >>>>>>>>>>> put the data generators and libraries in there.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> RJ
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -------
> >>>>>>>>>>> Library details:
> >>>>>>>>>>>
> >>>>>>>>>>> So far, I've extracted the following common libraries:
> >>>>>>>>>>>
> >>>>>>>>>>> * Samplers -- provides classes for PDFs and various
> >>>>>>>>>>> samplers
> >>>>>>>>>>> * Name generator -- data set and samplers for generating
> >>>>>>>>>>> names
> >>>>>>>>>>> * Location data set -- data set and classes for US zip
> >>>>>>>>>>> codes, their GPS coordinates, median household incomes, and
> >>>>>>>>>>> population sizes
> >>>>>>>>>>> * Product generator -- library for enumerating products
> >>>>>>>>>>> from a specification file.  Comes with default
> >>>>>>>>>>> specifications for BigPetStore
> >>>>>>>>>>>
> >>>>>>>>>>> I also expect that I'll add libraries for:
> >>>>>>>>>>>
> >>>>>>>>>>>  * Particle simulation -- customer movement in a room
> >>>>>>>>>>>  * Latent factor model generation -- generate latent
> >>>>>>>>>>> factors and customer weights to create something like
> >>>>>>>>>>> MovieLens data.  Used in Bazaar for booth preferences and
> >>>>>>>>>>> potentially in BigPetStore for customer item preferences
> >>>>>>>>>>>
> >>>>>>>>>>> Most of these libraries came out of the BigPetStore data
> >>>>>>>>>>> generator, but the other generators have been refactored to
> >>>>>>>>>>> be based off the standard set of libraries.
> >
> >
> > --
> > jay vyas
> >
>
