Or it's probably much simpler to just have one script, since the data
generator runs once and can then go away.
On Aug 31, 2015, at 10:45 PM, "Jay Vyas" <[email protected]> wrote:

> Rj can we abstract the command line so that we have "one cli to rule them
> all" into an interface?
>
>
> > On Aug 31, 2015, at 10:40 AM, Evans Ye <[email protected]> wrote:
> >
> > I very much like the shell script wrapper and docker image idea, since
> > that way we can integrate it directly with the bigtop provisioner, which
> > yields a perfect UX for the whole thing. I think it's not too hard to do
> > both; we just need to add a parameter to turn the script into daemon
> > mode. I see lots of images doing it this way.
> >
> > docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output
> > data-dir --etc  foo --etc bar --daemon
> > On Aug 31, 2015, at 9:06 PM, "RJ Nowling" <[email protected]> wrote:
> >
> >> The BigPetStore, Bazaar, and weather data generators have single-threaded
> >> command-line interfaces.  We could do the same with the smaller generators
> >> (names, locations, etc.) if there is interest.
> >>
> >> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <[email protected]>
> >> wrote:
> >>
> >>> Nate: Good idea to abstract the interface one level higher....
> >>>
> >>> How about a docker run command ? That is probably the easiest way for
> >>> Linux folks to run one off Java apps nowadays.
> >>>
> >>> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output
> >>> data-dir --etc  foo --etc bar
> >>>
> >>> I'm happy to curate such a docker image. I am already doing something
> >>> like this in kube for bigtop-transaction-queue, which continuously pumps
> >>> data generator outputs into a REST endpoint or file queue... So it could
> >>> be extended to support other generators.
> >>>
> >>>
> >>>> om> <[email protected]> wrote:
> >>>>
> >>>> Could picture at some point supporting something like this for non-JVM
> >>>> folk just looking for test/demo data:
> >>>>
> >>>> apt-get install bigtop-data-gen
> >>>> ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir
> >>>> --etc  foo --etc bar
> >>>>
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: jay vyas [mailto:[email protected]]
> >>>> Sent: Sunday, August 30, 2015 5:11 PM
> >>>> To: [email protected]
> >>>> Subject: Re: Proposal for "BigTop Data Generators"
> >>>>
> >>>> Hola nate.  Well, here are the use cases I know of that I have used the
> >>>> data generators for.
> >>>>
> >>>> Dockerfile:
> >>>>
> >>>> (1) for testing kubernetes.  For this, I just use the transaction-queue
> >>>> docker file.
> >>>> (2) for testing GlusterFS small file workloads, maybe with other
> >>>> analytics tools...
> >>>>
> >>>> Maven repo
> >>>>
> >>>> (3) Java MapReduce/Ignite/Spark applications, which can just add a mvn
> >>>> repo when compiling.  Java developers never add jars through RPM repos.
> >>>>
> >>>> RPM/DEB packages:
> >>>>
> >>>> I could see people using an RPM/DEB data generator, and I'm not against
> >>>> it.  But I simply don't know of any real world projects which *currently*
> >>>> need RPM/Deb packages, which is why I haven't bothered to propose it as a
> >>>> requirement.  Nevertheless, linux packages are always a welcome addition
> >>>> if someone wants to create 'em!
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
> >>>>>
> >>>>> Would container be in addition to deb/rpm, or instead of?  If latter,
> >>>>> can we do deb/rpm as base then have container either created from them
> >>>>> or directly from artifacts?
> >>>>>
> >>>>> On test usage side, seems could probably break up tests into
> >>>>> base/required and then optional/add-on tests/test-suites.  Think
> >>>>> remember seeing mention of certain tests that are failing at times on
> >>>>> certain component(s) anyways in the core builds but don’t mean that
> >>>>> the build is broken, so would make sense to have some clean up around
> >>>>> those anyways.
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: RJ Nowling [mailto:[email protected]]
> >>>>> Sent: Sunday, August 30, 2015 1:11 PM
> >>>>> To: [email protected]
> >>>>> Subject: Re: Proposal for "BigTop Data Generators"
> >>>>>
> >>>>> I agree with the above. :)
> >>>>>
> >>>>> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas
> >>>>> <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi RJ.
> >>>>>>
> >>>>>> Maven repositories and docker containers for the transaction queue
> >>>>>> are good enough IMO.  That will give people a way to compose them in
> >>>>>> different idioms (one for Java folks, another for a broader Linux
> >>>>>> audience).
> >>>>>>
> >>>>>> I think the lib designs are fairly intuitive.  I would say that we
> >>>>>> should constrain them all to being written in Java or Groovy to keep
> >>>>>> the bigtop theme of "JVM for everything" :).
> >>>>>>
> >>>>>> Any particular questions you have around technical design can be
> >>>>>> followed in a JIRA or else maybe a Readme spec that goes in a  top
> >>>>>> level of the data-generators dir...
> >>>>>>
> >>>>>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
> >>>>>>>
> >>>>>>> I'd like to keep this conversation going.
> >>>>>>>
> >>>>>>> So here are a few discussion points:
> >>>>>>>
> >>>>>>> 1. How do we want to make the data generators available?  Maven?
> >>>>>>> RPMs and Debs?
> >>>>>>>
> >>>>>>> For now, I'm using a gradle multi-project build to easily build
> >>>>>>> and install the BPS data generators and their libraries into a
> >>>>>>> local maven repo.  This makes development easy.  Eventually, I
> >>>>>>> would like to post binaries through Maven for easy integration by
> >>>>>>> users.  RPMs / Debs could be interesting since I use a pattern
> >>>>>>> where the data generators are libraries (to support application
> >>>>>>> integration / parallelization by the host framework) but also
> >>>>>>> provide CLI drivers for local testing.
> >>>>>>>
> >>>>>>> 2.  The idea of using the data generators as part of the smoke
> >>>>>>> tests came up.  Since there is concern about making the data
> >>>>>>> generators required, we could offer the blueprints (BigPetStore)
> >>>>>>> as optional smoke tests.  Would that be a good compromise?
> >>>>>>>
> >>>>>>> 3.  How will they be maintained?
> >>>>>>>
> >>>>>>> I'll certainly add myself to the maintainers list and will be
> >>>>>>> taking responsibility.  I'm happy to have others help as well if
> >>>>>>> anyone wants to -- if not, that's cool, too.
> >>>>>>>
> >>>>>>> 4. Is anyone interested at all in discussing library APIs and
> >>>>>>> designs?  What about internal interfaces and such?
> >>>>>>>
> >>>>>>>
> >>>>>>> My plan was to add at least one more data generator (weather
> >>>>>>> simulator) to bigtop-data-generators in the short term.  However,
> >>>>>>> given the concerns raised by Cos (more discussion needed) and Olaf
> >>>>>>> (don't want to force data generators on unsuspecting users ;) ), I
> >>>>>>> would like to reach some consensus on what people are concerned
> >>>>>>> about and on solutions.
> >>>>>>>
> >>>>>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik
> >>>>>>> <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Fine by me. I have linked this thread to the JIRA ticket that RJ
> >>>>>>>> created, so we have a way to connect one to another ;)
> >>>>>>>>
> >>>>>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I am not confident that moving important design discussions with
> >>>>>>>>> impact to the whole project to JIRA is a good idea.
> >>>>>>>>>
> >>>>>>>>> In the current JIRA traffic storm it is not easy to identify and
> >>>>>>>>> follow important tickets.
> >>>>>>>>>
> >>>>>>>>> Please keep discussions on the list or at least, please state on
> >>>>>>>>> this list which ticket to follow ...
> >>>>>>>>>
> >>>>>>>>> Olaf
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> Nice to have data generators in Bigtop.
> >>>>>>>>>>>
> >>>>>>>>>>> But please do not include it in bigtop_utils, since this
> >>>>>>>>>>> package is mandatory. Not everyone needs a data generator.
> >>>>>>>>>>
> >>>>>>>>>> Yup. And let's move further design discussion to the JIRA!
> >>>>>>>>>>
> >>>>>>>>>>> Olaf
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Publishing the jar to bigtop's maven is probably a good first
> >>>>>>>>>>>> step. Then apps can just include it as needed...?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm not against packaging if someone wants packages for this.
> >>>>>>>>>>>> Maybe even include it in bigtop util?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Let's move to jira,
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik
> >>>>>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It is pretty cool indeed!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I wonder how it needs to be structured to be:
> >>>>>>>>>>>>> - easy to access/use from other components wherever it is
> >>>>>>>>>>>>> needed
> >>>>>>>>>>>>> - doesn't interfere with the rest of the stack
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I guess one possible way would be to implement the generator
> >>>>>>>>>>>>> as a set of maven artifacts that could be installed/consumed
> >>>>>>>>>>>>> transparently by just declaring a dependency, e.g. as proposed
> >>>>>>>>>>>>> via a top-level component.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Another way is to have a new package like we do for
> >>>>>>>>>>>>> bigtop-utils and such.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Perhaps this discussion should be moved to JIRA or shall we
> >>>>>>>>>>>>> continue on the dev@ ??
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cos
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> >>>>>>>>>>>>>> Hi BigTop,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I had a discussion with Jay yesterday, and we'd like to propose
> >>>>>>>>>>>>>> a new component for BigTop: BigTop Data Generators.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> BigTop Data Generators would consist of a common set of
> >>>>>>>>>>>>>> libraries for building data generators and three example data
> >>>>>>>>>>>>>> generators:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> * BigPetStore transaction generator (moved from
> >>>>>>>>>>>>>> BigPetStore)
> >>>>>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
> >>>>>>>>>>>>>> booths on a showroom floor, at a conference, or at a mall
> >>>>>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
> >>>>>>>>>>>>>> (temperature, wind speed, wind chill, rainfall, etc.) per zip
> >>>>>>>>>>>>>> code.  (From a model trained on NOAA historical weather data)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We believe that creating a common set of libraries will
> >>>>>>>>>>>>>> have several benefits including:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> * Easier for others to build their own data generators
> >>>>>>>>>>>>>> * Make data generators smaller and easier to maintain
> >>>>>>>>>>>>>> * Share improvements across the data generators
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> More details on the libraries are below.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> BigPetStore will continue to focus on building and
> >>>>>>>>>>>>>> maintaining blueprints, powered by the BigTop Data Generators.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop
> >>>>>>>>>>>>>> for tools for building better, more comprehensive blueprints.
> >>>>>>>>>>>>>> We want to support these efforts through data generators and
> >>>>>>>>>>>>>> the initial set of blueprints we've been building.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If the community is generally in support of this, I can
> >>>>>>>>>>>>>> create a top-level "bigtop-data-generators" directory and put
> >>>>>>>>>>>>>> the data generators and libraries in there.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> RJ
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -------
> >>>>>>>>>>>>>> Library details:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So far, I've extracted the following common libraries:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> * Samplers -- provides classes for PDFs (probability density
> >>>>>>>>>>>>>> functions) and various samplers
> >>>>>>>>>>>>>> * Name generator -- data set and samplers for generating
> >>>>>>>>>>>>>> names
> >>>>>>>>>>>>>> * Location data set -- data set and classes for US zip
> >>>>>>>>>>>>>> codes, their GPS coordinates, median household incomes, and
> >>>>>>>>>>>>>> population sizes
> >>>>>>>>>>>>>> * Product generator -- library for enumerating products
> >>>>>>>>>>>>>> from a specification file.  Comes with default
> >>>>>>>>>>>>>> specifications for BigPetStore
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I also expect that I'll add libraries for:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> * Particle simulation -- customer movement in a room
> >>>>>>>>>>>>>> * Latent factor model generation -- generate latent
> >>>>>>>>>>>>>> factors and customer weights to create something like
> >>>>>>>>>>>>>> MovieLens data.  Used in Bazaar for booth preferences and
> >>>>>>>>>>>>>> potentially in BigPetStore for customer item preferences
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Most of these libraries came out of the BigPetStore data
> >>>>>>>>>>>>>> generator, but the other generators have been refactored to
> >>>>>>>>>>>>>> be based off the standard set of libraries.
> >>>>
> >>>>
> >>>> --
> >>>> jay vyas
> >>
>
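
For illustration only, here is a minimal sketch of the single command line discussed in this thread. It is not an existing Bigtop tool. The flag names (--scheme, --size, --output, --daemon) are copied from the example commands above; the scheme names, the binary size units, and the helper names parse_size and build_parser are assumptions made for this sketch. The placeholder "--etc foo" options are left out because the thread never defines them.

```python
import argparse

# Hypothetical scheme names, based on the generators named in the thread.
SCHEMES = ("bigpetstore", "bazaar", "weather")

# Assumed binary multipliers; the thread only ever shows "5GB".
_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a size string such as '5GB' into a number of bytes."""
    text = text.strip().upper()
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)  # no suffix: treat the value as plain bytes


def build_parser():
    """Build one parser shared by every generator (the 'one CLI' idea)."""
    parser = argparse.ArgumentParser(prog="bigtop-data-gen")
    parser.add_argument("--scheme", required=True, choices=SCHEMES,
                        help="which data generator to run")
    parser.add_argument("--size", required=True, type=parse_size,
                        help="amount of data to generate, e.g. 5GB")
    parser.add_argument("--output", required=True,
                        help="directory that receives the generated data")
    parser.add_argument("--daemon", action="store_true",
                        help="keep running instead of exiting after one run")
    return parser


def main(argv=None):
    """Parse the arguments and report what would be generated."""
    args = build_parser().parse_args(argv)
    mode = "daemon" if args.daemon else "one-shot"
    print(f"{args.scheme}: {args.size} bytes -> {args.output} ({mode})")
    return args
```

Calling main(["--scheme", "weather", "--size", "5GB", "--output", "data-dir"]) would parse the same arguments as the docker run example. A real wrapper would dispatch to the chosen generator at the point where this sketch only prints a summary.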
