I could picture us at some point supporting something like this for non-JVM 
folks who are just looking for test/demo data:

apt-get install bigtop-data-gen
~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
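
A rough sketch of how the argument handling for such a tool might look, written in Python only because the target audience here is non-JVM. Nothing below exists yet: the `bigtop-data-gen` command is the hypothetical one from the example above, only `--scheme`, `--size`, and `--output` are taken from it, and the scheme names other than `weather`, the unit table, and the helper names `parse_size` and `build_parser` are assumptions made for illustration.

```python
import argparse
import re

# Assumption: sizes use decimal (SI) multiples, so 1GB = 10**9 bytes.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a size string such as '5GB' into a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*([KMGT]?B)", text.strip().upper())
    if match is None:
        raise argparse.ArgumentTypeError(f"unrecognized size: {text!r}")
    count, unit = match.groups()
    return int(count) * _UNITS[unit]


def build_parser():
    """Build the parser for the hypothetical bigtop-data-gen command."""
    parser = argparse.ArgumentParser(prog="bigtop-data-gen")
    # 'transactions' and 'bazaar' are made-up names for the other generators.
    parser.add_argument("--scheme", required=True,
                        choices=["weather", "transactions", "bazaar"])
    parser.add_argument("--size", required=True, type=parse_size,
                        help="target data volume, e.g. 5GB")
    parser.add_argument("--output", required=True,
                        help="directory to write generated data into")
    return parser
```

Calling `build_parser().parse_args([...])` with the flags from the example would yield the scheme name, the size in bytes, and the output directory; the actual generation would still be delegated to whatever the packaged generators expose.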



-----Original Message-----
From: jay vyas [mailto:[email protected]] 
Sent: Sunday, August 30, 2015 5:11 PM
To: [email protected]
Subject: Re: Proposal for "BigTop Data Generators"

Hola Nate.  Well, here are the use cases I know of for which I have used the 
data generators.

Dockerfile:

(1) For testing Kubernetes.  For this, I just use the transaction-queue Dockerfile.
(2) For testing GlusterFS small-file workloads, maybe with other analytics 
tools...

Maven repo:

(3) Java MapReduce/Ignite/Spark applications, which can just add a Maven repo 
when compiling.  Java developers never add jars through RPM repos.

RPM/DEB packages:

I could see people using an RPM/DEB data generator, and I'm not against it.  
But I simply don't know of any real-world projects which *currently* need 
RPM/DEB packages, which is why I haven't bothered to propose it as a 
requirement.  Nevertheless, Linux packages are always a welcome addition if 
someone wants to create them!




On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:

> Would the container be in addition to deb/rpm, or instead of them?  If the 
> latter, can we do deb/rpm as the base and then have the container created 
> either from them or directly from the artifacts?
>
> On the test usage side, it seems we could break up the tests into 
> base/required tests and optional/add-on tests or test suites.  I think I 
> remember seeing mention of certain tests in the core builds that fail at 
> times on certain component(s) without meaning that the build is broken, 
> so it would make sense to have some cleanup around those anyway.
>
> -----Original Message-----
> From: RJ Nowling [mailto:[email protected]]
> Sent: Sunday, August 30, 2015 1:11 PM
> To: [email protected]
> Subject: Re: Proposal for "BigTop Data Generators"
>
> I agree with the above. :)
>
> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas <[email protected]> wrote:
>
> > Hi RJ.
> >
> > Maven repositories and Docker containers for the transaction queue 
> > are good enough IMO.  That will give people a way to compose them in 
> > different idioms (one for Java folks, another for a broader Linux 
> > audience).
> >
> > I think the lib designs are fairly intuitive.  I would say that we 
> > should constrain them all to being written in Java or Groovy to keep 
> > the bigtop theme of "JVM for everything" :).
> >
> > Any particular questions you have around technical design can be 
> > followed up in a JIRA, or else maybe in a README spec that goes at the 
> > top level of the data-generators dir...
> >
> > > On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
> > >
> > > I'd like to keep this conversation going.
> > >
> > > So here are a few discussion points:
> > >
> > > 1. How do we want to make the data generators available?  Maven?
> > > RPMs and Debs?
> > >
> > > For now, I'm using a Gradle multi-project build to easily build and 
> > > install the BPS data generators and their libraries into a local 
> > > Maven repo.  This makes development easy.  Eventually, I would like 
> > > to post binaries through Maven for easy integration by users.  RPMs / 
> > > Debs could be interesting since I use a pattern where the data 
> > > generators are libraries (to support application integration / 
> > > parallelization by the host framework) but also provide CLI drivers 
> > > for local testing.
> > >
> > > 2.  The idea of using the data generators as part of the smoke 
> > > tests came up.  Since there is concern about making the data 
> > > generators required, we could offer the blueprints (BigPetStore) 
> > > as optional smoke tests.  Would that be a good compromise?
> > >
> > > 3.  How will they be maintained?
> > >
> > > I'll certainly add myself to the maintainers list and will be 
> > > taking responsibility.  I'm happy to have others help as well if 
> > > anyone wants to -- if not, that's cool, too.
> > >
> > > 4. Is anyone interested at all in discussing library APIs and designs?
> > > What about internal interfaces and such?
> > >
> > >
> > > My plan was to add at least one more data generator (weather 
> > > simulator) to bigtop-data-generators in the short term.  However, 
> > > given the concerns raised by Cos (more discussion needed) and Olaf 
> > > (don't want to force data generators on unsuspecting users ;) ), I 
> > > would like to reach some consensus on what people are concerned 
> > > about and solutions.
> > >
> > > On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik <[email protected]> wrote:
> > >
> > >> Fine by me. I have linked this thread to the JIRA ticket that RJ 
> > >> created, so we have a way to connect one to another ;)
> > >>
> > >>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> > >>> Hi,
> > >>>
> > >>> I am not confident that moving important design discussions with 
> > >>> impact on the whole project to JIRA is a good idea.
> > >>>
> > >>> In the current JIRA traffic storm it is not easy to identify and 
> > >>> follow important tickets.
> > >>>
> > >>> Please keep discussions on the list, or at least please state on 
> > >>> this list which ticket to follow ...
> > >>>
> > >>> Olaf
> > >>>
> > >>>
> > >>>
> > >>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
> > >>>>
> > >>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> Nice to have data generators in Bigtop.
> > >>>>>
> > >>>>> But please do not include it in bigtop_utils, since this 
> > >>>>> package is mandatory. Not everyone needs a data generator.
> > >>>>
> > >>>> Yup. And let's move further design discussion to the JIRA!
> > >>>>
> > >>>>> Olaf
> > >>>>>
> > >>>>>
> > >>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> > >>>>>>
> > >>>>>> Publishing the jar to Bigtop's Maven is probably a good first 
> > >>>>>> step. Then apps can just include it as needed...?
> > >>>>>>
> > >>>>>> I'm not against packaging if someone wants packages for this.
> > >>>>>> Maybe even include it in bigtop util?
> > >>>>>>
> > >>>>>> Let's move to JIRA,
> > >>>>>>
> > >>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>> It is pretty cool indeed!
> > >>>>>>>
> > >>>>>>> I wonder how it needs to be structured to be:
> > >>>>>>> - easy to access/use from other components wherever it is 
> > >>>>>>> needed
> > >>>>>>> - doesn't interfere with the rest of the stack
> > >>>>>>>
> > >>>>>>> I guess one possible way would be to implement the generator 
> > >>>>>>> as a set of maven artifacts, that could be installed/consumed 
> > >>>>>>> transparently by just declaring a dependency e.g as proposed 
> > >>>>>>> via top-level component.
> > >>>>>>>
> > >>>>>>> Another way is to have a new package like we do for 
> > >>>>>>> bigtop-utils and such.
> > >>>>>>>
> > >>>>>>> Perhaps this discussion should be moved to JIRA or shall we 
> > >>>>>>> continue on the dev@ ??
> > >>>>>>>
> > >>>>>>> Cos
> > >>>>>>>
> > >>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> > >>>>>>>> Hi BigTop,
> > >>>>>>>>
> > >>>>>>>> I had a discussion with Jay yesterday, we'd like to propose 
> > >>>>>>>> a new component for BigTop: BigTop Data Generators.
> > >>>>>>>>
> > >>>>>>>> BigTop Data Generators would consist of a common set of 
> > >>>>>>>> libraries for building data generators and three example data 
> > >>>>>>>> generators:
> > >>>>>>>>
> > >>>>>>>> * BigPetStore transaction generator (moved from BigPetStore)
> > >>>>>>>> * BigTop Bazaar -- attendee movement and interactions with 
> > >>>>>>>>   booths on a showroom floor, at a conference, or at a mall
> > >>>>>>>> * BigTop Weatherman -- stochastic weather simulation 
> > >>>>>>>>   (temperature, wind speed, wind chill, rainfall, etc.) per zip 
> > >>>>>>>>   code.  (From a model trained on NOAA historical weather data)
> > >>>>>>>>
> > >>>>>>>> We believe that creating a common set of libraries will 
> > >>>>>>>> have several benefits including:
> > >>>>>>>>
> > >>>>>>>>  * Easier for others to build their own data generators
> > >>>>>>>>  * Make data generators smaller and easier to maintain
> > >>>>>>>>  * Share improvements across the data generators
> > >>>>>>>>
> > >>>>>>>> More details on the libraries are below.
> > >>>>>>>>
> > >>>>>>>> BigPetStore will continue to focus on building and 
> > >>>>>>>> maintaining blueprints, powered by the BigTop Data Generators.
> > >>>>>>>>
> > >>>>>>>> Our vision is that we get all of Apache coming to BigTop 
> > >>>>>>>> for tools for building better, more comprehensive blueprints.  
> > >>>>>>>> We want to support these efforts through data generators and 
> > >>>>>>>> the initial set of blueprints we've been building.
> > >>>>>>>>
> > >>>>>>>> If the community is generally in support of this, I can 
> > >>>>>>>> create a top-level "bigtop-data-generators" directory and put 
> > >>>>>>>> the data generators and libraries in there.
> > >>>>>>>>
> > >>>>>>>> Thanks!
> > >>>>>>>>
> > >>>>>>>> RJ
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> -------
> > >>>>>>>> Library details:
> > >>>>>>>>
> > >>>>>>>> So far, I've extracted the following common libraries:
> > >>>>>>>>
> > >>>>>>>>  * Samplers -- provides classes for PDFs and various samplers
> > >>>>>>>>  * Name generator -- data set and samplers for generating names
> > >>>>>>>>  * Location data set -- data set and classes for US zip codes, 
> > >>>>>>>>    their GPS coordinates, median household incomes, and 
> > >>>>>>>>    population sizes
> > >>>>>>>>  * Product generator -- library for enumerating products from 
> > >>>>>>>>    a specification file.  Comes with default specifications 
> > >>>>>>>>    for BigPetStore
> > >>>>>>>>
> > >>>>>>>> I also expect that I'll add libraries for:
> > >>>>>>>>
> > >>>>>>>>   * Particle simulation -- customer movement in a room
> > >>>>>>>>   * Latent factor model generation -- generate latent factors 
> > >>>>>>>>     and customer weights to create something like MovieLens 
> > >>>>>>>>     data.  Used in Bazaar for booth preferences and potentially 
> > >>>>>>>>     in BigPetStore for customer item preferences
> > >>>>>>>>
> > >>>>>>>> Most of these libraries came out of the BigPetStore data 
> > >>>>>>>> generator but the other generators have been refactored to be 
> > >>>>>>>> based off the standard set of libraries.
> > >>
> > >>
> > >>
> >
>
>


--
jay vyas
