On Sat, Feb 15, 2014 at 10:24PM, Jay Vyas wrote:
> Glad to hear there is some interest.  Here is a JIRA to take it further.
> 
> https://issues.apache.org/jira/browse/BIGTOP-1212
> 
> @Cos, we need something flexible enough to generate different types of data
> sets, and possibly embed patterns in the data. Do you know of any place to
> start? Is GridMix, for example, or SLive, pluggable in that way?

I don't really think either of these would work. Let's investigate.
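To make the "flexible enough, with embedded patterns" requirement concrete, here is a minimal Python sketch. It is not based on GridMix, SLive or jfairy, and every name in it (`Schema`, `pick_state`, `pick_product`, `share`, the AZ/fish-food rule) is hypothetical. Each field comes from a pluggable function that sees the fields generated so far, which is one way to embed a pattern that a test can later look for. A fixed seed makes every run emit the same data set, so a smoke test would not need an external download.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]
# A field generator gets the seeded RNG and the fields built so far, so a
# later field can depend on an earlier one (that is how a pattern is embedded).
FieldGen = Callable[[random.Random, Record], object]


@dataclass
class Schema:
    fields: Dict[str, FieldGen]  # dict order is the generation order

    def generate(self, n: int, seed: int = 42) -> List[Record]:
        rng = random.Random(seed)  # fixed seed: same data set on every run
        rows: List[Record] = []
        for _ in range(n):
            record: Record = {}
            for name, gen in self.fields.items():
                record[name] = gen(rng, record)
            rows.append(record)
        return rows


# Hypothetical pet-store-like schema; names and values are illustrative only.
STATES = ["AZ", "CA", "NY"]
PRODUCTS = ["dog-food", "cat-food", "fish-food"]


def pick_state(rng: random.Random, rec: Record) -> str:
    return rng.choice(STATES)


def pick_product(rng: random.Random, rec: Record) -> str:
    # Embedded pattern (hypothetical): AZ customers mostly buy fish-food.
    if rec["state"] == "AZ" and rng.random() < 0.8:
        return "fish-food"
    return rng.choice(PRODUCTS)


def share(rows: List[Record], state: str, product: str) -> float:
    """Fraction of rows from `state` that bought `product`."""
    in_state = [r for r in rows if r["state"] == state]
    hits = sum(1 for r in in_state if r["product"] == product)
    return hits / len(in_state) if in_state else 0.0


petstore = Schema(fields={"state": pick_state, "product": pick_product})

if __name__ == "__main__":
    rows = petstore.generate(3000)
    print(f"AZ fish-food share: {share(rows, 'AZ', 'fish-food'):.2f}")
    print(f"CA fish-food share: {share(rows, 'CA', 'fish-food'):.2f}")
```

Running it should show a fish-food share well above one third for AZ and near one third for CA; a test over the generated data could assert exactly that gap. Under this design, a different kind of data set is plugged in by swapping the field functions.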

> If not, we might have to hack our own together.
> 
> Maybe respond in BIGTOP-1212 above.
> 
> 
> On Sat, Feb 15, 2014 at 9:47 PM, Konstantin Boudnik <[email protected]> wrote:
> 
> > Neat idea! I think the answer depends on what kind of data we want to
> > generate.
> >  - I had a good run with GridMix for a variety of longevity loads (too bad
> >    Cloudera never released the code as open source).
> >  - For HDFS testing we can use SLive and DFSIO (BIGTOP-1208 and
> >    BIGTOP-1209), which are pretty much ready, it seems.
> >
> > At any rate, I'd prefer to incorporate something readily available that
> > has a good community behind it, so we won't end up supporting a big chunk
> > of specialized software.
> >
> > So, what do you have in mind? Any details?
> >   Cos
> >
> > On Sat, Feb 15, 2014 at 09:19AM, Jay Vyas wrote:
> > > Hi Bigtop. Are we interested in maintaining our own infra for generating
> > > fake data, rather than relying on and downloading external data sources
> > > for smokes? Fake data is great for testing, I think...
> > >
> > > In bigpetstore I'm generating fake data, and I've written a lot of code
> > > to do this in the custom input formats... but I just found:
> > >
> > > http://codearte.github.io/jfairy/
> > >
> > > which is a Groovy tool for doing the same...
> > >
> > >   I wonder whether generating fake data for testing big data should be a
> > >   first-class part of Bigtop? Would others use such a utility, or just me?
> > >
> > > It might be another useful artifact for the community, especially for
> > > bigpetstore, but also for testing a variety of other machine-learning
> > > related projects...
> > >
> > > I think it's bad to rely on external websites for our tests. Maybe in
> > > time we could move over to our own internally curated/generated data
> > > sets, and a data generation tool like the above moves us in that
> > > direction.
> >
> >
> 
> 
> -- 
> Jay Vyas
> http://jayunit100.blogspot.com
