On 02/15/2014 06:19 AM, Jay Vyas wrote:
Hi bigtop. Are we interested in maintaining our own infra for generating fake
data, rather than relying on and downloading external data sources for smoke
tests? Fake data is great for testing, I think...
In bigpetstore I'm generating fake data, and I've written a lot of code to do
this in the custom input formats... but I just found:
http://codearte.github.io/jfairy/
which is a groovy tool for doing the same...
I wonder whether generating fake data for testing big data should be a
first-class part of bigtop? Would others use a utility, or just me?
It might be another useful artifact for the community, especially for
bigpetstore, but also for testing a variety of other machine-learning-related
projects...
I think it's bad to rely on external websites for our tests. Maybe in time we
could move over to our own internally curated/generated data sets, and a data
generation tool like the above moves us in that direction.
Hi Jay,
Generating fake data is an interesting idea and I don't see any reason
not to use it when appropriate.
Regarding having our own framework vs re-using a library, it depends.
Writing our own framework is an option if there is no existing APLv2
(-compatible?) library we can use or extend for our needs.
But writing code to facilitate such a task would be welcome in any case.
Ex: map/reduce jobs that use jfairy to generate TBs of data.
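
As a rough sketch of the per-task generation step such a job might run (this
is an assumption for illustration, not existing bigtop or bigpetstore code):
each task seeds its own random generator from its task id, so output is
reproducible and tasks need no coordination. The class name FakeRecords, the
CSV layout, and the sample values are all hypothetical, and plain
java.util.Random stands in for jfairy, whose API is not used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FakeRecords {
    // Hypothetical sample values; a library such as jfairy would supply richer ones.
    private static final String[] FIRST_NAMES = {"Ana", "Ben", "Chen", "Dana"};
    private static final String[] PRODUCTS = {"dog-food", "cat-litter", "fish-tank"};

    // Generates `count` CSV records for one task. Seeding the generator with
    // the task id makes each task's output reproducible across runs.
    public static List<String> generate(int taskId, int count) {
        Random rng = new Random(taskId);
        List<String> records = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String name = FIRST_NAMES[rng.nextInt(FIRST_NAMES.length)];
            String product = PRODUCTS[rng.nextInt(PRODUCTS.length)];
            int priceCents = 100 + rng.nextInt(9900); // 100..9999 inclusive
            // Record id is "<taskId>-<index>", unique across tasks.
            records.add(taskId + "-" + i + "," + name + "," + product + "," + priceCents);
        }
        return records;
    }

    public static void main(String[] args) {
        // Simulate two tasks, each emitting three records.
        for (int taskId = 0; taskId < 2; taskId++) {
            for (String record : generate(taskId, 3)) {
                System.out.println(record);
            }
        }
    }
}
```

In a real map/reduce job, generate() would presumably be called from each
mapper with its task id and a much larger count, writing records to the job
output instead of stdout.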
Thanks,
Bruno