Create a Common Data-Generator for Testing Hadoop
-------------------------------------------------
Key: MAPREDUCE-2112
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2112
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Ranjit Mathew
Priority: Minor

It would be useful to have a common data-generator for testing Hadoop and
related projects. Such a tool should generate data in a specified format and
should be able to use a Hadoop cluster to speed up the data generation. It
could then be used across Hadoop (e.g. GridMix3), Pig, Hive, etc., reducing
the need for each project to invent something similar itself.

We can use the data-generator from PigMix2 (PIG-200) as a starting point. It
is described in [http://wiki.apache.org/pig/DataGeneratorHadoop]. Since it
depends on the SDSU Java library ([http://www.eli.sdsu.edu/java-SDSU/]), which
is released under the GNU GPL, it has to be modified to eliminate this
dependency before it can be included in Apache Hadoop.