Hi all,

I have some simple questions that I would like answered to get a
better understanding of what Hadoop/MapReduce is.

I noticed in the code of the WordCount example:

  conf.setInputPath(new Path((String) other_args.get(0)));
  conf.setOutputPath(new Path((String) other_args.get(1)));

Does working with Hadoop always involve taking a set of files in one
directory as input and producing a set of files in one directory as
output? Are the names of the files in the input and output directories
insignificant?

How do you handle the end result of a set of MapReduce jobs? If the
result is a set of files, do you have to run another MapReduce job
that writes not to files (on the DFS, for example) but to a simple
String, so you can display something on a webpage? Or do you read the
resulting files directly?
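For context on that last option: with the default TextOutputFormat each reducer writes plain "key<TAB>value" lines to a file named part-00000, part-00001, and so on, so a front end can read the result back with ordinary I/O. Here is a minimal sketch in plain Java of parsing such an output directory into a map (it reads from the local filesystem for simplicity; a real setup would go through Hadoop's FileSystem API to reach the DFS, and the directory layout here is just the usual convention):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ReadWordCounts {

    // Parse TextOutputFormat-style "word<TAB>count" lines from every
    // part-* file in a job's output directory into a single sorted map.
    public static SortedMap<String, Long> readCounts(Path outputDir) throws IOException {
        SortedMap<String, Long> counts = new TreeMap<>();
        try (DirectoryStream<Path> parts =
                 Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part)) {
                    if (line.isEmpty()) continue;
                    int tab = line.indexOf('\t');
                    String word = line.substring(0, tab);
                    long n = Long.parseLong(line.substring(tab + 1));
                    // The same word can appear in several part files,
                    // so sum rather than overwrite.
                    counts.merge(word, n, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Fake a two-reducer output directory just to show the format.
        Path dir = Files.createTempDirectory("wordcount-out");
        Files.write(dir.resolve("part-00000"), List.of("hadoop\t3", "map\t5"));
        Files.write(dir.resolve("part-00001"), List.of("reduce\t2"));
        System.out.println(readCounts(dir)); // prints {hadoop=3, map=5, reduce=2}
    }
}
```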

If my gigantic set of input files keeps growing, do I have to
re-MapReduce the whole input set to get a single result set? Or can I
just MapReduce the incremental part and use another MapReduce job to
combine x result sets into a single result?
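For a reduce like word counting, whose operation is associative and commutative, that incremental combination is just summing the per-batch counts per word -- the same fold the reducer already does. A hypothetical sketch of such a merge step in plain Java (this is an illustration of the idea, not Hadoop API; the merge could equally be a second MapReduce job whose input is the x result sets):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergeCounts {

    // Combine several per-batch word-count result sets into one,
    // summing the counts for each word.
    public static Map<String, Long> merge(List<Map<String, Long>> batches) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> batch : batches) {
            batch.forEach((word, n) -> total.merge(word, n, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Previously computed result set plus the result of the new batch.
        Map<String, Long> previous = Map.of("hadoop", 3L, "map", 5L);
        Map<String, Long> incremental = Map.of("hadoop", 1L, "reduce", 2L);
        System.out.println(merge(List.of(previous, incremental)));
        // prints {hadoop=4, map=5, reduce=2}
    }
}
```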

Thanks for any help!

--

regards,

Jeroen
