Hi all, I have some simple questions I would like answered to get a better understanding of what Hadoop/MapReduce is.
I noticed these lines in the code of the WordCount example:

    conf.setInputPath(new Path((String) other_args.get(0)));
    conf.setOutputPath(new Path((String) other_args.get(1)));

Does working with Hadoop always mean taking a set of files in one directory as input and producing a set of files in another directory as output? Are the names of the files in the input and output directories insignificant?

How do you handle the end result of a set of MapReduce jobs? If the result is a set of files, do you have to run another MapReduce job that writes not to files (on the DFS, for example) but to a simple String, in order to display something on a webpage? Or do you have to read the resulting files directly, as in the sketch below?

If my gigantic set of input files keeps growing, do I have to re-run MapReduce over the whole input set to get a single result set? Or can I MapReduce just the incremental part and then use another MapReduce job to merge x result sets into a single result?
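To make the webpage question concrete, here is roughly what I have in mind. This is just a sketch I pieced together from the FileSystem javadocs: the OutputReader class, the output path handling, and the part-* filtering are my own guesses, and the exact calls may differ between Hadoop versions.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical helper: collect a job's output directory into one String.
    public class OutputReader {
        public static String readOutput(String outputDir) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            StringBuilder result = new StringBuilder();
            // A job writes one part-* file per reduce task into the output dir.
            for (FileStatus status : fs.listStatus(new Path(outputDir))) {
                if (!status.getPath().getName().startsWith("part-")) {
                    continue; // skip _logs and the like
                }
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(fs.open(status.getPath())));
                String line;
                while ((line = reader.readLine()) != null) {
                    result.append(line).append('\n');
                }
                reader.close();
            }
            return result.toString(); // hand this String to the webpage
        }
    }

Is reading the part-* files directly like this the intended way, or is there a nicer mechanism?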
Thanks for any help!

-- 
regards,
Jeroen