Hi Lahiru,

> I think we have pretty much this functionality done in a similar way to
> what you are explaining. I have added the code into trunk and will
> provide some test classes and will update the scheduler to return the
> HadoopProvider.
Yes. The Hadoop provider that you have committed more or less does the same
thing that I was planning to do :-). I believe we can make the following two
important improvements on top of that:

1. Add support for handling chains of jobs. This is different from having
   individual jobs orchestrated at the workflow level.
2. Add support for asynchronous job execution, which I believe is a
   must-have for long-running, data-intensive MapReduce jobs.

> I am +1 on enabling these APIs for use by other components, but do you
> think actual users would have a concern about the underlying library we
> use for MapReduce jobs? I am not quite confident about the way people are
> using these. But anyhow it's nice to have support for these.

They are not MapReduce frameworks. Sector/Sphere is a completely different
execution framework for data-intensive computing. Hyracks is another
data-intensive computing framework that also supports MapReduce. The idea is
to compare their performance and see which is better.

Thanks,
Danushka
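P.S. To make points (1) and (2) concrete, here is a rough conceptual sketch
in plain Java. This is just an illustration using `CompletableFuture`, not
the actual provider code or the Hadoop API; `runJob`, the job names, and the
paths are all made up. A real provider would submit the job (e.g. without
blocking on completion) and poll its status instead of computing locally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class JobChainSketch {
    // Hypothetical stand-in for submitting one MapReduce job and
    // returning its output path once it finishes. A real provider
    // would hand the work to Hadoop rather than compute it here.
    static String runJob(String name, String inputPath) {
        return inputPath + "/" + name + "-out";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // (2) Asynchronous execution: the caller is not blocked while
        // the chain of dependent jobs runs in the background.
        CompletableFuture<String> chain = CompletableFuture
                .supplyAsync(() -> runJob("stage1", "/data/in"), pool)
                // (1) Chaining: stage2 consumes stage1's output path,
                // so the dependency lives in the provider, not the
                // workflow level.
                .thenApplyAsync(out1 -> runJob("stage2", out1), pool);

        // The caller can do other work here, then collect the result.
        System.out.println(chain.get()); // /data/in/stage1-out/stage2-out
        pool.shutdown();
    }
}
```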
