Doug, I agree that this isn't a high-priority change. I'm just trying to start a discussion about what is needed to make multijob work well.

I really like Yoram's suggestion of a single limit for map and reduce tasks. Not charging the copy (shuffle) phase to that limit could be part of making that work. Again, no urgency.

We are already running two parallel clusters on the same boxes. We call them Blue (normal) and Yellow (nice'd), named after the colors on the Ganglia CPU display. We run long jobs on the nice'd cluster, and short jobs at normal priority. It works really well. Kevin should be submitting the two patches we needed to make it work.

Paul

On 7/25/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
Paul Sutter wrote:
> First, it matters in the case of concurrent jobs. If you submit a 20
> minute job while a 20 hour job is running, it would be nice if the
> reducers for the 20 minute job could get a chance to run before the 20
> hour job's mappers have all finished. So even without a throughput
> improvement, you have an important capability (although it may require
> another minor tweak or two to make possible).

I fear that more than a minor tweak or two are required to make concurrent jobs work well. For example, you would also want to make sure that the long-running job does not consume all of the reduce slots, or the short job would again get stuck behind it. Pausing long-running tasks might be required.

The best way to do this at present is to run two job trackers, and two tasktrackers per node, then submit long-running jobs to one "cluster" and short-running jobs to the other.

> Secondarily, we often have stragglers, where one mapper runs slower
> than the others. When this happens, we end up with a largely idle
> cluster for as long as an hour. In cases like these, good support for
> concurrent jobs _would_ improve throughput.

Can you perhaps increase the number of map tasks, so that even a slow task takes only a very small portion of the total execution time?

Good support for concurrent jobs would be great to have, and I'd love to see a patch that addresses this issue comprehensively. I am not convinced that it is worth making minor tweaks that may or may not really help us to get there.

Doug
