Please correct me if I'm reading the code incorrectly, but it seems that submitJob puts the submitted job on the jobInitQueue, where it is immediately dequeued by the JobInitThread, and then initTasks() gets the file splits and creates the Tasks. So the splits are already held in memory almost as soon as the job is submitted, and there doesn't seem to be any difference in memory footprint.
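To make that reading concrete, here is a minimal, heavily simplified sketch of the flow in Java. It assumes a single JobInitThread draining an unbounded queue. Only submitJob, jobInitQueue, JobInitThread and initTasks come from the discussion. SketchJob, SketchJobTracker, startJobInitThread and the string "splits" are hypothetical stand-ins, not Hadoop's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, heavily simplified model of the flow described above.
// Only submitJob, jobInitQueue, JobInitThread and initTasks come from the thread;
// SketchJob / SketchJobTracker are illustrative stand-ins, not Hadoop classes.
class SketchJob {
    final String jobFile;
    // null until initTasks() has run; volatile so other threads see the update
    volatile List<String> tasks = null;

    SketchJob(String jobFile) { this.jobFile = jobFile; }

    // Stand-in for initTasks(): "compute" numSplits splits, one task per split.
    void initTasks(int numSplits) {
        List<String> created = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            created.add(jobFile + "#split-" + i);
        }
        tasks = created;
    }
}

class SketchJobTracker {
    final BlockingQueue<SketchJob> jobInitQueue = new LinkedBlockingQueue<>();

    // submitJob only enqueues the job; no splits are computed on the caller's thread.
    SketchJob submitJob(String jobFile) {
        SketchJob job = new SketchJob(jobFile);
        jobInitQueue.offer(job); // unbounded queue, so offer() always succeeds
        return job;
    }

    // JobInitThread: takes each job as soon as it is queued and initializes it,
    // so its splits exist in memory almost immediately after submission.
    Thread startJobInitThread(int numSplits) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    jobInitQueue.take().initTasks(numSplits);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop on shutdown
            }
        }, "JobInitThread");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

In this model a job spends essentially no time queued before its splits are created, which is the point being made about memory footprint.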
ben

Doug Cutting wrote:
>
> Right, so JobSubmissionProtocol.submitJob(String jobFile) could be
> altered to be submitJob(String jobFile, Split[]). The RPC system can
> handle reasonably large values like this, so I don't think that would
> be a problem. But the memory impact on the JobTracker could become
> significant, since the splits for queued jobs would now be around.
> This could be mitigated by writing the splits to a temporary file.
>
> The semantics would be subtly different: if you queue a job now, the
> file listing is done just before the job is executed, not when it's
> submitted. But programs shouldn't rely on that, so I don't think this
> is a big worry.
>
> Overall, I don't see any major problems with this. It won't simplify
> things much. We can remove the code which computes splits in a
> separate thread, but we'd have to add code to store splits to
> temporary files, so codesize is a wash. And it would remove a
> potential reliability problem.
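The alternative Doug describes has the client compute the splits and pass them to submitJob, and the JobTracker spill them to a temporary file while the job is queued. It could look roughly like the sketch below. SketchSplit, SpillingJobTracker, loadSplits, queuedJobs and the on-disk layout (a count, then path/start/length per split) are all hypothetical choices for illustration. This is not Hadoop's JobSubmissionProtocol or its Split type.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical split descriptor used only for this sketch.
final class SketchSplit {
    final String path;
    final long start;
    final long length;

    SketchSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }
}

// Hypothetical JobTracker fragment: queued jobs keep only a temp-file Path in memory.
class SpillingJobTracker {
    private final Map<String, Path> pendingSplits = new HashMap<>();

    // Proposed-style submission: the client ships the splits along with the job file.
    synchronized void submitJob(String jobFile, SketchSplit[] splits) {
        try {
            Path tmp = Files.createTempFile("splits-", ".bin");
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(tmp)))) {
                out.writeInt(splits.length);
                for (SketchSplit s : splits) {
                    out.writeUTF(s.path);
                    out.writeLong(s.start);
                    out.writeLong(s.length);
                }
            }
            pendingSplits.put(jobFile, tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // When the job is about to run: read its splits back, then delete the temp file.
    synchronized SketchSplit[] loadSplits(String jobFile) {
        Path tmp = pendingSplits.remove(jobFile);
        if (tmp == null) throw new IllegalStateException("unknown job: " + jobFile);
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(tmp)))) {
            int n = in.readInt();
            SketchSplit[] splits = new SketchSplit[n];
            for (int i = 0; i < n; i++) {
                // Java evaluates arguments left to right, matching the write order.
                splits[i] = new SketchSplit(in.readUTF(), in.readLong(), in.readLong());
            }
            return splits;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try { Files.deleteIfExists(tmp); } catch (IOException ignored) { }
        }
    }

    synchronized int queuedJobs() { return pendingSplits.size(); }
}
```

Holding only a Path per queued job is what addresses the memory concern. The serialization code is the extra code that offsets removing the split-computing thread.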