Lance Amundsen wrote:
I am starting to wonder if it might indeed be impossible to get map jobs
running w/o writing to the file system.... as in, not w/o some major
changes to the job and task tracker code.

I was thinking about creating an InputFormat that does no file I/O and is
instead queue-based.  As mappers start up, their getRecordReader calls get
re-directed to a remote queue to pull one or more records off of.  But I am
starting to wonder if the file system dependencies in the code are such
that I could never completely avoid using files.  Specifically, even if I
completely re-write an InputFormat, the framework is still going to try to
do Filesystem stuff on everything I return  (the extensive internal use of
splits is baffling me some).

Nothing internally should depend on an InputSplit representing a file. You do need to be able to generate the complete set of splits when the job is launched. So if you wanted maps to poll a queue for each task then you'd need to know how long the queue is when the job is launched so that you could generate the right number of polling splits.

Doug
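
As an illustration of the point above (this is not code from the thread), here is a minimal sketch of how the full set of "polling splits" might be generated up front from a queue length observed at job launch. PollingSplits, PollingSplit, generateSplits, and recordsPerTask are hypothetical names chosen for this example, and the classes are plain Java stand-ins rather than Hadoop types. In a real job the split class would presumably implement Hadoop's InputSplit interface, and an InputFormat would return these objects from getSplits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, not Hadoop classes: each split names a share of a
// remote queue instead of a byte range in a file.
final class PollingSplits {

    /** One map task's share of the queue: how many records it should poll. */
    static final class PollingSplit {
        private final int taskIndex;
        private final int recordCount;

        PollingSplit(int taskIndex, int recordCount) {
            this.taskIndex = taskIndex;
            this.recordCount = recordCount;
        }

        int taskIndex() { return taskIndex; }

        int recordCount() { return recordCount; }

        @Override
        public String toString() {
            return "PollingSplit[task=" + taskIndex + ", records=" + recordCount + "]";
        }
    }

    /**
     * Builds the complete list of splits before any task runs, using the
     * queue length as it was observed when the job was launched.
     */
    static List<PollingSplit> generateSplits(long queueLength, int recordsPerTask) {
        if (queueLength < 0 || recordsPerTask <= 0) {
            throw new IllegalArgumentException("queueLength must be >= 0 and recordsPerTask > 0");
        }
        List<PollingSplit> splits = new ArrayList<>();
        long remaining = queueLength;
        int taskIndex = 0;
        while (remaining > 0) {
            // The last split may be smaller than recordsPerTask.
            int count = (int) Math.min(remaining, recordsPerTask);
            splits.add(new PollingSplit(taskIndex++, count));
            remaining -= count;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Example: a queue of 10 records, with each map task polling up to 4.
        for (PollingSplit split : generateSplits(10, 4)) {
            System.out.println(split);
        }
    }
}
```

Nothing in the split refers to a file. The only file-independent requirement the sketch reflects is that the number of splits is fixed at launch, which is why the queue length has to be known then.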
