So I managed to get my fast InputFormat working. It does still use the
FS, but in such a way that it improves mapper startup by over 2X. And last
night I got a prototype working that allows the map task to run inside the
JVM of the TaskTracker, rather than spawning a new JVM.

The initial performance looks really, really good. I just ran a 1000-map,
single-input-record job (mappers doing no work, however) in a one-master,
one-slave setup on my laptop. It completed in a couple thousand
seconds, or a couple of seconds per map. Earlier I did a smaller 100-map
job on a stable, quiesced system and it came in at about 130 seconds.

So this prototype can start and end map tasks in 1-2 seconds, and should
scale flatly with respect to the number of nodes in the setup.

"Owen O'Malley"
<[EMAIL PROTECTED]>
To: [email protected]
Date: 10/24/2007 01:05 PM
Subject: Re: InputFiles, Splits, Maps, Tasks
Reply-To: [EMAIL PROTECTED]e.apache.org

On Oct 24, 2007, at 12:42 PM, Doug Cutting wrote:
> Lance Amundsen wrote:
>> OK, that is encouraging. I'll take another pass at it. I succeeded
>> yesterday with an in-memory only InputFormat, but only after I
>> commented out some of the split referencing code, like the following
>> in MapTask.java:
>>
>>     if (instantiatedSplit instanceof FileSplit) {
>>       FileSplit fileSplit = (FileSplit) instantiatedSplit;
>>       job.set("map.input.file", fileSplit.getPath().toString());
>>       job.setLong("map.input.start", fileSplit.getStart());
>>       job.setLong("map.input.length", fileSplit.getLength());
>>     }
>
> Yes, that code should not exist, but it shouldn't affect you
> either. You should be subclassing InputSplit, not FileSplit, so
> this code shouldn't operate on your splits.

That code doesn't do anything for splits that aren't FileSplits, so it
absolutely shouldn't break anything. Applications depend on those
attributes to know which split they are working on, and there isn't a
better fix until we move to context objects. I know that non-FileSplits
work because there are unit tests to make sure they don't break
anything.

-- Owen
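
For anyone following along, here is a minimal, self-contained sketch of the guard being discussed. It does not use the real Hadoop classes: the InputSplit, FileSplit, and MemorySplit types below are hypothetical stand-ins, and a plain Map stands in for the JobConf. Only the three property names and the instanceof check are taken from the snippet quoted above. The point it illustrates is that a split which implements InputSplit directly never enters the FileSplit branch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitAttributesSketch {

    // Hypothetical stand-in for Hadoop's InputSplit (not the real interface).
    interface InputSplit {
        long getLength();
    }

    // Hypothetical stand-in for FileSplit: a byte range within a file.
    static class FileSplit implements InputSplit {
        private final String path;
        private final long start;
        private final long length;

        FileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        String getPath() { return path; }
        long getStart() { return start; }
        public long getLength() { return length; }
    }

    // Hypothetical in-memory split that implements InputSplit directly,
    // which is the approach Doug suggests instead of extending FileSplit.
    static class MemorySplit implements InputSplit {
        private final byte[] data;

        MemorySplit(byte[] data) { this.data = data; }

        public long getLength() { return data.length; }
    }

    // Mirrors the quoted guard; the returned Map plays the role of the job
    // configuration. Only FileSplit instances cause any properties to be set.
    static Map<String, String> configure(InputSplit instantiatedSplit) {
        Map<String, String> job = new LinkedHashMap<>();
        if (instantiatedSplit instanceof FileSplit) {
            FileSplit fileSplit = (FileSplit) instantiatedSplit;
            job.put("map.input.file", fileSplit.getPath());
            job.put("map.input.start", Long.toString(fileSplit.getStart()));
            job.put("map.input.length", Long.toString(fileSplit.getLength()));
        }
        return job;
    }

    public static void main(String[] args) {
        // The path and sizes here are made-up illustration values.
        System.out.println(configure(new FileSplit("/data/part-00000", 0L, 1024L)));
        System.out.println(configure(new MemorySplit(new byte[16])));
    }
}
```

In this sketch the file-backed split ends up with all three map.input.* properties set, while the in-memory split leaves the configuration empty, so the guard is a no-op for it.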