I have never done this, but I think it should be possible. You will likely not
get much data locality doing this, but you should be able to create your own
input format and have it write out entries in the split file that indicate the
record ranges you want. I may be wrong, but I thought that the InputFormat is
both the producer and the only consumer of the split data.
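
As a rough, untested-on-a-cluster sketch of the range planning only: the
plain-Java class below works out overlapping record ranges for a fixed number
of mappers. The names OverlappingRangePlanner, RecordRange and plan are made up
for illustration and are not part of Hadoop. In a real job I would expect this
logic to sit inside a custom InputFormat's getSplits(), with each range carried
by a custom InputSplit and a RecordReader that skips ahead to the first record
of its range; that wiring is assumed here, not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Hadoop class: plans overlapping record ranges.
public class OverlappingRangePlanner {

    /** Inclusive range of 1-based record numbers assigned to one mapper. */
    public static final class RecordRange {
        public final long first;
        public final long last;

        public RecordRange(long first, long last) {
            this.first = first;
            this.last = last;
        }

        @Override
        public String toString() {
            return first + "-" + last;
        }
    }

    /**
     * Divides records 1..totalRecords into numMappers contiguous chunks, then
     * extends the start of each chunk backwards by `overlap` records so that
     * neighbouring ranges overlap. The first range is clamped to start at 1.
     */
    public static List<RecordRange> plan(long totalRecords, int numMappers, long overlap) {
        if (totalRecords < 1 || numMappers < 1 || overlap < 0) {
            throw new IllegalArgumentException("invalid planning parameters");
        }
        List<RecordRange> ranges = new ArrayList<>();
        long base = totalRecords / numMappers;   // minimum chunk size
        long extra = totalRecords % numMappers;  // first `extra` chunks get one more record
        long start = 1;
        for (int i = 0; i < numMappers; i++) {
            long size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                break; // more mappers requested than records available
            }
            long end = start + size - 1;
            long first = Math.max(1, start - overlap);
            ranges.add(new RecordRange(first, end));
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 2000 records, 2 mappers, 300 records of overlap.
        for (RecordRange r : plan(2000, 2, 300)) {
            System.out.println(r); // prints 1-1000, then 701-2000
        }
    }
}
```

Because the ranges overlap, some records are read by two mappers, which is
part of why I would not expect much data locality from this approach.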

--Bobby Evans

On 11/30/11 12:28 AM, "Rob Podolski" <robpodol...@yahoo.co.uk> wrote:

Hi

I have a computation to do on a large input: a single large sequence file.
Ideally I would like to set a specific number of mappers and designate each to
process a specific range of records in the input sequence file. For
various reasons, the record ranges that I would pass to each mapper
would overlap (e.g. mapper 1 gets records 1 - 1000, mapper 2 gets records
700 - 2000, etc.).

Is it possible to do this? If so, how would I go about it? InputFormat does not
seem to cater for this. Perhaps Hadoop is not the right 'parallel'
framework for me to do this in.

Thanks.



