Hi, I have a computation to run over a large input: a single large sequence file. Ideally, I would like to set a specific number of mappers and have each one process a specific range of records in that sequence file. For various reasons, the record ranges I want to give each mapper would overlap (e.g. mapper 1 gets records 1-1000, mapper 2 gets records 700-2000, etc.).
Is it possible to do this? If so, how would I go about it? InputFormat does not seem to cater for this. Perhaps Hadoop is not the right 'parallel' framework for this job. Thanks.
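To make the requirement concrete, here is a minimal plain-Java sketch of the kind of overlapping range plan described above. It assumes records are numbered from 1, that the file is cut into equal contiguous chunks, and that each chunk after the first is extended backwards by a fixed `overlap`. The names `OverlappingRanges`, `Range` and `plan` are hypothetical, and the sketch uses no Hadoop API: it only computes the ranges. Getting each map task to actually read its range would presumably need a custom InputFormat/InputSplit, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlappingRanges {
    // Hypothetical type: a 1-based, inclusive range of record numbers for one mapper.
    record Range(long start, long end) {}

    // Hypothetical helper: split records 1..totalRecords into numMappers contiguous
    // chunks, then extend each chunk (except the first) backwards by `overlap`
    // records so that neighbouring mappers share some records.
    static List<Range> plan(long totalRecords, int numMappers, long overlap) {
        List<Range> ranges = new ArrayList<>();
        long chunk = (totalRecords + numMappers - 1) / numMappers; // ceiling division
        for (int i = 0; i < numMappers; i++) {
            long baseStart = i * chunk + 1;
            if (baseStart > totalRecords) {
                break; // fewer records than mappers: no work left for this one
            }
            long end = Math.min(baseStart + chunk - 1, totalRecords);
            long start = Math.max(1, baseStart - overlap); // never go below record 1
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 3000 records, 3 mappers, 300 records of overlap between neighbours.
        for (Range r : plan(3000, 3, 300)) {
            System.out.println(r.start() + " - " + r.end());
        }
    }
}
```

With 3000 records, 3 mappers and an overlap of 300, this prints the ranges 1 - 1000, 701 - 2000 and 1701 - 3000, which is the shape of assignment the question is after.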