On Sat, Sep 22, 2007 at 10:22:01PM +0800, yhxiao wrote:
>Dear developers:
>       Hi, my name is floodhong, a student in China.
>       Recently I have been setting up a cluster on Hadoop, but I have a question:
> how can I make sure one data file is executed by only one task?

To be clear, do you have a bunch of files (no. of files >= 1) and want each of 
your maps to work on one complete file only?
If so, yes, that is possible.

Essentially a job's input is represented by the InputFormat interface and the 
FileInputFormat base class.
For your purpose you would need a 'non-splittable' FileInputFormat, i.e. an 
input-format which tells the map-reduce framework that its input files cannot be 
split up and processed in pieces. To do this you need your particular input-format 
to return *false* from the isSplitable() call.

E.g. take a look at 
org.apache.hadoop.mapred.SortValidator.RecordStatsChecker.NonSplitableSequenceFileInputFormat
 in src/test/org/apache/hadoop/mapred/SortValidator.java
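
As a minimal sketch (untested, assuming the old org.apache.hadoop.mapred API and 
plain text input; the class name here is just an example), overriding isSplitable 
looks like this:

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Example non-splittable input format: each input file becomes exactly
// one split, so exactly one map task processes each whole file.
public class NonSplittableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    // Returning false tells the framework never to split this file.
    return false;
  }
}
```

Then point your JobConf at it via conf.setInputFormat(NonSplittableTextInputFormat.class).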

Arun

>which is very difficult for me. Can I do that? If the answer is "yes", could you 
>tell me how to do it?
>       I am longing for your answer.
>
>
>Yours sincerely,
>yhxiao
>2007-09-22
