On Sat, Sep 22, 2007 at 10:22:01PM +0800, yhxiao wrote:
> Dear developers:
>   Hi, my name is floodhong, a student in China.
>   Recently I have been setting up a cluster on Hadoop, but now I have a question:
> how can I make one data file be processed by only one task?
To be clear: do you have a set of files (one or more) and want each of your maps to work on one complete file only? If so, yes, that is possible.

Essentially, a job's input is represented by the InputFormat interface and the FileInputFormat base class. For your purpose you would need a 'non-splittable' FileInputFormat, i.e. an input format which tells the map-reduce framework that its files cannot be split up and processed in pieces. To do this, your input format needs to return *false* from the isSplitable() call. E.g. take a look at
org.apache.hadoop.mapred.SortValidator.RecordStatsChecker.NonSplitableSequenceFileInputFormat
in src/test/org/apache/hadoop/mapred/SortValidator.java.

Arun

> which is very difficult for me. Can I do that? If the answer is "yes", please tell me
> how to do it.
>   I am looking forward to your answer.
>
> Yours sincerely,
> yhxiao
> 2007-09-22
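The override Arun describes can be sketched roughly as below. To keep the snippet self-contained, the Hadoop classes are replaced by hypothetical stand-in stubs, and the class name NonSplittableInputFormat is made up for illustration; in a real job you would instead extend a concrete input format such as org.apache.hadoop.mapred.SequenceFileInputFormat and override only isSplitable():

```java
// Stand-in stubs for the Hadoop classes (illustration only -- the real ones
// live in org.apache.hadoop.fs and org.apache.hadoop.mapred).
class FileSystem { }
class Path { }

class FileInputFormat {
    // Stand-in for the framework default: a file may be split into
    // several input splits, each handled by a different map task.
    protected boolean isSplitable(FileSystem fs, Path file) {
        return true;
    }
}

// Returning false tells the framework never to split a file, so each
// input file goes, whole, to exactly one map task.
class NonSplittableInputFormat extends FileInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}
```

You would then set this class as the job's input format via JobConf.setInputFormat().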