Hello Harsh,

In the meantime I figured out what the problem was (it was my bad: I was intermixing the APIs). However, I read somewhere that using it (from the old API) in 0.20.2 can cause problems, so I took NLineInputFormat.java from the 2.0 branch and simply inserted it into my project, and it all went fine.
However, I notice that although as many tasks are generated as there are lines in my input file, the whole job still gets executed on a single node (a single slave): at least, only one job shows up on my JobTracker, running on one of my slaves. What I want is distribution, so that for the very same (single) input file, all my running slaves get involved and process the lines of this input file separately. I don't even have a reduce phase at the moment; I only want to do the processing on the input, through the mapper.

Is the scenario I described achievable? How should I proceed?

Thank you,
Lehel.

--- On Sun, 5/20/12, Harsh J <ha...@cloudera.com> wrote:

From: Harsh J <ha...@cloudera.com>
Subject: Re: Set number of mappers by the number of input lines for a single file?
To: common-user@hadoop.apache.org
Date: Sunday, May 20, 2012, 1:54 PM

Biro,

0.20.2 did carry NLineInputFormat, but in the older/stable API package (marked deprecated, though it was undeprecated subsequently). See
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
which confirms that 0.20.2 carried it.

For 0.20.2, I recommend sticking to the mapred.* API package. For the new API (mapreduce.* package) version, you can also grab the source and include it in your project along with the license (following whatever is required in doing so), from here:
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/mapred/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java

Hope this helps.

On Sun, May 20, 2012 at 4:03 PM, biro lehel <lehel.b...@yahoo.com> wrote:
> Hello Harsh,
>
> Thanks for your answer. The problem is that I'm using version 0.20.2, and,
> as I checked, NLineInputFormat is not implemented there (at least I couldn't
> find it).
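[For reference, Harsh's suggestion with the stable mapred.* API amounts to a driver along the following lines. This is a minimal, untested sketch: the job name, paths, and the mapper class `LineMapper` are placeholders for your own code, not anything from this thread.]

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class PerLineJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(PerLineJob.class);
        conf.setJobName("per-line-processing");

        // One input line per map task (N = 1).
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", 1);

        // Map-only job: no reduce phase, mapper output is written directly.
        conf.setNumReduceTasks(0);

        conf.setMapperClass(LineMapper.class); // placeholder: your mapper class
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

[Note that this controls how many map tasks are created, not where they run; task placement across slaves is up to the scheduler.]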
> Switching to another version would be kind of a big deal in my
> infrastructure, since I'm using VMs deployed from images already
> pre-configured with 0.20.2, so it is not an option at the moment. What
> should I do?
>
> Thanks,
> Lehel.
>
> --- On Sun, 5/20/12, Harsh J <ha...@cloudera.com> wrote:
>
> From: Harsh J <ha...@cloudera.com>
> Subject: Re: Set number of mappers by the number of input lines for a single file?
> To: common-user@hadoop.apache.org
> Date: Sunday, May 20, 2012, 12:52 PM
>
> Lehel,
>
> You may use the NLineInputFormat with N=1:
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
>
> On Sun, May 20, 2012 at 2:48 PM, biro lehel <lehel.b...@yahoo.com> wrote:
>> Dear all,
>>
>> I have one single input file, which contains, on every line, some
>> hydrological calibration models (data). Each line of the file should be
>> processed, and then the output from every line written to another single
>> output file.
>>
>> I understood that Hadoop spawns as many mapper tasks as there are
>> input files (meaning, in my case, a single mapper would be
>> generated). However, I want each mapper to deal with only a single
>> line from my input file (nr. of mapper tasks = number of lines in my file).
>>
>> What is the best way to obtain such behavior? How should I specify this to
>> Hadoop?
>>
>> Any suggestions are more than welcome.
>>
>> Thank you,
>> Lehel.
>
>
>
> --
> Harsh J

--
Harsh J
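[To make the idea behind NLineInputFormat concrete: it carves the input into splits of N lines each, so with N = 1 the framework schedules one map task per input line. The plain-Java sketch below only illustrates that chunking conceptually; the class and sample line contents are made up for the example and have nothing to do with the real Hadoop implementation, which works on byte offsets in HDFS blocks.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual illustration of N-lines-per-split chunking (not Hadoop code).
public class NLineSplitDemo {

    // Group the input lines into consecutive chunks of at most n lines each.
    static List<List<String>> splitIntoNLineChunks(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            splits.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the calibration-model lines.
        List<String> lines = Arrays.asList("model-A", "model-B", "model-C");

        List<List<String>> splits = splitIntoNLineChunks(lines, 1);
        System.out.println(splits.size());   // prints 3: one split (map task) per line
        System.out.println(splits.get(0));   // prints [model-A]
    }
}
```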