Rahul Tenany wrote:
Hi Amareshwari,
It is in the ToolRunner.run() method that I am setting the input
format to NLineInputFormat, and in the same method I am setting the
mapred.line.input.format.linespermap property. Will that not work?
How can I override LineRecordReader so that it returns N lines as
the value?
Thanks
Rahul
On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu
<[EMAIL PROTECTED]> wrote:
Hi Rahul,
How did you set the configuration property
"mapred.line.input.format.linespermap" and your input format? You
have to set them in hadoop-site.xml or pass them to the job through
the -D option.
NLineInputFormat puts N lines of input into one split, so each map
gets N lines.
But the RecordReader is still LineRecordReader, which reads one line
at a time: the key is the offset in the file and the value is the
line.
If you want N lines as the value, you may have to override
LineRecordReader.
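To make the split-versus-record distinction concrete, here is a small stand-alone Java model. It has no Hadoop dependency, and the class and method names are hypothetical, not Hadoop's. It groups input lines into splits of N lines, as NLineInputFormat does, then emits one (offset, line) record per line inside each split, as LineRecordReader does, so map() is still called once per line. A rough way to check that NLineInputFormat is in effect is the number of map tasks: for a single input file it should be about the number of lines divided by N, rounded up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Hadoop code: NLineInputFormat-style splitting
// followed by LineRecordReader-style one-record-per-line reading.
class NLineSplitModel {

    /** One record as a line reader emits it: (offset of line start, line text). */
    static final class Rec {
        final long offset;
        final String line;
        Rec(long offset, String line) { this.offset = offset; this.line = line; }
    }

    /**
     * Groups the input into splits of linesPerMap lines, then keeps one
     * record per line inside each split. Offsets are character offsets,
     * which equal byte offsets for ASCII input.
     */
    static List<List<Rec>> splitIntoRecords(String input, int linesPerMap) {
        if (linesPerMap < 1) throw new IllegalArgumentException("linesPerMap must be >= 1");
        List<List<Rec>> splits = new ArrayList<>();
        List<Rec> current = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            int end = input.indexOf('\n', start);
            if (end < 0) end = input.length();        // last line without trailing newline
            current.add(new Rec(start, input.substring(start, end)));
            if (current.size() == linesPerMap) {       // split is full: start a new one
                splits.add(current);
                current = new ArrayList<>();
            }
            start = end + 1;
        }
        if (!current.isEmpty()) splits.add(current);  // short final split
        return splits;
    }

    public static void main(String[] args) {
        List<List<Rec>> splits = splitIntoRecords("5\n3\n8\n1\n9\n", 2);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("map task " + i + ":");
            for (Rec r : splits.get(i)) {
                // map() is invoked once per record, i.e. once per line
                System.out.println("  map(" + r.offset + ", \"" + r.line + "\")");
            }
        }
    }
}
```

With five lines and N = 2 this yields three map tasks, but each task still sees its lines one call at a time, which matches the "only 1 line at a time" symptom.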
Thanks
Amareshwari
Rahul Tenany wrote:
Hi, I am writing a Binary Search Tree on Hadoop, and for that I need
to use NLineInputFormat. I'll read n lines at a time, convert the
numbers in each line from string to int, and then insert them into the
binary tree. Once the binary tree is built, I'll search for elements in
it. But even though I set the input format to NLineInputFormat and set
mapred.line.input.format.linespermap to 10, I can read only 1 line at
a time. Any idea where I am going wrong? How can I find out whether
NLineInputFormat is working or not?
I want my program to work for any object that is comparable, not just
integers, so is there any way I can read N objects at a time?
I am completely stuck. Any help will be appreciated.
Thanks
Rahul
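For the "any comparable object" requirement, one possible shape is a generic binary search tree bounded by Comparable<T>, with the line-to-object conversion passed in as a function. This is a sketch, not the poster's code, and ComparableBst and fromLines are hypothetical names. Integers then use Integer::parseInt, and any other Comparable type only needs its own parser. In a real job this would run inside map() over whatever multi-line value the record reader hands over.

```java
import java.util.function.Function;

// Hypothetical helper: minimal unbalanced BST over any Comparable type.
class ComparableBst<T extends Comparable<T>> {
    private static final class Node<V> {
        final V value;
        Node<V> left, right;
        Node(V value) { this.value = value; }
    }

    private Node<T> root;
    private int size;

    /** Inserts value; duplicates are ignored. Returns true if the tree changed. */
    boolean insert(T value) {
        if (root == null) {
            root = new Node<>(value);
            size++;
            return true;
        }
        Node<T> cur = root;
        while (true) {
            int c = value.compareTo(cur.value);
            if (c == 0) return false;                  // already present
            if (c < 0) {
                if (cur.left == null) { cur.left = new Node<>(value); size++; return true; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node<>(value); size++; return true; }
                cur = cur.right;
            }
        }
    }

    /** Standard BST search: walk left or right by comparison. */
    boolean contains(T value) {
        Node<T> cur = root;
        while (cur != null) {
            int c = value.compareTo(cur.value);
            if (c == 0) return true;
            cur = c < 0 ? cur.left : cur.right;
        }
        return false;
    }

    int size() { return size; }

    /** Parses each non-empty line of a multi-line record and inserts it. */
    static <E extends Comparable<E>> ComparableBst<E> fromLines(String record, Function<String, E> parser) {
        ComparableBst<E> tree = new ComparableBst<>();
        for (String line : record.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) tree.insert(parser.apply(trimmed));
        }
        return tree;
    }
}
```

For example, `ComparableBst<Integer> t = ComparableBst.fromLines("5\n3\n8\n", Integer::parseInt);` builds a tree of three integers. The tree is unbalanced, so already-sorted input degrades it to a linked list; java.util.TreeSet (a red-black tree) avoids that if it matters.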
One more thing: I don't think you need NLineInputFormat for your
requirement. NLineInputFormat puts N lines into one split, so each map
processes N lines. In your application you don't want each map to
process just N lines; you want the value to be N lines, right? So you
should write a new input format extending FileInputFormat whose
getRecordReader returns your new RecordReader implementation. Does
this make sense?
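The grouping logic such a RecordReader needs can be sketched in plain Java. This is a hypothetical model, not Hadoop code: in the old org.apache.hadoop.mapred API, the real class would implement RecordReader<LongWritable, Text> (next, createKey, createValue, getPos, getProgress, close) and could delegate each line read to a LineRecordReader. This model keeps only the "read up to N lines, return them as one value keyed by the first line's offset" part so it runs without Hadoop on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an N-line record reader (no Hadoop dependency).
class NLinesRecordReaderModel {
    private final BufferedReader in;
    private final int linesPerRecord;
    private long pos = 0;      // offset of the next unread line
    private long key = -1;     // offset of the first line of the current record
    private String value;      // current record: up to linesPerRecord lines

    NLinesRecordReaderModel(BufferedReader in, int linesPerRecord) {
        if (linesPerRecord < 1) throw new IllegalArgumentException("linesPerRecord must be >= 1");
        this.in = in;
        this.linesPerRecord = linesPerRecord;
    }

    /** Reads the next group of lines; returns false when the input is exhausted. */
    boolean next() throws IOException {
        StringBuilder sb = new StringBuilder();
        long first = pos;
        int read = 0;
        String line;
        while (read < linesPerRecord && (line = in.readLine()) != null) {
            if (read > 0) sb.append('\n');
            sb.append(line);
            pos += line.length() + 1;   // assumes '\n' endings and single-byte chars
            read++;
        }
        if (read == 0) return false;    // nothing left: no more records
        key = first;
        value = sb.toString();
        return true;
    }

    long key() { return key; }
    String value() { return value; }

    /** Convenience: collects every record value from an in-memory string. */
    static List<String> values(String text, int linesPerRecord) {
        NLinesRecordReaderModel r =
            new NLinesRecordReaderModel(new BufferedReader(new StringReader(text)), linesPerRecord);
        List<String> out = new ArrayList<>();
        try {
            while (r.next()) out.add(r.value());
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        for (String v : values("5\n3\n8\n1\n9\n", 2)) {
            System.out.println("map value: " + v.replace("\n", " | "));
        }
    }
}
```

Because a reader like this groups lines within a single split, the last group of each split can have fewer than N lines unless splitting is turned off (for example by overriding FileInputFormat's isSplitable to return false).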
Thanks
Amareshwari