Rahul Tenany wrote:
Hi Amareshwari,
I am setting the input format to NLineInputFormat in the ToolRunner.run() method, and in the same function I am setting the mapred.line.input.format.linespermap property. Will that not work? How can I override LineRecordReader so that it returns N lines as the value?

Thanks
Rahul

On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu <[EMAIL PROTECTED]> wrote:

    Hi Rahul,

    How did you set the configuration
    "mapred.line.input.format.linespermap" and your input format? You
    have to set them in hadoop-site.xml or pass them through the -D
    option to the job.
    NLineInputFormat will split N lines of input as one split. So,
    each map gets N lines.
    But the RecordReader is still LineRecordReader, which reads one
    line at a time, so the Key is the offset in the file and the Value
    is the line.
    If you want N lines as the Value, you may have to override
    LineRecordReader.

    Thanks
    Amareshwari
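
The behaviour described above can be illustrated without Hadoop. The following is only a rough simulation in plain Java, not Hadoop code: the class NLineSplitDemo and its LineRecord type are hypothetical names invented for this sketch. It groups the input into splits of N lines, yet each split still contains one (offset, line) record per line, which is why a map function sees a single line per call.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for "N lines per split, one record per line".
class NLineSplitDemo {
    /** One record as a line-oriented reader would emit it. */
    static final class LineRecord {
        final long offset;  // byte offset of the line within the input (the key)
        final String line;  // the line text without its newline (the value)

        LineRecord(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Groups the input into splits of at most linesPerSplit single-line records. */
    static List<List<LineRecord>> split(String input, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be >= 1");
        }
        List<List<LineRecord>> splits = new ArrayList<>();
        List<LineRecord> current = new ArrayList<>();
        long offset = 0;
        for (String line : input.split("\n")) {
            current.add(new LineRecord(offset, line));
            // Advance past the line's bytes plus the newline separator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            if (current.size() == linesPerSplit) {
                splits.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            splits.add(current);  // last split may hold fewer lines
        }
        return splits;
    }

    public static void main(String[] args) {
        List<List<LineRecord>> splits = split("5\n3\n8\n1\n9", 2);
        for (int i = 0; i < splits.size(); i++) {
            // Each record would be one separate call to the map function.
            for (LineRecord r : splits.get(i)) {
                System.out.println("split " + i + ": key=" + r.offset + " value=" + r.line);
            }
        }
    }
}
```

Each printed line corresponds to one map call: the split size limits how many calls a single map task receives, not how many lines arrive in one value.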


    Rahul Tenany wrote:

        Hi, I am writing a Binary Search Tree on Hadoop, and for that I
        need to use NLineInputFormat. I'll read n lines at a time,
        convert the numbers in each line from string to int, and then
        insert them into the binary tree. Once the binary tree is
        built, I'll search for elements in it. But even if I set the
        input format to NLineInputFormat and set
        mapred.line.input.format.linespermap to 10, I am able to read
        only 1 line at a time. Any idea where I am going wrong? How
        can I find out whether NLineInputFormat is working or not?

        I want my program to work for any object that is comparable,
        not just integers, so is there any way I can read N objects at
        a time?

        I am completely stuck. Any help will be appreciated.

        Thanks
        Rahul



One more thing: I don't think you need NLineInputFormat for your requirement. NLineInputFormat splits N lines as one split, so each map processes N lines. In your application, you don't want each map to process just N lines; you want the value to be N lines, right? So you should write a new input format extending FileInputFormat, whose getRecordReader returns your new RecordReader implementation. Does this make sense?

Thanks
Amareshwari
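
As a rough illustration of the "value as N lines" idea, here is a minimal sketch of the grouping logic in plain Java, with no Hadoop dependency. The class NLineGrouper and its methods are hypothetical names made up for this example. In a real job, logic along these lines would presumably sit inside the custom RecordReader that the new input format's getRecordReader returns, with Hadoop's own types in place of String.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: returns N lines at a time as a single value.
class NLineGrouper {
    private final BufferedReader in;
    private final int linesPerRecord;

    NLineGrouper(BufferedReader in, int linesPerRecord) {
        if (linesPerRecord < 1) {
            throw new IllegalArgumentException("linesPerRecord must be >= 1");
        }
        this.in = in;
        this.linesPerRecord = linesPerRecord;
    }

    /** Returns up to linesPerRecord lines joined by newlines, or null at end of input. */
    String next() throws IOException {
        StringBuilder value = new StringBuilder();
        int count = 0;
        String line;
        // Check the count first so no extra line is consumed once the group is full.
        while (count < linesPerRecord && (line = in.readLine()) != null) {
            if (count > 0) {
                value.append('\n');
            }
            value.append(line);
            count++;
        }
        return count == 0 ? null : value.toString();
    }

    /** Convenience wrapper: groups all lines of an in-memory string. */
    static List<String> groupAll(String text, int linesPerRecord) {
        NLineGrouper grouper =
                new NLineGrouper(new BufferedReader(new StringReader(text)), linesPerRecord);
        List<String> records = new ArrayList<>();
        try {
            String value;
            while ((value = grouper.next()) != null) {
                records.add(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Five input lines grouped two at a time; the last group holds the remainder.
        for (String record : groupAll("5\n3\n8\n1\n9", 2)) {
            System.out.println("[" + record.replace("\n", ",") + "]");
        }
    }
}
```

The final group may contain fewer than N lines, so code that parses each value (for example, to insert the numbers into the tree) should not assume a fixed count.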
