Marc,

The answer depends on the Hadoop version you are running. The
following requires
https://issues.apache.org/jira/browse/MAPREDUCE-2254, which is
currently in 0.23 (and will eventually be in 2.x) and, last I checked,
also in CDH3 if you use that:

Simply set "textinputformat.record.delimiter" in your Job's
configuration to the exact character string you need, and
TextInputFormat will use it as the record/line delimiter. The string
can also be multi-character, and records will be split on exactly that
sequence. In your case, setting it to "\n" should leave the \r as part
of each record.
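
To make the effect concrete, here is a rough, stdlib-only Java sketch of
what splitting on an exact delimiter string does to the data. It is only
an illustration, not Hadoop's actual reader: the class DelimiterSketch
and the method splitRecords are hypothetical names I made up for this
mail. The commented-out line shows how I would expect the property to be
set on a Job, assuming a Hadoop build that includes MAPREDUCE-2254.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSketch {

    // Hypothetical helper: splits data on an exact delimiter string.
    // It mimics the idea of a custom record delimiter; it is NOT the
    // code Hadoop uses internally.
    public static List<String> splitRecords(String data, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = data.indexOf(delimiter, start)) >= 0) {
            // Everything before the delimiter is one record; a '\r'
            // that precedes a "\n" delimiter stays in the record.
            records.add(data.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < data.length()) {
            // Trailing data without a final delimiter is the last record.
            records.add(data.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        // In a real job (assumption: your Hadoop build has MAPREDUCE-2254):
        // job.getConfiguration().set("textinputformat.record.delimiter", "\n");

        List<String> records = splitRecords("a\r\nb\r\n", "\n");
        for (String r : records) {
            // Make the retained carriage return visible when printing.
            System.out.println(r.replace("\r", "\\r"));
        }
    }
}
```

With "\n" as the delimiter, each record in the sketch still ends in \r,
which is the behaviour you asked for. A multi-character delimiter such
as "||" works the same way in this sketch.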

It's unavailable in 1.x at present, but it appears harmless to add,
and if you can file a JIRA with a backport I can review and commit
it for a future 1.x update.

On Tue, Apr 10, 2012 at 12:31 AM, Marc Sturm <mas9...@nyp.org> wrote:
> Hi,
>
> I am new to Mapreduce and I have a short question: is it possible for a
> MapReduce job to split the lines of a file with \n and ignore \r? Basically,
> in the use case I am looking into, the \r has to be included when reading a
> line.
>
> I am just “playing” with mapreduce with a standalone hadoop, not using hdfs,
> and I am looking into writing my own LineReader but I am afraid it is much
> more complicated than this. I can also update each line and replace the \r
> with a \t, but I would rather leave the file and data as is.
>
> Any insight and/or link to the correct documentation will be appreciated.
>
> Thanks,
>
> Marc
>
>
>
>
> ________________________________
> This electronic message is intended to be for the use only of the named
> recipient, and may contain information that is confidential or privileged.
> If you are not the intended recipient, you are hereby notified that any
> disclosure, copying, distribution or use of the contents of this message is
> strictly prohibited. If you have received this message in error or are not
> the named recipient, please notify us immediately by contacting the sender
> at the electronic mail address noted above, and delete and destroy all
> copies of this message. Thank you.



-- 
Harsh J
