Marc,

The answer depends on the Hadoop version you are running. The following requires https://issues.apache.org/jira/browse/MAPREDUCE-2254, which is currently present in 0.23 (and eventually 2.x) and also, last I checked, in CDH3 if you use that:
Simply set "textinputformat.record.delimiter" in your Job's configuration to the exact character string you need, and it will be used as the record/line delimiter in TextInputFormat. The string can also be multi-character, and records will be read based on that sequence. For your case that would be "\n", which should leave the \r in each record.

It's presently unavailable in 1.x, but it appears harmless to add, and if you can file a JIRA with a backport I can review and commit it for a future 1.x update.

On Tue, Apr 10, 2012 at 12:31 AM, Marc Sturm <mas9...@nyp.org> wrote:
> Hi,
>
> I am new to Mapreduce and I have a short question: is it possible for a
> MapReduce job to split the lines of a file with \n and ignore \r? Basically,
> in the use case I am looking into, the \r has to be included when reading a
> line.
>
> I am just “playing” with mapreduce with a standalone hadoop, not using hdfs,
> and I am looking into writing my own LineReader but I am afraid it is much
> more complicated than this. I can also update each line and replace the \r
> with a \t, but I rather leave the file and data as is.
>
> Any insight and/or link to the correct documentation will be appreciated.
>
> Thanks,
>
> Marc
>
> ________________________________
> This electronic message is intended to be for the use only of the named
> recipient, and may contain information that is confidential or privileged.
> If you are not the intended recipient, you are hereby notified that any
> disclosure, copying, distribution or use of the contents of this message is
> strictly prohibited. If you have received this message in error or are not
> the named recipient, please notify us immediately by contacting the sender
> at the electronic mail address noted above, and delete and destroy all
> copies of this message. Thank you.

--
Harsh J
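
P.S. In case it helps to see what a custom record delimiter means for the records you get back, here is a minimal plain-Java sketch. It is not Hadoop's LineReader and makes no claim about how TextInputFormat is implemented; the class name RecordDelimiterSketch and the method splitRecords are made up for illustration. It only models the behaviour described above: records end at the exact delimiter string (possibly multi-character), and any \r that is not part of the delimiter stays in the record.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a plain-Java model of splitting input on an exact
// delimiter string. This is NOT Hadoop code; names here are hypothetical.
final class RecordDelimiterSketch {

    // Splits text into records on the exact delimiter string, which may be
    // longer than one character. Characters that are not part of the
    // delimiter (such as '\r' when the delimiter is "\n") stay in the record.
    static List<String> splitRecords(String text, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = text.indexOf(delimiter, start)) >= 0) {
            records.add(text.substring(start, hit));
            start = hit + delimiter.length();
        }
        if (start < text.length()) {
            // Trailing record that is not terminated by a delimiter.
            records.add(text.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        // In a job driver on a release that includes MAPREDUCE-2254, the
        // setting itself would go on the job's Configuration, for example:
        //   conf.set("textinputformat.record.delimiter", "\n");
        List<String> records = splitRecords("a\tb\r\nc\td\r\n", "\n");
        for (String record : records) {
            System.out.println(record.length() + " chars, ends with \\r: "
                    + record.endsWith("\r"));
        }
    }
}
```

With "\n" as the delimiter, each record in this sketch still carries its trailing \r, which is the behaviour asked for in the original question; whether a given Hadoop release behaves the same way is worth confirming on a small test file.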