Hi,

I have a file that arrives in *HDFS* from one of our source systems and
contains more than one kind of *newline character*, both *\n* and *\r*.
This produces extra lines when a MapReduce/Pig job reads the file.
I'm OK with *\n* as the newline and just want *\r* not to be treated as one.
I'm setting the newline character when running my Pig job with the
following property:



*-D textinputformat.record.delimiter*

I have tried many values for it, but none of them makes any difference,
and the whole file is read as a single row.
These are the values I have already tried in order to set \n as the
newline character:

-D textinputformat.record.delimiter=\\n
-D textinputformat.record.delimiter=\\u000a
-D textinputformat.record.delimiter=\u000a
-D textinputformat.record.delimiter=0x0a
-D textinputformat.record.delimiter=0x0A
-D textinputformat.record.delimiter=00001010
-D textinputformat.record.delimiter=\&#xa\;
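
One possible explanation for the single-row result, sketched below in plain Java: if the configured value is taken verbatim and turned into UTF-8 bytes with no escape processing (an assumption about how the property is consumed, not something confirmed here), then each value above is a multi-character piece of text rather than the single line-feed byte 0x0a. The strings in TRIED are what a POSIX-style shell would presumably pass on after removing one level of backslashes, which is also an assumption. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class DelimiterBytes {
    // Assumed post-shell forms of the values tried above (one backslash level removed).
    static final String[] TRIED = {
        "\\n", "\\u000a", "u000a", "0x0a", "0x0A", "00001010", "&#xa;"
    };

    // Space-separated hex dump of the UTF-8 bytes of a string.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // True only for a string holding exactly one real line-feed character.
    static boolean isLineFeed(String s) {
        return s.equals("\n");
    }

    public static void main(String[] args) {
        System.out.println("real line feed -> " + toHex("\n"));
        for (String value : TRIED) {
            System.out.println(value + " -> " + toHex(value)
                    + " (line feed? " + isLineFeed(value) + ")");
        }
    }
}
```

If that assumption holds, a delimiter such as backslash followed by n never occurs in the data, which would be consistent with the whole file coming back as one record.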

Is there any possible value which I'm missing?

I was also looking into writing a custom loader for this by extending the
PigStorage class, but I'm not sure whether I would also have to write my
own RecordReader to do that.
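
To make the desired behaviour concrete, here is a minimal sketch in plain Java of what such a reader would need to do: break records on *\n* only and drop any *\r* left inside a record. It does not use the Pig or Hadoop APIs and does not settle whether a custom RecordReader is required; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class NewlineOnlySplitter {
    // Splits the input on '\n' only; a '\r' never ends a record.
    static List<String> splitRecords(String data) {
        List<String> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length(); i++) {
            if (data.charAt(i) == '\n') {
                records.add(stripCarriageReturns(data.substring(start, i)));
                start = i + 1;
            }
        }
        // Keep a final record that is not terminated by '\n'.
        if (start < data.length()) {
            records.add(stripCarriageReturns(data.substring(start)));
        }
        return records;
    }

    // Removes every '\r' from a single record.
    static String stripCarriageReturns(String record) {
        return record.replace("\r", "");
    }

    public static void main(String[] args) {
        for (String record : splitRecords("a\rb\nc\r\nd")) {
            System.out.println("[" + record + "]");
        }
    }
}
```

With the input "a\rb\nc\r\nd" this yields the three records ab, c and d, instead of the extra line a stray *\r* would otherwise create.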


*Thanks,*
