The problem is that the parameter is passed to TextInputFormat without
interpreting escape sequences, which makes it hard to pass a \n character.
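
To make the difference concrete, here is a minimal sketch in plain Java (no
Hadoop involved). It assumes the -D value reaches the job as the two literal
characters backslash and n. The class and method names are made up for
illustration.

```java
public class EscapeDemo {
    // Assumption: this is what the job sees when the command line delivers
    // the two characters '\' and 'n' without any escape processing.
    static String fromCommandLine() {
        return "\\n"; // backslash followed by 'n'
    }

    // A Java string literal: the compiler turns the escape into one character.
    static String fromJavaLiteral() {
        return "\n"; // a single line feed (U+000A)
    }

    public static void main(String[] args) {
        System.out.println(fromCommandLine().length());        // prints 2
        System.out.println(fromJavaLiteral().length());        // prints 1
        System.out.println((int) fromJavaLiteral().charAt(0)); // prints 10
    }
}
```

A two-character delimiter that never occurs in the data would explain why the
whole file is read as a single row.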

One alternative approach is to write a simple LoadFunc and pass the
parameter as a Java string literal, in which escape sequences are
interpreted, for example:

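import java.io.IOException;
import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.builtin.PigStorage;

public class PigStorageNewLine extends PigStorage {
  @Override
  public void setLocation(String location, Job job) throws IOException {
    // "\n" is a Java string literal, so the delimiter is a real line feed.
    job.getConfiguration().set("textinputformat.record.delimiter", "\n");
    super.setLocation(location, job);
  }
}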
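
As a rough illustration of what the loader above should change, the sketch
below models record splitting in plain Java. It is not Hadoop code. It assumes
that the default reader treats \r, \n and \r\n as line endings and that a
configured delimiter is matched literally. The class and method names are
hypothetical.

```java
public class DelimiterDemo {
    // Model of the assumed default behaviour: CR, LF and CRLF all end a record.
    static String[] splitDefault(String data) {
        return data.split("\r\n|\r|\n");
    }

    // Model of an explicit "\n" delimiter: only a line feed ends a record,
    // so a lone carriage return stays inside the record.
    static String[] splitOnLineFeed(String data) {
        return data.split("\n");
    }

    public static void main(String[] args) {
        String data = "a\rb\nc\n";
        System.out.println(splitDefault(data).length);    // prints 3
        System.out.println(splitOnLineFeed(data).length); // prints 2
    }
}
```

With the explicit delimiter the \r no longer creates an extra record, but it
remains in the field value, so it may still need to be stripped afterwards.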


Thanks,
Daniel

On 11/11/15, 11:59 PM, "Bhagwan S. Soni" <bhgwnsson...@gmail.com> wrote:

>Hi,
>
>I have a file which is coming from one of the source systems to *HDFS* with
>more than one *newline character*, like *\n* and *\r*, which creates
>extra lines when a MapReduce/Pig job gets invoked.
>I'm ok with having *\n* as newline and just want to avoid *\r*.
>I'm setting the newline character while running my Pig job using the
>property below:
>
>
>
>*-D textinputformat.record.delimiter*
>I tried many values to set the newline character, but it makes no
>difference and the whole file is read as a single row.
>Below are some values which I have already tried to set \n as the newline
>character -
>
>-D textinputformat.record.delimiter=\\n
>-D textinputformat.record.delimiter=\\u000a
>-D textinputformat.record.delimiter=\u000a
>-D textinputformat.record.delimiter=0x0a
>-D textinputformat.record.delimiter=0x0A
>-D textinputformat.record.delimiter=00001010
>-D textinputformat.record.delimiter=\&#xa\;
>
>Is there any possible value which I'm missing?
>
>I was also looking into creating a custom loader for this and planning
>to extend the PigStorage class,
>
>but I'm not sure whether I would have to write my own RecordReader as well?
>
>
>*Thanks,*
