Niels Basjes created PIG-4689:
---------------------------------

             Summary: CSV Writes incorrect header if two CSV files are created in one script
                 Key: PIG-4689
                 URL: https://issues.apache.org/jira/browse/PIG-4689
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.14.0
            Reporter: Niels Basjes


From a single Pig script I write two completely different and unrelated CSV files, both with the flag 'WRITE_OUTPUT_HEADER'.

The bug is that both files get the SAME header at the top of the output, even though their data is different.

*Reproduction:*
{code:title=foo.txt}
1
{code}

{code:title=bar.txt (Tab separated)}
1       a
{code}

{code:title=WriteTwoCSV.pig}
FOO =
    LOAD 'foo.txt'
    USING PigStorage('\t')
    AS (a:chararray);

BAR =
    LOAD 'bar.txt'
    USING PigStorage('\t')
    AS (b:chararray, c:chararray);

STORE FOO into 'Foo'
USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t', 'NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');

STORE BAR into 'Bar'
USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t', 'NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
{code}

*Command:*
{quote}pig -x local WriteTwoCSV.pig{quote}

*Result:*
{quote}cat Bar/part-*{quote}
{code}
b       c
1       a
{code}
{quote}cat Foo/part-*{quote}
{code}
b       c
1
{code}

*The error is that the {{Foo}} output has the two-column header from the {{Bar}} output.*
*One effect is that parsing the {{Foo}} data will probably fail, because the header has two columns while the data rows have only one.*
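The parsing problem can be checked directly against the output reported above. The following is a minimal Python sketch, not part of Pig or piggybank; the helper name {{find_column_mismatches}} is hypothetical, and it assumes the part files are tab-separated, as configured in the STORE statements. It compares each data row's column count with the header's.

```python
import csv
import io


def find_column_mismatches(text, delimiter="\t"):
    """Return (line_number, column_count) for every data row whose
    number of columns differs from the header row (line 1)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    header_width = len(rows[0])
    mismatches = []
    # Data rows start on line 2 of the file.
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != header_width:
            mismatches.append((line_number, len(row)))
    return mismatches


# Contents of Foo/part-* as reported in this issue (header copied from Bar).
foo_output = "b\tc\n1\n"
# Contents of Bar/part-* as reported in this issue.
bar_output = "b\tc\n1\ta\n"

print(find_column_mismatches(foo_output))  # [(2, 1)]
print(find_column_mismatches(bar_output))  # []
```

With the incorrect header, the single data row in {{Foo}} is flagged as having one column where the header declares two; the {{Bar}} output passes.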

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)