[
https://issues.apache.org/jira/browse/PIG-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293188#comment-14293188
]
Niels Basjes commented on PIG-4397:
-----------------------------------
On a different machine (same Pig version) I ran this script:
{code}
Lines =
LOAD 'test.log' USING PigStorage(' ')
AS ( A:chararray, B:chararray, C:chararray);
DUMP Lines;
STORE Lines INTO 'Lines'
USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t', 'NO_MULTILINE',
'WINDOWS', 'WRITE_OUTPUT_HEADER');
{code}
With this input:
{code}
1 2 3
 4 5
6  7
8
 9
  10
{code}
DUMP output:
{code}
(1,2,3)
(,4,5)
(6,,7)
(8,,)
(,9,)
(,,10)
{code}
CSVExcelStorage output:
{code}
A B C
1 2 3
4 5
6 7
8 8
9 9
10
{code}
Here the error is visible in the '8' and '9' rows, where the value is written twice.
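For comparison, the expected file content can be sketched outside Pig. The Python snippet below is only an illustration: `to_tsv` is a hypothetical helper, not part of Pig or piggybank, and it assumes that a null field should be written as an empty string between delimiters. It ignores the 'WINDOWS' line endings.

```python
# Hypothetical helper (not Pig/piggybank code): renders tuples as
# delimiter-separated lines, writing None (null) as an empty field.
def to_tsv(header, rows, delimiter="\t"):
    lines = [delimiter.join(header)]
    for row in rows:
        lines.append(delimiter.join("" if v is None else str(v) for v in row))
    return lines


# The tuples shown in the DUMP output above.
rows = [
    ("1", "2", "3"),
    (None, "4", "5"),
    ("6", None, "7"),
    ("8", None, None),
    (None, "9", None),
    (None, None, "10"),
]

expected = to_tsv(("A", "B", "C"), rows)
for line in expected:
    # repr() makes the tab characters and empty fields visible.
    print(repr(line))
```

Under that assumption each value appears exactly once per row, so the '8' row would contain a single 8 followed by two empty fields and the '9' row a single 9 between two empty fields.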
> CSVExcelStorage incorrect output if last field value is null
> ------------------------------------------------------------
>
> Key: PIG-4397
> URL: https://issues.apache.org/jira/browse/PIG-4397
> Project: Pig
> Issue Type: Bug
> Environment: Running the Pig version bundled with HDP 2.1.2:
> 0.12.1.2.1.2.0-402
> Reporter: Niels Basjes
> Priority: Critical
>
> I have the following input:
> {code}
> one two
> three
>  four
> {code}
> I run this code:
> {code}
> Lines =
> LOAD 'test.log' USING PigStorage(' ')
> AS ( First:chararray , Second:chararray );
> DUMP Lines;
> STORE Lines INTO 'Lines'
> USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t', 'NO_MULTILINE',
> 'WINDOWS', 'WRITE_OUTPUT_HEADER');
> {code}
> The output from the DUMP is correct:
> {code}
> (one,two)
> (three,)
> (,four)
> {code}
> The output from the CSVExcelStorage is incorrect:
> {code}
> First Second
> one two
> three three
> four
> {code}
> The problem is that if the last field is null, the previous value is
> repeated incorrectly (in this case 'three').