[ https://issues.apache.org/jira/browse/PIG-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518268#comment-14518268 ]
Daniel Dai commented on PIG-4513:
---------------------------------
The script still does not work for me as posted, so I made some changes (below):
{code}
-- Load the pipe-delimited input without a schema.
a = load 'file1' using PigStorage('|');
-- Trim each column and replace the \N marker with NULL before casting.
cleanedData = foreach a generate
    (chararray) (TRIM($0) == '\\\\N' ? NULL : TRIM($0)) as a,
    (int) (TRIM($1) == '\\\\N' ? NULL : TRIM($1)) as b,
    (chararray) (TRIM($2) == '\\\\N' ? NULL : TRIM($2)) as c;
-- Write the cleaned rows out as Avro.
store cleanedData into 'ooo' using org.apache.pig.piggybank.storage.avro.AvroStorage();
{code}
With that script, I get the right result when I read the output back:
{code}
a = load 'ooo/part-m-00000.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
dump a;
{code}
(name,,gender)
(aba,25,m)
(,45,f)
(cdb,54,f)
(,98,)
(,100,m)
(iwoe,23,f)
I am using Pig 0.12.0.
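To confirm that rows with no data in the first column survive the round trip, a check along these lines could be run against the stored output. This is only a sketch: the aliases stored and nullFirst are made up for illustration, and it assumes the same output path as above.
{code}
-- Hypothetical check: keep only the rows whose first column is null.
stored = load 'ooo/part-m-00000.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
nullFirst = filter stored by $0 is null;
dump nullFirst;
{code}
Going by the dump above, this should keep the three rows that show an empty first field, provided those fields are real nulls rather than empty strings.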
> Lines dropped in delimited text when they begin with null/no-data
> -----------------------------------------------------------------
>
> Key: PIG-4513
> URL: https://issues.apache.org/jira/browse/PIG-4513
> Project: Pig
> Issue Type: Bug
> Components: parser, piggybank
> Affects Versions: 0.12.0
> Environment: CDH5.2.x, CDH5.3.x
> Reporter: Madhan Sundararajan Devaki
> Priority: Blocker
> Fix For: 0.15.0
>
>
> When Pig (0.12) is used to process delimited text files (| delimited), lines
> that do not contain data in the first column are dropped.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)