[ 
https://issues.apache.org/jira/browse/PIG-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516953#comment-14516953
 ] 

Madhan Sundararajan Devaki commented on PIG-4513:
-------------------------------------------------

Please find the updated script below.

input = load '/data/file1' using PigStorage('|') ;
cleanedData = foreach input generate "(chararray) (TRIM($0) == '\\\\N' ? NULL : TRIM($0)) as $col1, (int) (TRIM($1) == '\\\\N' ? NULL : TRIM($1)) as $col2, (chararray) (TRIM($2) == '\\\\N' ? NULL : TRIM($2)) as $col3";
STORE cleanedData INTO '/output/out1' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
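
For comparison, the result the script above is expected to produce can be sketched outside Pig. This is only an illustration in Python, not Pig code and not a description of how PigStorage is implemented. It assumes that the null marker in the data is the literal two-character sequence \N, that an empty field should be treated as null, and that every line has three columns. The names clean_field and clean_record are hypothetical. The point is that a line whose first column is empty or \N should still yield a record, with a null in the first position.

```python
from typing import Optional, Tuple

# Assumption: the input uses the literal two characters \N as its null marker.
NULL_MARKER = r"\N"


def clean_field(raw: str) -> Optional[str]:
    """Trim a field; map the null marker or an empty field to None."""
    value = raw.strip()
    if value == "" or value == NULL_MARKER:
        return None
    return value


def clean_record(
    line: str, delimiter: str = "|"
) -> Tuple[Optional[str], Optional[int], Optional[str]]:
    """Split one delimited line into (col1, col2, col3) without dropping it."""
    fields = [clean_field(f) for f in line.rstrip("\n").split(delimiter)]
    # Pad short lines so a missing trailing column becomes None.
    fields += [None] * (3 - len(fields))
    col1, col2, col3 = fields[:3]
    # The second column is cast to int, mirroring the (int) cast in the script.
    col2_int = int(col2) if col2 is not None else None
    return (col1, col2_int, col3)
```

With this sketch, clean_record("|42|abc") keeps the line and returns a record whose first column is None, which is the behaviour expected from the Pig job as well.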

> Lines dropped in delimited text when they begin with null/no-data
> -----------------------------------------------------------------
>
>                 Key: PIG-4513
>                 URL: https://issues.apache.org/jira/browse/PIG-4513
>             Project: Pig
>          Issue Type: Bug
>          Components: parser, piggybank
>    Affects Versions: 0.12.0
>         Environment: CDH5.2.x, CDH5.3.x
>            Reporter: Madhan Sundararajan Devaki
>            Priority: Blocker
>             Fix For: 0.15.0
>
>
> When Pig (0.12) is used to process delimited text files (| delimited), lines 
> that do not contain data in the first column are dropped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
