[ 
https://issues.apache.org/jira/browse/PIG-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659206#comment-14659206
 ] 

Rohini Palaniswamy commented on PIG-4623:
-----------------------------------------

Actually, my 3rd comment was wrong. That case cannot happen, because 
TextInputFormat will only read up to {{FieldA1,"fieldB1 apples}}. After that, 
in.nextKeyValue() returns false, so {{oranges",fieldC1,fieldD1}} cannot be 
read, and it will hit the 1st case, where the whole line is missed. The 
previous behavior was no better; the only difference is that it would output 
two wrong records. The first record would have partial columns and nulls for 
the rest. The second record would have wrong data in each of its columns.

Unless a custom input format is used that handles quotes along with field and 
line delimiters while parsing records, I don't think this problem has a clean 
solution that handles all cases.
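
To illustrate what quote-aware record parsing would involve, here is a minimal 
Java sketch. It is not the attached patch and not a Hadoop InputFormat: the 
class QuoteAwareRecords and its method names are hypothetical, it reads from a 
plain BufferedReader, and it ignores input split boundaries entirely. It also 
assumes that a quote inside a quoted field is escaped by doubling it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pig or piggybank.
class QuoteAwareRecords {

    // Reads logical CSV records, joining physical lines while a
    // double-quoted field is still open.
    static List<String> readRecords(BufferedReader in) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (inQuotes) {
                // The previous physical line ended inside a quoted field,
                // so the line break belongs to the field. readLine() strips
                // the terminator; this sketch restores it as '\n'.
                current.append('\n');
            }
            current.append(line);
            // Every quote toggles the state. An escaped quote ("") toggles
            // twice, which leaves the state unchanged.
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '"') {
                    inQuotes = !inQuotes;
                }
            }
            if (!inQuotes) {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        if (inQuotes) {
            // Input ended with an unterminated quote; emit what was read.
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String csv = "Iphone,\"{ ItemName : Cheez-It\n21 Ounce}\",\nA,B,C\n";
        List<String> records = readRecords(new BufferedReader(new StringReader(csv)));
        for (String record : records) {
            System.out.println("[" + record + "]");
        }
    }
}
```

The sketch only works because it always starts reading at the beginning of the 
input, where the quote state is known. It does not address a reader that 
starts in the middle of a file, which is the case discussed above.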



> Fixed the 'new line' character inside double-quote causing the csv parsing 
> failure
> ----------------------------------------------------------------------------------
>
>                 Key: PIG-4623
>                 URL: https://issues.apache.org/jira/browse/PIG-4623
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.15.0
>            Reporter: Ken Wu
>            Assignee: Ken Wu
>             Fix For: 0.16.0
>
>         Attachments: CSVLoader.java, PIG-4623-1.patch, TestCSVStorage.java
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A newline character should be allowed inside a double-quoted field in a 
> valid CSV document. For example, the following CSV data should be treated 
> as a SINGLE valid record:
> Iphone,"{ ItemName : Cheez-It
> 21 Ounce}",
> However, the current implementation of getNext() in the 
> org.apache.pig.piggybank.storage.CSVLoader class does not handle this 
> case: it sees two lines of data when the input should in fact be treated 
> as a single record.
> This pull request fixes the above issue.
> (Note: here is a link for validating a CSV document: http://csvlint.io/)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
