[ 
https://issues.apache.org/jira/browse/PIG-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659338#comment-14659338
 ] 

Ken Wu commented on PIG-4623:
-----------------------------

Hi Rohini Palaniswamy,

Thanks for your comments.  Let me respond to your first two items (since you 
have already said that the third item is invalid).

- On the second item, the input you gave, 
fieldD1apples"oranges\nFieldA2,fieldB2 (shortened here for the sake of 
conciseness), is not valid CSV.  Please check it here: 
http://csvlint.io/  .  Anything can go wrong when the input is not valid 
CSV. 

- On the first item, I preserved the semantics of the original code: if there 
is nothing to read, it returns null.  At any time, one of two cases applies:
  a) Inside a double quote: if we encounter EOF, then by my point above the 
file is not valid CSV.  Therefore, we do not need to consider this case.
  b) Outside a double quote (or no double quote is present at all): the 
semantics are exactly the same as in the original code. 

Therefore, I still believe my patch is correct.  Please let me know if anyone 
still sees an error.
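
To make the two cases above concrete, here is a minimal sketch of the idea. It 
is not the code in PIG-4623-1.patch; the class QuotedRecordReader and the 
method readRecord are hypothetical names used only for illustration, and the 
sketch assumes '\n' line endings and ignores '\r'.  A newline ends a record 
only outside double quotes, and null is returned only when nothing at all 
could be read.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class QuotedRecordReader {

    /**
     * Reads one logical CSV record from the reader.
     * Returns null only when there was nothing left to read.
     */
    public static String readRecord(Reader in) throws IOException {
        StringBuilder record = new StringBuilder();
        boolean inQuotes = false;     // true while between an opening and closing quote
        boolean readAnything = false; // distinguishes an empty line from end of input
        int c;
        while ((c = in.read()) != -1) {
            readAnything = true;
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            }
            if (c == '\n' && !inQuotes) {
                break; // case b): newline outside quotes ends the record
            }
            record.append((char) c); // a newline inside quotes stays in the record
        }
        // Case a): EOF inside quotes means invalid CSV; this sketch simply
        // returns whatever was read so far.
        return readAnything ? record.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader(
            "Iphone,\"{ ItemName : Cheez-It\n21 Ounce}\",\nnext,row\n");
        String record;
        while ((record = readRecord(in)) != null) {
            System.out.println("[" + record + "]");
        }
    }
}
```

With the example from the issue description as input, the quoted field that 
spans two physical lines comes back as part of a single record, and the call 
after the last record returns null.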

> Fixed the 'new line' character inside double-quote causing the csv parsing 
> failure
> ----------------------------------------------------------------------------------
>
>                 Key: PIG-4623
>                 URL: https://issues.apache.org/jira/browse/PIG-4623
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.15.0
>            Reporter: Ken Wu
>            Assignee: Ken Wu
>             Fix For: 0.16.0
>
>         Attachments: CSVLoader.java, PIG-4623-1.patch, TestCSVStorage.java
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A newline character inside a double-quoted field is allowed in a valid CSV 
> document. For example, the following CSV document should be treated as a 
> SINGLE valid CSV record:
> Iphone,"{ ItemName : Cheez-It
> 21 Ounce}",
> However, the current implementation of getNext() in the 
> org.apache.pig.piggybank.storage.CSVLoader class does not handle this case: 
> it sees two lines of data when it should treat them as a single line of 
> data.
> This pull request fixes the above issue.
> (Note: here is a link to validate a CSV document: http://csvlint.io/)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
