[ 
https://issues.apache.org/jira/browse/PIG-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155384#comment-14155384
 ] 

Alfonso Nishikawa commented on PIG-4213:
----------------------------------------

After some research, here are a few conclusions before I finish the patch:

* {{CSVExcelStorage}} uses {{PigTextInputFormat}}, which extends 
{{TextInputFormat}}, which instantiates {{LineRecordReader}}.
* {{LineRecordReader}} also splits the input on {{\r}}, treating CR as a line 
terminator.
* As a result, reading data with {{CSVExcelStorage}} treats {{\r}} the same as 
{{\n}} and {{\r\n}}.
* Reading data with {{CSVExcelStorage}} will replace any lone {{\r}} in the 
input with a {{\n}}. This reading behavior is the same as it was before, and it 
can't be fixed here because it belongs to {{LineRecordReader}}.
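
The reading behavior described above can be sketched without Hadoop. The 
following is a simplified stand-in, not the actual {{LineRecordReader}} code; 
the class and method names are hypothetical. It only illustrates that {{\n}}, 
{{\r}} and {{\r\n}} each end a record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Hadoop's LineRecordReader.
public class LineSplitSketch {

    // Splits the input so that \n, \r and \r\n each terminate a record.
    public static List<String> splitLines(String input) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r' || c == '\n') {
                // Consume the \n of a \r\n pair so it counts as one terminator.
                if (c == '\r' && i + 1 < input.length()
                        && input.charAt(i + 1) == '\n') {
                    i++;
                }
                lines.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Emit a trailing record that has no terminator.
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // A lone \r splits the text exactly like \n and \r\n do.
        System.out.println(splitLines("a\rb\nc\r\nd")); // prints [a, b, c, d]
    }
}
```

With this rule a field such as {{"line1\rline2"}} is seen as two records on 
read, whether or not it was quoted when stored.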

This implies:

* What can be fixed is the output side, which is what this ticket is about: 
quoting a field when it contains a {{\r}}.
* What cannot be fixed is that {{load(store(x))}} will not return the original 
data (the round trip is not lossless) if any CR is present. But this behavior, 
again, is the same as it was before this patch.
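
The output-side fix can be sketched as follows. This is a minimal sketch 
assuming conventional CSV quoting rules (quote a field that contains the 
delimiter, a double quote, or a line break, and double any embedded quotes). It 
is not the {{CSVExcelStorage}} source, and the class and method names are 
hypothetical.

```java
// Hypothetical illustration only; this is not the CSVExcelStorage source.
public class CsvQuoteSketch {

    // Wraps the field in double quotes when it contains a character that
    // would otherwise break the CSV cell, doubling any embedded quotes.
    public static String quoteIfNeeded(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0; // the CR case this ticket is about
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // A field with a lone CR is now quoted on output.
        String stored = quoteIfNeeded("line1\rline2", ',');
        System.out.println(stored.startsWith("\"") && stored.endsWith("\"")); // prints true
        // A plain field is left untouched.
        System.out.println(quoteIfNeeded("plain", ',')); // prints plain
    }
}
```

Quoting on output lets a spreadsheet import the cell, but it does not change 
how the stored file is split into records when it is loaded again.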

I will check this in a while :)

> CSVExcelStorage not quoting texts containing \r (CR) when storing
> -----------------------------------------------------------------
>
>                 Key: PIG-4213
>                 URL: https://issues.apache.org/jira/browse/PIG-4213
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.12.0
>            Reporter: Alfonso Nishikawa
>            Assignee: Alfonso Nishikawa
>            Priority: Trivial
>         Attachments: PIG-4213v1.patch
>
>
> While processing tweet data, I found that someone wrote a multiline tweet in 
> Mac OS 9 (or below). When exporting the text, the field is not quoted, so 
> LibreOffice can't import the cell properly (don't try Excel 2007 because it's 
> buggy).
> I suggest including the CR case in the same way as commented in 
> http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
