[
https://issues.apache.org/jira/browse/PIG-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155384#comment-14155384
]
Alfonso Nishikawa commented on PIG-4213:
----------------------------------------
After some research, here are my conclusions before I finish the patch:
* {{CSVExcelStorage}} uses {{PigTextInputFormat}}, which extends
{{TextInputFormat}}, which instantiates {{LineRecordReader}}.
* {{LineRecordReader}} also splits the input at {{\r}}, treating CR as a line terminator.
* Reading data with {{CSVExcelStorage}} will treat {{\r}} the same as {{\n}}
and {{\r\n}}.
* Reading data with {{CSVExcelStorage}} will replace any lone {{\r}} in the
input with a {{\n}}. This reading behavior is the same as it used to be
and can't be fixed here, since it belongs to {{LineRecordReader}}.
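To illustrate the reading behavior listed above, here is a minimal self-contained sketch. It does not call Hadoop; {{LineSplitSketch}} and {{splitRecords}} are hypothetical names that only mimic a line reader that ends a record at CR, LF, or CRLF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Hadoop code, just the same terminator rules.
class LineSplitSketch {

    // Splits the input into records, ending a record at CR, LF, or CRLF.
    static List<String> splitRecords(String input) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\r' || c == '\n') {
                records.add(current.toString());
                current.setLength(0);
                // CRLF counts as a single terminator, so skip the LF.
                if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
            i++;
        }
        // Keep a final record that has no trailing terminator.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // A lone CR splits the text exactly like LF or CRLF does.
        System.out.println(splitRecords("a\rb\nc\r\nd")); // prints [a, b, c, d]
    }
}
```

Once the input has been split this way, the reader can no longer tell which terminator was in the original data.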
This implies:
* What can be fixed is the output side covered by this ticket: quoting a field
when it contains a {{\r}}.
* What cannot be fixed is that {{load(store(x))}} will not reproduce {{x}}
(the round trip is not idempotent) if any CR is present. Again, this behavior
is the same as it was before this patch.
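As a rough sketch of the output-side fix described above (this is not the attached patch, and {{CsvQuoteSketch}} and {{quoteIfNeeded}} are hypothetical names), the quoting check could treat CR like the other special characters. Doubling embedded quotes is the usual Excel-style CSV convention and is assumed here.

```java
// Hypothetical illustration only: not the CSVExcelStorage implementation.
class CsvQuoteSketch {

    // Wraps the field in double quotes when it contains the delimiter,
    // a double quote, LF, or CR. Embedded double quotes are doubled.
    static String quoteIfNeeded(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0; // the CR case this ticket asks for
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String stored = quoteIfNeeded("line1\rline2", ',');
        // Show the CR as a visible escape so the console output stays readable.
        System.out.println(stored.replace("\r", "\\r")); // prints "line1\rline2"
        System.out.println(quoteIfNeeded("plain", ','));  // prints plain
    }
}
```

This only affects what is written. When the file is loaded again, the reader still splits at CR, so the stored value is not guaranteed to come back unchanged.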
I will check this in a while :)
> CSVExcelStorage not quoting texts containing \r (CR) when storing
> -----------------------------------------------------------------
>
> Key: PIG-4213
> URL: https://issues.apache.org/jira/browse/PIG-4213
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Affects Versions: 0.12.0
> Reporter: Alfonso Nishikawa
> Assignee: Alfonso Nishikawa
> Priority: Trivial
> Attachments: PIG-4213v1.patch
>
>
> While processing tweet data, I found that someone wrote a multiline tweet in
> Mac OS 9 (or below). When the text is exported, it is not quoted, so
> LibreOffice can't import the cell properly (don't try Excel 2007, because it is
> buggy).
> I suggest including the CR case in the same way as commented in
> http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315