[
https://issues.apache.org/jira/browse/FLINK-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201976#comment-14201976
]
ASF GitHub Bot commented on FLINK-1223:
---------------------------------------
Github user jkirsch commented on the pull request:
https://github.com/apache/incubator-flink/pull/187#issuecomment-62133790
Thanks for having a look at the code.
Indeed, the Set approach is nicer to read, but it incurs a performance
penalty.
I just ran a simple Caliper benchmark, and on my machine the Set approach is
about 10x slower, so I fixed that.
https://microbenchmarks.appspot.com/runs/b3ab8918-7226-4527-b019-c622a728d144
> Allow value escaping in CSV files
> ---------------------------------
>
> Key: FLINK-1223
> URL: https://issues.apache.org/jira/browse/FLINK-1223
> Project: Flink
> Issue Type: Bug
> Components: Java API, Scala API
> Affects Versions: 0.8-incubating
> Reporter: Johannes
> Priority: Minor
>
> The CSV parser currently does not interpret escaped values.
> The example from here
> http://en.wikipedia.org/wiki/Comma-separated_values#Example
> {code}
> Year,Make,Model,Description,Price
> 1997,Ford,E350,"ac, abs, moon",3000.00
> 1999,Chevy,"Venture ""Extended Edition""","",4900.00
> {code}
> does not currently work.
> Here, escaping inside the string field generates an error.
> For reference, an interesting post about the pitfalls that can be
> encountered when parsing CSV files:
> [http://tburette.github.io/blog/2014/05/25/so-you-want-to-write-your-own-CSV-code/]
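The quoting rules that the example in the issue relies on can be sketched as follows. This is only an illustration, not the code from the pull request and not Flink's actual parser. The class name `CsvLineSketch` and the method `parseLine` are hypothetical. The sketch assumes a comma as the field delimiter, a double quote as the quoting character, and a doubled quote (`""`) inside a quoted field as the escape for a literal quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Flink's CSV parser.
public class CsvLineSketch {

    // Splits a single CSV line into fields, honoring quoted fields.
    // Assumptions: ',' is the delimiter, '"' is the quote character,
    // and "" inside a quoted field stands for one literal quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: literal quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // delimiters are plain text here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // end of field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        // The two data rows from the example in the issue description.
        System.out.println(parseLine("1997,Ford,E350,\"ac, abs, moon\",3000.00"));
        System.out.println(parseLine(
            "1999,Chevy,\"Venture \"\"Extended Edition\"\"\",\"\",4900.00"));
    }
}
```

The sketch works on one line at a time, so it does not cover quoted fields that span line breaks, and it silently accepts an unterminated quote. It also goes character by character rather than through a Set lookup, in line with the benchmark remark above, although the actual check in the pull request may differ.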
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)