[
https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435539#comment-15435539
]
Andrew Ash commented on SPARK-17227:
------------------------------------
Rob and I work together, and we've seen datasets in mostly-CSV format that have
non-standard record delimiters (the '\0' character, for instance).
For some broader context: we've written our own CSV text parser and use it in
all of our internal products that use Spark. We would like to contribute this
additional flexibility back to the Spark community at large and, in the
process, eliminate the need for our internal CSV datasource.
Here are the tickets Rob just opened that we would require to eliminate our
internal CSV datasource:
SPARK-17222
SPARK-17224
SPARK-17225
SPARK-17226
SPARK-17227
The basic question, then, is: would the Spark community accept patches that
extend Spark's CSV parser to cover these features? We're willing to write the
code and get the patches through code review, but we'd rather know up front if
these changes would never be accepted into mainline Spark due to philosophical
disagreements about what Spark's CSV datasource should be.
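For illustration, here is a minimal sketch of what splitting records on a configurable delimiter (such as '\0') looks like, instead of always splitting on a hard-coded "\n". This is plain Python for clarity, not Spark's actual CSV datasource API; the function name and behavior here are assumptions for the sketch only.

```python
def split_records(data: str, record_delimiter: str = "\n") -> list[str]:
    """Split raw CSV text into records on an arbitrary delimiter.

    Mirrors the requested behavior: the delimiter defaults to "\n" but
    can be overridden, e.g. with "\0" for null-delimited records.
    """
    records = data.split(record_delimiter)
    # Drop the trailing empty record produced by a terminating delimiter.
    if records and records[-1] == "":
        records.pop()
    return records

# Records separated by '\0' instead of newlines:
raw = "a,b,c\0d,e,f\0g,h,i\0"
print(split_records(raw, "\0"))  # ['a,b,c', 'd,e,f', 'g,h,i']
```

In Spark itself this would presumably surface as a reader option rather than a standalone function, but the semantics would be the same: the record boundary becomes a parameter instead of a constant.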
> Allow configuring record delimiter in csv
> -----------------------------------------
>
> Key: SPARK-17227
> URL: https://issues.apache.org/jira/browse/SPARK-17227
> Project: Spark
> Issue Type: Improvement
> Reporter: Robert Kruszewski
> Priority: Minor
>
> Instead of a hard-coded "\n"
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]