[ https://issues.apache.org/jira/browse/SPARK-14194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-14194.
----------------------------------
Resolution: Duplicate
I proposed solving this via the {{wholeFile}} option, and it appears to have been
merged. I am resolving this as a duplicate of that issue, since that one has a PR.
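As a rough illustration of why parsing the whole file helps, the following stdlib-only Java sketch (not Spark's implementation) compares splitting the input on line breaks first, which is roughly what a line-oriented reader does, with a scanner that ends a record only at a line break outside quotes. It assumes the default CSV path hands the parser one line at a time. The class `LineVsRecordSplit`, the method `splitRecords`, and the sample data are hypothetical. To my knowledge the option shipped as `multiLine` in Spark 2.2.0 rather than `wholeFile`, so check the CSV options for your version.

```java
import java.util.ArrayList;
import java.util.List;

public class LineVsRecordSplit {

    // Hypothetical helper: splits text into records at CR, LF or CRLF, but only
    // when the scanner is outside a quoted field. A doubled quote ("") inside a
    // quoted field toggles the state twice, so it leaves the state unchanged.
    static List<String> splitRecords(String text, char quote) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (!inQuotes && (c == '\r' || c == '\n')) {
                // Treat CRLF as a single line ending.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                records.add(current.toString());
                current.setLength(0);
            } else {
                // Line breaks inside quotes land here and stay in the record.
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical data, simplified from the sample in the report.
        String csv = "Sl.NO,Employee_Name,Company,Address,Country,ZIP_Code\r\n"
                + "\"1\",\"ABCD\",\"XYZ\",\"XZ Street \r\nMunicipality\",\"USA\",\"1234567\"\r\n";

        // Line-first splitting: the CRLF inside the quoted Address value
        // starts a new "row".
        String[] lines = csv.split("\r\n|\r|\n");
        System.out.println("line-based rows (incl. header): " + lines.length);

        // Quote-aware splitting: the embedded CRLF stays inside the record.
        List<String> records = splitRecords(csv, '"');
        System.out.println("quote-aware rows (incl. header): " + records.size());
    }
}
```

Running `main` prints 3 line-based rows but 2 quote-aware rows (the header plus one record), mirroring the extra row described in the report. The sketch does not try to handle a stray quote character inside an unquoted field.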
> spark csv reader not working properly if CSV content contains CRLF character (newline) in the intermediate cell
> ---------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-14194
> URL: https://issues.apache.org/jira/browse/SPARK-14194
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2, 2.1.0
> Reporter: Kumaresh C R
>
> We have CSV content like the following:
> Sl.NO, Employee_Name, Company, Address, Country, ZIP_Code\r\n
> "1", "ABCD", "XYZ", "1234", "XZ Street \r\n(CRLF character),
> Municipality,....","USA", "1234567"
> Since there is a '\r\n' sequence in the middle of the row (specifically, in
> the Address column), running the Spark code below creates a DataFrame with two
> rows (excluding the header row), which is wrong. Since we have specified the
> quote (") character, why does the reader treat the line break inside the
> quoted field as the end of a row? This causes problems when processing the
> resulting DataFrame.
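> // sqlContextManager is an application-specific object (not part of Spark);
> // delim, quote, escape and sourceFile are defined elsewhere in the application.
> DataFrame df =
> sqlContextManager.getSqlContext().read().format("com.databricks.spark.csv")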
> .option("header", "true")
> .option("inferSchema", "true")
> .option("delimiter", delim)
> .option("quote", quote)
> .option("escape", escape)
> .load(sourceFile);
>
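On the reporter's question of why the embedded line break should not end the row: a quote-aware parser keeps reading a field that starts with the quote character until the matching closing quote, so delimiters and line breaks inside it are data. A minimal stdlib-only Java sketch of that rule follows. It assumes RFC 4180 style escaping (a doubled quote inside a quoted field), which is not the same mechanism as Spark's `escape` option. `QuotedFieldParser`, `parseRecord`, and the record are hypothetical; the record is loosely based on the sample, with spaces after commas removed and one value per header column.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedFieldParser {

    // Hypothetical helper: parses one CSV record into fields. A field that
    // starts with the quote character runs until the matching closing quote,
    // so delimiters and line breaks inside it are kept as data. Inside a
    // quoted field, a doubled quote ("") is read as one literal quote.
    static List<String> parseRecord(String record, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < record.length()) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < record.length() && record.charAt(i + 1) == quote) {
                        field.append(quote); // escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // includes embedded \r, \n and delimiters
                }
            } else if (c == quote && field.length() == 0) {
                inQuotes = true; // opening quote at the start of a field
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
            i++;
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        String record = "\"1\",\"ABCD\",\"XYZ\",\"XZ Street \r\n(CRLF character), Municipality\",\"USA\",\"1234567\"";
        List<String> fields = parseRecord(record, ',', '"');
        System.out.println("field count: " + fields.size());
        // Index 3 lines up with the Address column of the header.
        System.out.println("address contains CRLF: " + fields.get(3).contains("\r\n"));
    }
}
```

Running `main` prints `field count: 6` and `address contains CRLF: true`: the comma and the CRLF inside the quoted Address value are kept as data instead of splitting the field or the row.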
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)