Marcin Mejran created SPARK-27873:
-------------------------------------
Summary: CSV reader: adding a corrupt record column causes an error
if enforceSchema=false
Key: SPARK-27873
URL: https://issues.apache.org/jira/browse/SPARK-27873
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.3
Reporter: Marcin Mejran
In the Spark CSV reader, if you're using PERMISSIVE mode with a column for
storing corrupt records, you need to add an extra schema column corresponding
to columnNameOfCorruptRecord.
However, if you have a header row and enforceSchema=false, the schema-vs-header
validation fails because of the extra schema column corresponding to
columnNameOfCorruptRecord.
Since FAILFAST mode doesn't print informative error messages about which rows
failed to parse, there is no way to track down broken rows other than by
setting a corrupt record column.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)