[jira] [Commented] (SPARK-27873) Csv reader, adding a corrupt record column causes error if enforceSchema=false

Liang-Chi Hsieh (JIRA) Thu, 30 May 2019 07:49:31 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-27873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851952#comment-16851952
 ]


Liang-Chi Hsieh commented on SPARK-27873:
-----------------------------------------

I can prepare a PR if neither Marcin nor Hyukjin Kwon plans to.

> Csv reader, adding a corrupt record column causes error if enforceSchema=false
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-27873
>                 URL: https://issues.apache.org/jira/browse/SPARK-27873
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: Marcin Mejran
>            Priority: Major
>
> In the Spark CSV reader, if you're using permissive mode with a column for 
> storing corrupt records, then you need to add a new schema column 
> corresponding to columnNameOfCorruptRecord.
> However, if you have a header row and enforceSchema=false, the schema vs. 
> header validation fails because the schema has an extra column corresponding 
> to columnNameOfCorruptRecord.
> Since FAILFAST mode doesn't print informative error messages about which 
> rows failed to parse, there is no other way to track down broken rows than 
> setting a corrupt record column.
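
The mismatch described above can be illustrated with a small stand-alone
model in plain Python. This is only a sketch and not Spark's actual
validation code: the function check_header and its parameters are
hypothetical, and "_corrupt_record" is used because it is the default value
of columnNameOfCorruptRecord. It shows why a straight comparison of header
and schema fails, and how ignoring the corrupt record column during the
comparison would avoid the failure.

```python
# Illustrative model only: this is NOT Spark's implementation.
# check_header and its parameter names are hypothetical.

def check_header(header, schema_fields, corrupt_column=None):
    """Return a list of mismatch messages between a CSV header and a schema.

    If corrupt_column is given, that field is dropped from the schema before
    comparing, since it never appears in the CSV file itself.
    """
    expected = [f for f in schema_fields if f != corrupt_column]
    if len(header) != len(expected):
        return [
            f"column count mismatch: header has {len(header)}, "
            f"schema has {len(expected)}"
        ]
    return [
        f"column {i}: header '{h}' != schema '{s}'"
        for i, (h, s) in enumerate(zip(header, expected))
        if h != s
    ]


header = ["id", "name"]                      # columns present in the file
schema = ["id", "name", "_corrupt_record"]   # user schema with the extra column

# Naive check: the extra schema column makes validation fail.
naive_errors = check_header(header, schema)

# Check that ignores the corrupt record column: validation passes.
fixed_errors = check_header(header, schema, corrupt_column="_corrupt_record")
```

In this model, naive_errors holds a single column-count message while
fixed_errors is empty. Genuine header typos are still reported by the second
call, so excluding the corrupt record column does not weaken the check.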



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

