[
https://issues.apache.org/jira/browse/SPARK-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-22455:
---------------------------------
Labels: bulk-closed (was: )
> Provide an option to store the exception records/files and reasons in log
> files when reading data from a file-based data source.
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-22455
> URL: https://issues.apache.org/jira/browse/SPARK-22455
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output
> Affects Versions: 2.2.0
> Reporter: Sreenath Chothar
> Priority: Minor
> Labels: bulk-closed
>
> Provide an option to store the exception/bad records and the reasons in log files
> when reading data from a file-based data source into a PySpark dataframe.
> Currently, only the following three options are available:
> 1. PERMISSIVE : sets other fields to null when it meets a corrupted record,
> and puts the malformed string into a field configured by
> columnNameOfCorruptRecord.
> 2. DROPMALFORMED : drops the whole corrupted record.
> 3. FAILFAST : throws an exception when it meets corrupted records.
> We could use the first option to accumulate the corrupted records and output
> them to a log file, but this option cannot be used when the input schema is
> inferred automatically. If the number of columns to read is too large,
> providing the complete schema with an additional column for storing corrupted
> data is difficult. Instead, the "pyspark.sql.DataFrameReader.csv" reader
> functions could provide an option to redirect the bad records, along with the
> exception details, to a configured log file path.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]