[ https://issues.apache.org/jira/browse/SPARK-38639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513318#comment-17513318 ]
Apache Spark commented on SPARK-38639:
--------------------------------------

User 'TonyDoen' has created a pull request for this issue:
https://github.com/apache/spark/pull/35990

> Support ignoreCorruptRecord flag to ensure querying broken sequence file
> table smoothly
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-38639
>                 URL: https://issues.apache.org/jira/browse/SPARK-38639
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.2, 3.2.1
>            Reporter: tonydoen
>            Priority: Minor
>             Fix For: 3.2.1
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> There are existing flags, "spark.sql.files.ignoreCorruptFiles" and
> "spark.sql.files.ignoreMissingFiles", that quietly ignore attempted reads
> from files that have been corrupted or are missing, but a query can still
> fail on sequence files.
>
> Being able to ignore corrupt records is useful when users want to query
> dirty data (mixed schemas in one table) successfully.
>
> We would like to add a "spark.sql.hive.ignoreCorruptRecord" flag to fill out
> the functionality.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
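For readers who want to see how the flags named in the issue would be passed to a job, here is a minimal sketch that renders them as spark-submit style `--conf key=value` arguments. The two `spark.sql.files.*` flags are the existing ones quoted in the issue. `spark.sql.hive.ignoreCorruptRecord` is only the flag proposed in SPARK-38639 and should not be assumed to exist in any released Spark version. The helper names `build_conf` and `to_submit_args` are hypothetical and used purely for illustration.

```python
# Existing Spark SQL flags quoted in the issue description.
EXISTING_FLAGS = {
    "spark.sql.files.ignoreCorruptFiles": "true",
    "spark.sql.files.ignoreMissingFiles": "true",
}

# Flag proposed by SPARK-38639. Assumption: it is only a proposal here and
# may not be recognized by any released Spark version.
PROPOSED_FLAGS = {
    "spark.sql.hive.ignoreCorruptRecord": "true",
}


def build_conf(include_proposed=False):
    """Return the flag mapping, optionally including the proposed flag."""
    conf = dict(EXISTING_FLAGS)
    if include_proposed:
        conf.update(PROPOSED_FLAGS)
    return conf


def to_submit_args(conf):
    """Render a config mapping as a list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the arguments for the existing flags only.
    print(" ".join(to_submit_args(build_conf())))
```

Keeping the proposed flag in a separate mapping makes it explicit which settings depend on the pull request being merged.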