yihua commented on code in PR #10865:
URL: https://github.com/apache/hudi/pull/10865#discussion_r1529188995
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java:
##########
@@ -112,10 +110,15 @@ public S3EventsHoodieIncrSource(
QueryRunner queryRunner,
CloudDataFetcher cloudDataFetcher) {
super(props, sparkContext, sparkSession, schemaProvider);
+
+ if (getBooleanWithAltKeys(props, ENABLE_EXISTS_CHECK)) {
+ sparkSession.conf().set("spark.sql.files.ignoreMissingFiles", "true");
+ sparkSession.conf().set("spark.sql.files.ignoreCorruptFiles", "true");
Review Comment:
See the Spark docs:
https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html#ignore-missing-files
> Spark allows you to use the configuration `spark.sql.files.ignoreMissingFiles` or the data source option `ignoreMissingFiles` to ignore missing files while reading data from files.
You need to set `.option("ignoreMissingFiles", "true")` on the reader to get this behavior.
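A minimal sketch of the per-read approach, assuming the exists-check flag has already been read from `props` as in the diff. It builds the data source options instead of mutating the session-wide conf. `ReaderOptionsSketch` and `missingFileReaderOptions` are hypothetical names, and `fileFormat`/`paths` in the comment are placeholders, not code from this PR:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ReaderOptionsSketch {

  // Hypothetical helper: returns per-read data source options rather than
  // setting session-wide SQL confs such as spark.sql.files.ignoreMissingFiles.
  static Map<String, String> missingFileReaderOptions(boolean existsCheckEnabled) {
    Map<String, String> opts = new HashMap<>();
    if (existsCheckEnabled) {
      // Generic file source option described under "Ignore Missing Files"
      // in the Spark data source docs.
      opts.put("ignoreMissingFiles", "true");
    }
    return Collections.unmodifiableMap(opts);
  }

  public static void main(String[] args) {
    Map<String, String> opts = missingFileReaderOptions(true);
    // Applied to the reader, e.g. (placeholders, not from the PR):
    //   sparkSession.read().options(opts).format(fileFormat).load(paths);
    // The session's SQL conf is left untouched.
    System.out.println(opts);
  }
}
```

Scoping the option to the reader keeps other reads in the same `SparkSession` from silently skipping missing files.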