umehrot2 commented on code in PR #6154:
URL: https://github.com/apache/hudi/pull/6154#discussion_r929195103
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -56,6 +56,9 @@ class DefaultSource extends RelationProvider
// Enable "passPartitionByAsOptions" to support "write.partitionBy(...)"
spark.conf.set("spark.sql.legacy.sources.write.passPartitionByAsOptions",
"true")
}
+ // Revisit EMR Spark and EMRFS incompatibilities, for now disable
+ spark.conf.set("spark.sql.dataPrefetch.enabled", "false")
+
spark.sparkContext.hadoopConfiguration.set("fs.s3.metadata.cache.expiration.seconds",
"0")
Review Comment:
Well, the only reason we did this is to reduce the noise for customers, who
otherwise have to pass additional configurations just to make things work on
EMR. We cannot store these in the EMR Hudi configs because, as of now, the
global Hudi confs we support only work for Hudi-related configurations; we
cannot pass Spark/Hadoop configs through them.
If you have concerns about this, we can revert it and instead document that
customers should explicitly pass these configurations when running the open
source bundle on EMR. It's just not as good an experience.
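For reference, the documented workaround would look roughly like the following sketch, where users pass the two settings at submit time instead of relying on `DefaultSource` to set them (the bundle path and job name below are placeholders, not actual artifacts from this PR):

```shell
# Hypothetical example: explicitly disabling EMRFS data prefetch and S3
# metadata caching when launching a job with the open source Hudi bundle.
# Jar path and application file are placeholders.
spark-submit \
  --jars /path/to/hudi-spark-bundle.jar \
  --conf spark.sql.dataPrefetch.enabled=false \
  --conf spark.hadoop.fs.s3.metadata.cache.expiration.seconds=0 \
  my-hudi-job.jar
```

Asking every EMR customer to remember these two flags is exactly the noise we were trying to avoid by setting them in `DefaultSource`.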
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]