I discovered today that EMR provides its own optimizations for Spark <https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-performance.html>. Some of these optimizations are controlled by configuration settings with names like `spark.sql.dynamicPartitionPruning.enabled` or `spark.sql.optimizer.flattenScalarSubqueriesWithAggregates.enabled`. As far as I can tell <http://spark.apache.org/docs/latest/configuration.html>, these are EMR-specific configurations that do not exist in upstream Apache Spark, even though they live directly under the `spark.sql.*` namespace.
Doesn't this create a potential problem, since future Apache Spark configuration settings may collide with the names EMR has selected? Should we document some sort of third-party configuration namespace pattern and encourage third parties to scope their custom configurations to it? e.g. something like `spark.external.[vendor].[whatever]`.

Nick
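For illustration, here is a minimal sketch of how the proposed scoping rule could be checked. The helper function and the `emr` vendor segment are hypothetical, just to show the shape of the convention:

```python
# Proposed reserved prefix for third-party / vendor configurations.
THIRD_PARTY_PREFIX = "spark.external."

def is_collision_safe(key: str) -> bool:
    """Return True if a vendor config key follows the proposed
    spark.external.[vendor].[whatever] pattern, meaning it cannot
    collide with a future upstream spark.* setting."""
    return key.startswith(THIRD_PARTY_PREFIX)

# EMR's current style lives directly in the upstream namespace,
# so a future Apache Spark release could claim the same name:
print(is_collision_safe("spark.sql.dynamicPartitionPruning.enabled"))  # False

# The vendor-scoped style proposed above (vendor segment hypothetical):
print(is_collision_safe("spark.external.emr.dynamicPartitionPruning.enabled"))  # True
```

Under this convention, upstream would simply promise never to define settings under `spark.external.*`, and vendors would get a guaranteed collision-free area.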