SemyonSinchenko commented on issue #1079: URL: https://github.com/apache/datafusion-comet/issues/1079#issuecomment-2474442442
The problem here is that `spark.plugins` is a static config, so all plugins must be specified before the PySpark SparkSession is created. If users run a Comet job by submitting a Python script with spark-submit, they have to specify the JAR locations and the plugin manually, and the process is then the same as submitting a JVM Spark job. In my experience it is very rare for users to create a SparkSession inside a Python script and run it as a plain Python file rather than via spark-submit.

The same story applies to a Databricks (or Databricks-like) Python notebook: a SparkSession already exists there, and because `spark.plugins` is static it is impossible to add Comet to the already running session. The only way to run Comet on Databricks is via the so-called `init scripts`, which are executed while the cluster is starting up, before Spark itself starts. In that case users must manually add the Comet JAR to the Spark jars folder and set the plugin config.

Another case is PySpark Connect, but there the Comet JARs have to be present on the server side, so a pip-installable package cannot help anyway...
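For illustration, a minimal sketch of the "set everything before the session exists" case in plain PySpark; the JAR path is hypothetical, and the plugin class name `org.apache.spark.CometPlugin` and the `spark.comet.enabled` flag are taken from the Comet documentation:

```python
from pyspark.sql import SparkSession

# Hypothetical local path to the Comet JAR; adjust for your environment.
comet_jar = "/path/to/comet-spark.jar"

spark = (
    SparkSession.builder
    .appName("comet-example")
    # spark.plugins is a static config: it must be set before the
    # session is created and cannot be added to a running session.
    .config("spark.jars", comet_jar)
    .config("spark.plugins", "org.apache.spark.CometPlugin")
    .config("spark.comet.enabled", "true")
    .getOrCreate()
)
```

In a notebook where the session is pre-created (Databricks and similar), the builder above has no effect, which is exactly why the init-script route is the only option there.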
