[
https://issues.apache.org/jira/browse/SPARK-52854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sanford Ryza resolved SPARK-52854.
----------------------------------
Fix Version/s: 4.1.0
Resolution: Fixed
> Prevent setCurrentDatabase and setCurrentCatalog within Pipelines Python
> definition files
> -----------------------------------------------------------------------------------------
>
> Key: SPARK-52854
> URL: https://issues.apache.org/jira/browse/SPARK-52854
> Project: Spark
> Issue Type: Sub-task
> Components: Declarative Pipelines
> Affects Versions: 4.1.0
> Reporter: Sandy Ryza
> Assignee: Jacky Wang
> Priority: Major
> Fix For: 4.1.0
>
>
> Setting the Spark session's default catalog and database is an imperative
> construct that can cause friction and unexpected behavior from within a
> pipeline declaration. E.g. it makes pipeline behavior sensitive to the order
> in which Python files are imported, which can be unpredictable. Mechanisms
> already exist for setting the Spark catalog and database for pipelines:
> * The catalog and database settings in the pipeline spec
> * The name argument on the dataset decorators accepts a fully-qualified name
> Raising an error when someone tries to set a catalog or database in
> this situation would avoid this unpredictable behavior.
>
> The ways to set the catalog and database from Python are:
> * spark.catalog.setCurrentCatalog
> * spark.sql("USE CATALOG")
> * spark.catalog.setCurrentDatabase
> * spark.sql("USE DATABASE")
> The spark.sql usages will be covered by a separate JIRA.
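
For illustration only, here is a minimal sketch of the kind of guard the
issue describes. It is not the implementation merged for this ticket.
FakeCatalog is a stand-in for spark.catalog so the sketch runs without
Spark, and PipelineDefinitionError and block_session_mutation are
hypothetical names invented for this example. Only the two method names,
setCurrentCatalog and setCurrentDatabase, come from the issue.

```python
from contextlib import contextmanager


class PipelineDefinitionError(RuntimeError):
    """Hypothetical error type; the error Spark actually raises may differ."""


class FakeCatalog:
    """Stand-in for spark.catalog so the sketch runs without a Spark session."""

    def __init__(self):
        self.current_catalog = "spark_catalog"
        self.current_database = "default"

    def setCurrentCatalog(self, catalogName):
        self.current_catalog = catalogName

    def setCurrentDatabase(self, dbName):
        self.current_database = dbName


# The two Python entry points named in the issue.
BLOCKED_METHODS = ("setCurrentCatalog", "setCurrentDatabase")


@contextmanager
def block_session_mutation(catalog):
    """Make the blocked setters raise while definition files are evaluated."""

    def make_blocker(name):
        def blocked(*args, **kwargs):
            raise PipelineDefinitionError(
                f"{name} is not allowed in a pipeline definition file; "
                "set the catalog and database in the pipeline spec or use "
                "a fully-qualified dataset name instead."
            )

        return blocked

    # An instance attribute shadows the method defined on the class.
    for name in BLOCKED_METHODS:
        setattr(catalog, name, make_blocker(name))
    try:
        yield catalog
    finally:
        # Deleting the instance attribute exposes the class method again.
        for name in BLOCKED_METHODS:
            delattr(catalog, name)


catalog = FakeCatalog()
blocked_message = None

# Inside the guard, which stands in for evaluating a definition file.
with block_session_mutation(catalog):
    try:
        catalog.setCurrentDatabase("sales")
    except PipelineDefinitionError as err:
        blocked_message = str(err)

# Outside the guard the setter behaves normally again.
catalog.setCurrentDatabase("sales")
```

The guard is scoped with a context manager so that the same session can
still change its catalog or database outside of pipeline definition
files. The error message points at the two alternatives the issue lists.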