[ 
https://issues.apache.org/jira/browse/SPARK-52854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanford Ryza resolved SPARK-52854.
----------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed

> Prevent setCurrentDatabase and setCurrentCatalog within Pipelines Python 
> definition files
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-52854
>                 URL: https://issues.apache.org/jira/browse/SPARK-52854
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Declarative Pipelines
>    Affects Versions: 4.1.0
>            Reporter: Sandy Ryza
>            Assignee: Jacky Wang
>            Priority: Major
>             Fix For: 4.1.0
>
>
> Setting the Spark session's default catalog and database is an imperative 
> construct that can cause friction and unexpected behavior inside a pipeline 
> declaration. For example, it makes pipeline behavior sensitive to the order 
> in which Python files are imported, which can be unpredictable. Mechanisms 
> already exist for setting the Spark catalog and database for pipelines:
>  * The catalog and database settings in the pipeline spec
>  * The name argument on the dataset decorators, which accepts a 
>    fully-qualified name
> Raising an error when someone tries to set a catalog or database in this 
> situation would avoid this unpredictable behavior.
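
The import-order sensitivity described above can be illustrated with a small, self-contained sketch. It does not use PySpark: FakeSession, define_orders, define_customers, and load are hypothetical stand-ins for a session and two pipeline definition files, and the database names are made up for illustration.

```python
# Hypothetical stand-in for a Spark session; this is NOT the PySpark API.
class FakeSession:
    def __init__(self, default_database):
        self.current_database = default_database

    def set_current_database(self, name):
        # Imperative, session-wide state change.
        self.current_database = name

    def qualify(self, table_name):
        # Resolve an unqualified name against the current database.
        return f"{self.current_database}.{table_name}"


def define_orders(session, registered):
    # Stand-in for definition file A: switches the current database.
    session.set_current_database("staging")
    registered.append(session.qualify("orders"))


def define_customers(session, registered):
    # Stand-in for definition file B: relies on whatever is current.
    registered.append(session.qualify("customers"))


def load(definition_files):
    # Each load starts from a fresh session whose default is "prod".
    session = FakeSession("prod")
    registered = []
    for define in definition_files:
        define(session, registered)
    return registered


a_then_b = load([define_orders, define_customers])
b_then_a = load([define_customers, define_orders])
print(a_then_b)
print(b_then_a)
```

In this sketch the customers dataset resolves to a different database depending on whether file A happens to run before file B, which is the kind of surprise the issue aims to rule out.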
>  
> The ways to set the catalog and database from Python are:
>  * spark.catalog.setCurrentCatalog
>  * spark.sql("USE CATALOG")
>  * spark.catalog.setCurrentDatabase
>  * spark.sql("USE DATABASE")
>  The spark.sql usages will be covered by a separate JIRA.
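
One way the proposed behavior could look is sketched below. This is an assumption-laden illustration, not the change that was merged: GuardedCatalog, PipelineDefinitionError, and pipeline_definition are hypothetical names, and only the method names setCurrentCatalog and setCurrentDatabase come from the issue.

```python
from contextlib import contextmanager


class PipelineDefinitionError(RuntimeError):
    """Hypothetical error type for calls disallowed in definition files."""


class GuardedCatalog:
    """Hypothetical stand-in for spark.catalog; not the real PySpark class."""

    def __init__(self):
        self.current_catalog = "spark_catalog"
        self.current_database = "default"
        self._in_pipeline_definition = False

    @contextmanager
    def pipeline_definition(self):
        # Marks the span during which definition files are being evaluated.
        self._in_pipeline_definition = True
        try:
            yield self
        finally:
            self._in_pipeline_definition = False

    def _check(self, method_name):
        if self._in_pipeline_definition:
            raise PipelineDefinitionError(
                f"{method_name} is not allowed in a pipeline definition file; "
                "use the pipeline spec or a fully-qualified dataset name."
            )

    def setCurrentCatalog(self, name):
        self._check("setCurrentCatalog")
        self.current_catalog = name

    def setCurrentDatabase(self, name):
        self._check("setCurrentDatabase")
        self.current_database = name


catalog = GuardedCatalog()
catalog.setCurrentDatabase("analytics")  # allowed outside a definition

blocked = []
with catalog.pipeline_definition():
    for method in (catalog.setCurrentCatalog, catalog.setCurrentDatabase):
        try:
            method("other")
        except PipelineDefinitionError:
            blocked.append(method.__name__)
print(blocked)
```

The guard leaves ordinary sessions untouched and only rejects the two setters while definition files are being evaluated, so the session state stays at whatever was configured beforehand.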



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
