[ https://issues.apache.org/jira/browse/SPARK-55452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Dvorzhak updated SPARK-55452:
----------------------------------
    Description: 
MLWriter.overwrite() call fails for PySpark Connect Sessions:
{code:java}
self = <xgboost.spark.core.SparkXGBModelWriter object at 0x7f81b0eaf620>, path = 'file:/tmp/tmphb6f4hc7'

    def _handleOverwrite(self, path: str) -> None:
        from pyspark.ml.wrapper import JavaWrapper

        _java_obj = JavaWrapper._new_java_obj("org.apache.spark.ml.util.FileSystemOverwrite")
        wrapper = JavaWrapper(_java_obj)
>       wrapper._call_java("handleOverwrite", path, True, self.sparkSession._jsparkSession)
                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.venv/lib/python3.13/site-packages/pyspark/ml/util.py:579:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pyspark.sql.connect.session.SparkSession object at 0x7f81b1163cb0>, name = '_jsparkSession'

    def __getattr__(self, name: str) -> Any:
        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession", "sparkContext", "newSession"]:
>           raise PySparkAttributeError(
                errorClass="JVM_ATTRIBUTE_NOT_SUPPORTED",
                messageParameters={"attr_name": name},
            )
E   pyspark.errors.exceptions.base.PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jsparkSession` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.

.venv/lib/python3.13/site-packages/pyspark/sql/connect/session.py:1000: PySparkAttributeError
{code}
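
To make the failure mechanism concrete, here is a small self-contained sketch (pure Python, no pyspark installation required; `ConnectSessionStub` and `handle_overwrite` are illustrative stand-ins, not real pyspark classes) of how the Connect session's `__getattr__` blocks JVM-backed attributes such as `_jsparkSession`, which is exactly what `MLWriter._handleOverwrite` tries to read:

```python
# Simplified stand-in for pyspark.sql.connect.session.SparkSession:
# it rejects any attribute that would require a local JVM handle.
class ConnectSessionStub:
    _JVM_ONLY = {"_jsc", "_jconf", "_jvm", "_jsparkSession",
                 "sparkContext", "newSession"}

    def __getattr__(self, name):
        if name in self._JVM_ONLY:
            raise AttributeError(
                f"[JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `{name}` is not "
                "supported in Spark Connect as it depends on the JVM."
            )
        raise AttributeError(name)


def handle_overwrite(session, path):
    # Mirrors the shape of MLWriter._handleOverwrite: the classic code path
    # needs the JVM-side session object to call FileSystemOverwrite.
    return session._jsparkSession  # raises for Connect sessions


try:
    handle_overwrite(ConnectSessionStub(), "file:/tmp/model")
except AttributeError as e:
    print(e)
```

Any fix would need the overwrite path to avoid the JVM handle entirely when running against a Connect session (e.g. by deleting the target path through a Connect-compatible mechanism instead of `org.apache.spark.ml.util.FileSystemOverwrite`).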

  was:
`MLWriter.overwrite()` call fails for PySpark Connect Sessions:

 

```
self = <xgboost.spark.core.SparkXGBModelWriter object at 0x7f81b0eaf620>, path = 'file:/tmp/tmphb6f4hc7'

    def _handleOverwrite(self, path: str) -> None:
        from pyspark.ml.wrapper import JavaWrapper

        _java_obj = JavaWrapper._new_java_obj("org.apache.spark.ml.util.FileSystemOverwrite")
        wrapper = JavaWrapper(_java_obj)
>       wrapper._call_java("handleOverwrite", path, True, self.sparkSession._jsparkSession)
                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.venv/lib/python3.13/site-packages/pyspark/ml/util.py:579:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pyspark.sql.connect.session.SparkSession object at 0x7f81b1163cb0>, name = '_jsparkSession'

    def __getattr__(self, name: str) -> Any:
        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession", "sparkContext", "newSession"]:
>           raise PySparkAttributeError(
                errorClass="JVM_ATTRIBUTE_NOT_SUPPORTED",
                messageParameters={"attr_name": name},
            )
E   pyspark.errors.exceptions.base.PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jsparkSession` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit [https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession] for creating regular Spark Session in detail.

.venv/lib/python3.13/site-packages/pyspark/sql/connect/session.py:1000: PySparkAttributeError
```


> PySpark Connect MLWriter does not support overwrite()
> -----------------------------------------------------
>
>                 Key: SPARK-55452
>                 URL: https://issues.apache.org/jira/browse/SPARK-55452
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect, ML, PySpark
>    Affects Versions: 4.0.2, 4.2.0, 4.1.1
>            Reporter: Igor Dvorzhak
>            Priority: Major
>
> MLWriter.overwrite() call fails for PySpark Connect Sessions:
> {code:java}
> self = <xgboost.spark.core.SparkXGBModelWriter object at 0x7f81b0eaf620>, path = 'file:/tmp/tmphb6f4hc7'
>
>     def _handleOverwrite(self, path: str) -> None:
>         from pyspark.ml.wrapper import JavaWrapper
>
>         _java_obj = JavaWrapper._new_java_obj("org.apache.spark.ml.util.FileSystemOverwrite")
>         wrapper = JavaWrapper(_java_obj)
> >       wrapper._call_java("handleOverwrite", path, True, self.sparkSession._jsparkSession)
>                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> .venv/lib/python3.13/site-packages/pyspark/ml/util.py:579:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> self = <pyspark.sql.connect.session.SparkSession object at 0x7f81b1163cb0>, name = '_jsparkSession'
>
>     def __getattr__(self, name: str) -> Any:
>         if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession", "sparkContext", "newSession"]:
> >           raise PySparkAttributeError(
>                 errorClass="JVM_ATTRIBUTE_NOT_SUPPORTED",
>                 messageParameters={"attr_name": name},
>             )
> E   pyspark.errors.exceptions.base.PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jsparkSession` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.
>
> .venv/lib/python3.13/site-packages/pyspark/sql/connect/session.py:1000: PySparkAttributeError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
