[
https://issues.apache.org/jira/browse/SPARK-55452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Dvorzhak updated SPARK-55452:
----------------------------------
Description:
MLWriter.overwrite() call fails for PySpark Connect Sessions:
{code:python}
self = <xgboost.spark.core.SparkXGBModelWriter object at 0x7f81b0eaf620>, path = 'file:/tmp/tmphb6f4hc7'

    def _handleOverwrite(self, path: str) -> None:
        from pyspark.ml.wrapper import JavaWrapper

        _java_obj = JavaWrapper._new_java_obj("org.apache.spark.ml.util.FileSystemOverwrite")
        wrapper = JavaWrapper(_java_obj)
>       wrapper._call_java("handleOverwrite", path, True, self.sparkSession._jsparkSession)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.venv/lib/python3.13/site-packages/pyspark/ml/util.py:579:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pyspark.sql.connect.session.SparkSession object at 0x7f81b1163cb0>, name = '_jsparkSession'

    def __getattr__(self, name: str) -> Any:
        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession", "sparkContext", "newSession"]:
>           raise PySparkAttributeError(
                errorClass="JVM_ATTRIBUTE_NOT_SUPPORTED", messageParameters={"attr_name": name}
            )
E           pyspark.errors.exceptions.base.PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jsparkSession` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.

.venv/lib/python3.13/site-packages/pyspark/sql/connect/session.py:1000: PySparkAttributeError
{code}
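Until this is fixed in PySpark, one possible client-side workaround is to skip {{overwrite()}} entirely and clear the target path from the client before calling {{save()}}. The sketch below is a hypothetical helper (the name {{save_with_manual_overwrite}} is not a PySpark API) and it only handles local/{{file:}} paths, not HDFS or object-store URIs:

```python
# Hedged workaround sketch (not an official PySpark API): under Spark Connect,
# MLWriter.overwrite() needs the JVM-backed session, so instead delete the
# target path client-side before calling save(). Only plain and "file:" paths
# are handled here; HDFS or object-store URIs would need their own client.
import shutil
from urllib.parse import urlparse


def save_with_manual_overwrite(writer, path: str) -> None:
    """Emulate overwrite(): remove any existing local output, then save."""
    parsed = urlparse(path)
    if parsed.scheme in ("", "file"):
        # Best-effort removal of a previous model directory.
        shutil.rmtree(parsed.path, ignore_errors=True)
    else:
        raise NotImplementedError(f"unsupported scheme: {parsed.scheme!r}")
    writer.save(path)
```

With a real model this would be called as {{save_with_manual_overwrite(model.write(), "file:/tmp/model")}} in place of {{model.write().overwrite().save("file:/tmp/model")}}.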
was:
`MLWriter.overwrite()` call fails for PySpark Connect Sessions:
```
self = <xgboost.spark.core.SparkXGBModelWriter object at 0x7f81b0eaf620>, path = 'file:/tmp/tmphb6f4hc7'

    def _handleOverwrite(self, path: str) -> None:
        from pyspark.ml.wrapper import JavaWrapper

        _java_obj = JavaWrapper._new_java_obj("org.apache.spark.ml.util.FileSystemOverwrite")
        wrapper = JavaWrapper(_java_obj)
>       wrapper._call_java("handleOverwrite", path, True, self.sparkSession._jsparkSession)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.venv/lib/python3.13/site-packages/pyspark/ml/util.py:579:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pyspark.sql.connect.session.SparkSession object at 0x7f81b1163cb0>, name = '_jsparkSession'

    def __getattr__(self, name: str) -> Any:
        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession", "sparkContext", "newSession"]:
>           raise PySparkAttributeError(
                errorClass="JVM_ATTRIBUTE_NOT_SUPPORTED", messageParameters={"attr_name": name}
            )
E           pyspark.errors.exceptions.base.PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jsparkSession` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.

.venv/lib/python3.13/site-packages/pyspark/sql/connect/session.py:1000: PySparkAttributeError
```
> PySpark Connect MLWriter does not support overwrite()
> -----------------------------------------------------
>
> Key: SPARK-55452
> URL: https://issues.apache.org/jira/browse/SPARK-55452
> Project: Spark
> Issue Type: Bug
> Components: Connect, ML, PySpark
> Affects Versions: 4.0.2, 4.1.1, 4.2.0
> Reporter: Igor Dvorzhak
> Priority: Major
>
> MLWriter.overwrite() call fails for PySpark Connect Sessions:
> {code:python}
> self = <xgboost.spark.core.SparkXGBModelWriter object at 0x7f81b0eaf620>, path = 'file:/tmp/tmphb6f4hc7'
>
>     def _handleOverwrite(self, path: str) -> None:
>         from pyspark.ml.wrapper import JavaWrapper
>
>         _java_obj = JavaWrapper._new_java_obj("org.apache.spark.ml.util.FileSystemOverwrite")
>         wrapper = JavaWrapper(_java_obj)
> >       wrapper._call_java("handleOverwrite", path, True, self.sparkSession._jsparkSession)
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> .venv/lib/python3.13/site-packages/pyspark/ml/util.py:579:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> self = <pyspark.sql.connect.session.SparkSession object at 0x7f81b1163cb0>, name = '_jsparkSession'
>
>     def __getattr__(self, name: str) -> Any:
>         if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession", "sparkContext", "newSession"]:
> >           raise PySparkAttributeError(
>                 errorClass="JVM_ATTRIBUTE_NOT_SUPPORTED", messageParameters={"attr_name": name}
>             )
> E           pyspark.errors.exceptions.base.PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jsparkSession` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.
>
> .venv/lib/python3.13/site-packages/pyspark/sql/connect/session.py:1000: PySparkAttributeError
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]