This is an automated email from the ASF dual-hosted git repository.
yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 2698d6bf10b [SPARK-40838][INFRA][TESTS] Upgrade infra base image to
focal-20220922 and fix ps.mlflow doctest
2698d6bf10b is described below
commit 2698d6bf10b92e71e8af88fedb4e7c9e0f304416
Author: Yikun Jiang <[email protected]>
AuthorDate: Thu Oct 20 15:54:18 2022 +0800
[SPARK-40838][INFRA][TESTS] Upgrade infra base image to focal-20220922 and
fix ps.mlflow doctest
### What changes were proposed in this pull request?
Upgrade infra base image to focal-20220922 and fix ps.mlflow doctest
### Why are the changes needed?
- Upgrade the infra base image to `focal-20220922` (currently the latest
Ubuntu 20.04 image)
- Python package versions in the infra image are updated:
- numpy 1.23.3 --> 1.23.4
- mlflow 1.28.0 --> 1.29.0
- matplotlib 3.5.3 --> 3.6.1
- pip 22.2.2 --> 22.3
- scipy 1.9.1 --> 1.9.3
Full list: https://www.diffchecker.com/e6eZZaYn
- Fix ps.mlflow doctest (due to mlflow upgrade):
```
**********************************************************************
File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 158, in
pyspark.pandas.mlflow.load_model
Failed example:
with mlflow.start_run():
lr = LinearRegression()
lr.fit(train_x, train_y)
mlflow.sklearn.log_model(lr, "model")
Expected:
LinearRegression(...)
Got:
LinearRegression()
<mlflow.models.model.ModelInfo object at 0x7fef9578deb0>
```
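The mismatch above can be reproduced without Spark or mlflow. The following is a minimal sketch, assuming the doctest is run with `doctest.ELLIPSIS` enabled; it feeds the captured output to the standard library's `doctest.OutputChecker` and compares the old expectation with the one introduced by this commit:

```python
import doctest

# Output captured from the failing run above: the estimator repr is
# followed by a second line for the returned ModelInfo object.
got = (
    "LinearRegression()\n"
    "<mlflow.models.model.ModelInfo object at 0x7fef9578deb0>\n"
)

checker = doctest.OutputChecker()
flags = doctest.ELLIPSIS  # assumption: the doctest runner enables ELLIPSIS

# Old expectation: the text after "..." must be the closing ")" at the very
# end of the output, so the extra ModelInfo line breaks the match.
old_expectation_matches = checker.check_output(
    "LinearRegression(...)\n", got, flags
)

# New expectation: the trailing "..." absorbs everything after the class
# name, including any additional lines.
new_expectation_matches = checker.check_output(
    "LinearRegression...\n", got, flags
)

print(old_expectation_matches, new_expectation_matches)
```

This should print `False True`: a trailing ellipsis tolerates the extra line, while `LinearRegression(...)` requires the output to end with the closing parenthesis.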
### Does this PR introduce _any_ user-facing change?
No, dev only
### How was this patch tested?
All CI passed
Closes #38304 from Yikun/SPARK-40838.
Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Yikun Jiang <[email protected]>
---
dev/infra/Dockerfile | 4 ++--
python/pyspark/pandas/mlflow.py | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index ccf0c932b0e..2a70bd3f98f 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -17,9 +17,9 @@
# Image for building and testing Spark branches. Based on Ubuntu 20.04.
# See also in https://hub.docker.com/_/ubuntu
-FROM ubuntu:focal-20220801
+FROM ubuntu:focal-20220922
-ENV FULL_REFRESH_DATE 20220706
+ENV FULL_REFRESH_DATE 20221019
ENV DEBIAN_FRONTEND noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN true
diff --git a/python/pyspark/pandas/mlflow.py b/python/pyspark/pandas/mlflow.py
index 094215743e2..469349b37ee 100644
--- a/python/pyspark/pandas/mlflow.py
+++ b/python/pyspark/pandas/mlflow.py
@@ -159,7 +159,7 @@ def load_model(
... lr = LinearRegression()
... lr.fit(train_x, train_y)
... mlflow.sklearn.log_model(lr, "model")
- LinearRegression(...)
+ LinearRegression...
Now that our model is logged using MLflow, we load it back and apply it on
a pandas-on-Spark
dataframe: