This is an automated email from the ASF dual-hosted git repository.
hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 888fb67699ca [SPARK-55071][CONNECT][PYTHON] Make spark.addArtifact
work with Windows paths
888fb67699ca is described below
commit 888fb67699ca936ef302b1924e8e6fa63dd68b34
Author: Alex Khakhlyuk <[email protected]>
AuthorDate: Tue Jan 20 18:22:40 2026 +0100
[SPARK-55071][CONNECT][PYTHON] Make spark.addArtifact work with Windows
paths
### What changes were proposed in this pull request?
Currently, `spark.addArtifact` in PySpark Connect does not support absolute
Windows paths.
E.g. this code
```
spark.addArtifact("C:\\path\\to\\file.py", pyfile=True)
```
will result in the following error
```
PySparkRuntimeError: [UNSUPPORTED_OPERATION] c scheme is not supported.
```
This error is caused by the `urlparse` call in
[artifact.py](https://github.com/apache/spark/blob/ac13473fff64919e8e7756e3a42ce3a68627dd73/python/pyspark/sql/connect/client/artifact.py#L188).
`urlparse` interprets a local Windows path, e.g. `C:\\path\\to\\file`, as a
URI with the scheme 'C', and the artifact code then raises an error because
this URI scheme is not known and not supported.
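The misparse can be reproduced on any platform with the standard library alone. The following is a minimal sketch, not code from the patch; the paths are illustrative placeholders.

```python
from urllib.parse import urlparse

# A Windows absolute path: the drive letter before ':' looks like a URI scheme.
parsed = urlparse("C:\\path\\to\\file.py")
scheme = parsed.scheme  # urlparse lowercases the scheme it finds

# A POSIX absolute path has no "<letters>:" prefix, so no scheme is detected.
posix_scheme = urlparse("/path/to/file.py").scheme

print(repr(scheme), repr(posix_scheme))
```

The lowercased drive letter is what shows up in the `c scheme is not supported` message.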
The fix is to detect absolute Windows paths before calling `urlparse`,
convert them to a `file://` URI, for example
`"C:\\Users\\alex.khakhlyuk\\test.py"` ->
`"file:///C:/Users/alex.khakhlyuk/test.py"`, and then parse the URI
with `urlparse` as for regular paths and URIs.
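The detect-then-convert idea can be sketched as below. This is an illustration under assumptions, not the patch itself: the helper name `to_file_uri_if_windows_path` is hypothetical, and it uses `PureWindowsPath.as_uri()` so that it behaves the same on every OS, whereas the actual change calls `Path(path_or_uri).resolve().as_uri()`.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse


def to_file_uri_if_windows_path(path_or_uri: str) -> str:
    """Hypothetical helper: convert absolute Windows paths to file:// URIs.

    Anything else (POSIX paths, existing URIs) is returned unchanged.
    """
    win_path = PureWindowsPath(path_or_uri)
    # Same condition as the patch: absolute and has a drive component.
    if win_path.is_absolute() and win_path.drive:
        # Assumption: a pure (non-resolving) conversion is enough for this demo.
        return win_path.as_uri()
    return path_or_uri


uri = to_file_uri_if_windows_path("C:\\Users\\alex.khakhlyuk\\test.py")
unchanged_posix = to_file_uri_if_windows_path("/tmp/test.py")
unchanged_uri = to_file_uri_if_windows_path("file:///tmp/test.py")
print(uri, urlparse(uri).scheme)
```

After the conversion, `urlparse` sees the `file` scheme rather than the drive letter.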
### Why are the changes needed?
`spark.addArtifact` currently doesn't support absolute Windows paths; this
should be fixed.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
I ran a test locally on a Windows machine.
```
bin/pyspark --remote "local[*]"
spark.addArtifact("C:\\Users\\alex.khakhlyuk\\test.py", pyfile=True)
```
before:
```
PySparkRuntimeError: [UNSUPPORTED_OPERATION] c scheme is not supported.
```
after:
- artifact upload succeeds
It is quite tricky to write a test for Windows path handling that will run
on a Linux CI machine.
You can't create a real absolute Windows path on a Linux machine. If you
run tests with a non-existent absolute Windows path,
`Path(windows_path).resolve().as_uri()` will interpret it as a relative path
and will resolve it to `{cwd}/C:\\...`, which also doesn't exist.
I could write a test with a bunch of mocking, but I doubt it would be very
useful.
Existing tests make sure that we don't break existing behaviour.
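The platform difference described above can be seen with the pure path classes, which apply a fixed set of parsing rules regardless of the host OS. This is a minimal sketch for illustration only; it is not part of the patch or its tests.

```python
from pathlib import PurePosixPath, PureWindowsPath

windows_path = "C:\\Users\\alex.khakhlyuk\\test.py"

# With POSIX semantics (what Path uses on Linux), backslashes and "C:" are
# ordinary filename characters, so the string is one relative path component.
posix_view = PurePosixPath(windows_path)
posix_is_absolute = posix_view.is_absolute()
posix_parts = posix_view.parts

# PureWindowsPath applies Windows parsing rules on every platform.
windows_view = PureWindowsPath(windows_path)
windows_is_absolute = windows_view.is_absolute()
windows_drive = windows_view.drive

print(posix_is_absolute, len(posix_parts), windows_is_absolute, windows_drive)
```

This is why the detection step can be exercised on Linux, while the `resolve()` step cannot produce a real Windows location there.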
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #53834 from khakhlyuk/fix-addartifact-on-windows.
Authored-by: Alex Khakhlyuk <[email protected]>
Signed-off-by: Herman van Hövell <[email protected]>
---
python/pyspark/sql/connect/client/artifact.py | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/python/pyspark/sql/connect/client/artifact.py b/python/pyspark/sql/connect/client/artifact.py
index 72a6ffa8bf68..a37642186fda 100644
--- a/python/pyspark/sql/connect/client/artifact.py
+++ b/python/pyspark/sql/connect/client/artifact.py
@@ -29,7 +29,7 @@ import zlib
from itertools import chain
from typing import List, Iterable, BinaryIO, Iterator, Optional, Tuple
import abc
-from pathlib import Path
+from pathlib import Path, PureWindowsPath
from urllib.parse import urlparse
from urllib.request import url2pathname
from functools import cached_property
@@ -184,7 +184,17 @@ class ArtifactManager:
def _parse_artifacts(
self, path_or_uri: str, pyfile: bool, archive: bool, file: bool
) -> List[Artifact]:
- # Currently only local files with .jar extension is supported.
+ # Handle Windows absolute paths (e.g., C:\path\to\file) which urlparse
+ # incorrectly interprets as having URI scheme 'C' instead of being a local path.
+ # First check if path_or_uri is a Windows path, if so, convert it to file:// URI.
+ try:
+ win_path = PureWindowsPath(path_or_uri)
+ if win_path.is_absolute() and win_path.drive:
+ # Convert Windows path to file:// URI so urlparse handles it correctly
+ path_or_uri = Path(path_or_uri).resolve().as_uri()
+ except Exception:
+ pass
+
parsed = urlparse(path_or_uri)
# Check if it is a file from the scheme
if parsed.scheme == "":