This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 888fb67699ca [SPARK-55071][CONNECT][PYTHON] Make spark.addArtifact work with Windows paths
888fb67699ca is described below

commit 888fb67699ca936ef302b1924e8e6fa63dd68b34
Author: Alex Khakhlyuk <[email protected]>
AuthorDate: Tue Jan 20 18:22:40 2026 +0100

    [SPARK-55071][CONNECT][PYTHON] Make spark.addArtifact work with Windows paths
    
    ### What changes were proposed in this pull request?
    
    Currently, `spark.addArtifact` in PySpark Connect does not support absolute Windows paths.
    For example, this code
    ```
    spark.addArtifact("C:\\path\\to\\file.py", pyfile=True)
    ```
    results in the following error:
    ```
    PySparkRuntimeError: [UNSUPPORTED_OPERATION] c scheme is not supported.
    ```
    This error is caused by the `urlparse` call in [artifact.py](https://github.com/apache/spark/blob/ac13473fff64919e8e7756e3a42ce3a68627dd73/python/pyspark/sql/connect/client/artifact.py#L188). `urlparse` interprets a local Windows path, e.g. `C:\\path\\to\\file`, as a URI with scheme `C`, and `_parse_artifacts` then throws an error because this scheme is unknown and unsupported.
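
The misparse can be reproduced with the standard library alone. This is a minimal sketch rather than code from the patch, and the path is a made-up example:

```python
from urllib.parse import urlparse

# Hypothetical example path; any drive-letter path behaves the same way.
windows_path = "C:\\path\\to\\file.py"

parsed = urlparse(windows_path)

# "C" is a syntactically valid scheme name, so urlparse lowercases it and
# returns it as the scheme; the rest of the string becomes the path.
print(parsed.scheme)
print(parsed.path)
```

The lowercased drive letter is what shows up in the `c scheme is not supported` message.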
    
    The fix is to detect absolute Windows paths before calling `urlparse`, convert them to a `file://` URI (for example, `"C:\\Users\\alex.khakhlyuk\\test.py"` -> `"file:///C:/Users/alex.khakhlyuk/test.py"`), and then parse the URI with `urlparse` as for regular paths and URIs.
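
The detect-then-convert idea can be sketched in a platform-independent way. Note that this is an illustration and not the code in the patch: the helper name `to_file_uri_if_windows_path` is hypothetical, it builds the URI by hand instead of calling `Path(...).resolve().as_uri()`, and it assumes only drive-letter paths (not UNC paths) need converting:

```python
from pathlib import PureWindowsPath
from urllib.parse import quote, urlparse


def to_file_uri_if_windows_path(path_or_uri: str) -> str:
    """Hypothetical helper: turn an absolute drive-letter path into a file:// URI."""
    win_path = PureWindowsPath(path_or_uri)
    # Assumption: only plain drive letters such as "C:" are converted.
    is_drive_letter = len(win_path.drive) == 2 and win_path.drive.endswith(":")
    if win_path.is_absolute() and is_drive_letter:
        # as_posix() swaps backslashes for forward slashes; quote() escapes
        # characters that are not allowed in a URI path.
        return "file:///" + quote(win_path.as_posix(), safe="/:")
    # Anything else (POSIX paths, real URIs) is returned unchanged.
    return path_or_uri


uri = to_file_uri_if_windows_path("C:\\Users\\alex.khakhlyuk\\test.py")
print(uri)
print(urlparse(uri).scheme)
```

Because `PureWindowsPath` never touches the filesystem, this sketch behaves the same on Linux and Windows, unlike `Path(...).resolve()`.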
    
    ### Why are the changes needed?
    
    `spark.addArtifact` currently doesn't support absolute Windows paths. This should be fixed.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    I ran a test locally on a Windows machine.
    ```
    bin/pyspark --remote "local[*]"
    spark.addArtifact("C:\\Users\\alex.khakhlyuk\\test.py", pyfile=True)
    ```
    before:
    ```
    PySparkRuntimeError: [UNSUPPORTED_OPERATION] c scheme is not supported.
    ```
    after:
    - artifact upload succeeds
    
    It is quite tricky to write a test for Windows path handling that will run on a Linux CI machine.
    You can't create a real absolute Windows path on a Linux machine. If you run tests with a non-existent absolute Windows path, `Path(windows_path).resolve().as_uri()` will interpret it as a relative path and will resolve it to `{cwd}/C:\\...`, which also doesn't exist.
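
The flavor difference behind this can be shown with pure paths, which run on any OS. This is a small sketch with a made-up path, not a test from the patch:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical example path.
windows_path = "C:\\Users\\alex.khakhlyuk\\test.py"

posix_view = PurePosixPath(windows_path)
windows_view = PureWindowsPath(windows_path)

# On POSIX, backslashes are ordinary characters and "C:" is not a drive,
# so the whole string is a single relative path component.
print(posix_view.is_absolute(), posix_view.parts)

# With Windows semantics the same string is absolute and has drive "C:".
print(windows_view.is_absolute(), windows_view.drive)
```

On Linux, `Path` uses the POSIX flavor, which is why `resolve()` prepends the current working directory.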
    
    I could write a test with a lot of mocking, but I doubt it would be useful.
    
    Existing tests make sure that we don't break existing behaviour.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #53834 from khakhlyuk/fix-addartifact-on-windows.
    
    Authored-by: Alex Khakhlyuk <[email protected]>
    Signed-off-by: Herman van Hövell <[email protected]>
---
 python/pyspark/sql/connect/client/artifact.py | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/connect/client/artifact.py b/python/pyspark/sql/connect/client/artifact.py
index 72a6ffa8bf68..a37642186fda 100644
--- a/python/pyspark/sql/connect/client/artifact.py
+++ b/python/pyspark/sql/connect/client/artifact.py
@@ -29,7 +29,7 @@ import zlib
 from itertools import chain
 from typing import List, Iterable, BinaryIO, Iterator, Optional, Tuple
 import abc
-from pathlib import Path
+from pathlib import Path, PureWindowsPath
 from urllib.parse import urlparse
 from urllib.request import url2pathname
 from functools import cached_property
@@ -184,7 +184,17 @@ class ArtifactManager:
     def _parse_artifacts(
         self, path_or_uri: str, pyfile: bool, archive: bool, file: bool
     ) -> List[Artifact]:
-        # Currently only local files with .jar extension is supported.
+        # Handle Windows absolute paths (e.g., C:\path\to\file) which urlparse
+        # incorrectly interprets as having URI scheme 'C' instead of being a local path.
+        # First check if path_or_uri is a Windows path, if so, convert it to file:// URI.
+        try:
+            win_path = PureWindowsPath(path_or_uri)
+            if win_path.is_absolute() and win_path.drive:
+                # Convert Windows path to file:// URI so urlparse handles it correctly
+                path_or_uri = Path(path_or_uri).resolve().as_uri()
+        except Exception:
+            pass
+
         parsed = urlparse(path_or_uri)
         # Check if it is a file from the scheme
         if parsed.scheme == "":


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
