This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 498b3ec9dd9 [SPARK-42106][PYTHON] Hide parameters when printing remote URL in REPL
498b3ec9dd9 is described below

commit 498b3ec9dd9f4f640f1d47388547ecb864f2cdfe
Author: Niranjan Jayakar <[email protected]>
AuthorDate: Thu Jan 19 09:37:49 2023 +0900

    [SPARK-42106][PYTHON] Hide parameters when printing remote URL in REPL
    
    ### What changes were proposed in this pull request?
    
    When the `pyspark` REPL is initialized with the `--remote <url>` option,
    the provided URL is printed to stdout as part of the REPL startup.
    However, the URL may carry auth tokens as query parameters.
    
    This change prints the network location, i.e., host and port, instead
    of the full URL. Since Spark Connect servers will always be hosted at
    the root and the URL scheme is always expected to be `sc`, only the
    host and port parts of the URL are relevant.
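    
    As a minimal illustration of the approach (a hypothetical snippet, not
    the patch itself), `urllib.parse.urlparse` exposes the `netloc`
    component, which is just the `host[:port]` portion and excludes the
    path and query string:
    
    ```python
    from urllib.parse import urlparse
    
    # netloc is "host[:port]"; the query string (where auth tokens
    # may live) is parsed into a separate component and dropped here.
    print(urlparse("sc://foo.com/?x=y").netloc)       # foo.com
    print(urlparse("sc://foo.com:8080/?x=y").netloc)  # foo.com:8080
    ```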
    
    ### Why are the changes needed?
    
    Security best practices require that auth tokens not be logged to
    standard output or any other destination.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Previously -
    
    ```sh
    $ ./bin/pyspark --remote "sc://foo.com/?x=y"
    Python 3.10.8 (main, Nov 24 2022, 08:08:27) [Clang 14.0.6 ] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 3.4.0.dev0
          /_/
    
    Using Python version 3.10.8 (main, Nov 24 2022 08:08:27)
    Client connected to the Spark Connect server at sc://foo.com/?x=y
    ```
    
    Now -
    
    ```sh
    $ ./bin/pyspark --remote "sc://foo.com/?x=y"
    Python 3.10.8 (main, Nov 24 2022, 08:08:27) [Clang 14.0.6 ] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 3.4.0.dev0
          /_/
    
    Using Python version 3.10.8 (main, Nov 24 2022 08:08:27)
    Client connected to the Spark Connect server at foo.com
    ```
    
    ```sh
    $ ./bin/pyspark --remote "sc://foo.com:8080/?x=y"
    Python 3.10.8 (main, Nov 24 2022, 08:08:27) [Clang 14.0.6 ] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 3.4.0.dev0
          /_/
    
    Using Python version 3.10.8 (main, Nov 24 2022 08:08:27)
    Client connected to the Spark Connect server at foo.com:8080
    ```
    
    ### How was this patch tested?
    
    Manually tested by running the `pyspark` binary. See above.
    
    Closes #39641 from nija-at/sanitize-url.
    
    Authored-by: Niranjan Jayakar <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/shell.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/shell.py b/python/pyspark/shell.py
index c1c2d4faacd..8613d2d09ea 100644
--- a/python/pyspark/shell.py
+++ b/python/pyspark/shell.py
@@ -31,6 +31,8 @@ from pyspark.context import SparkContext
 from pyspark.sql import SparkSession
 from pyspark.sql.context import SQLContext
 from pyspark.sql.utils import is_remote
+from urllib.parse import urlparse
+
 
 if is_remote():
     try:
@@ -86,7 +88,10 @@ print(
     % (platform.python_version(), platform.python_build()[0], platform.python_build()[1])
 )
 if is_remote():
-    print("Client connected to the Spark Connect server at %s" % (os.environ["SPARK_REMOTE"]))
+    print(
+        "Client connected to the Spark Connect server at %s"
+        % urlparse(os.environ["SPARK_REMOTE"]).netloc
+    )
 else:
     print("Spark context Web UI available at %s" % (sc.uiWebUrl))  # type: ignore[union-attr]
     print(


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
