Asad Shaikh created SPARK-49825:
-----------------------------------

             Summary: default value of `spark.sql.warehouse.dir` is not decoded 
correctly when saving table
                 Key: SPARK-49825
                 URL: https://issues.apache.org/jira/browse/SPARK-49825
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.5.3
         Environment: macOS 15.0
            Reporter: Asad Shaikh


I haven't looked into how _general_ this problem is, but here's a very specific 
scenario which I ran into last night.

 

When the `SparkSession` is created _without_ specifying the config 
`spark.sql.warehouse.dir`, the default value is _cwd/spark-warehouse_, 
and this path appears URL-encoded when printed via 
`spark.conf.get('spark.sql.warehouse.dir')`.

So, for instance, any spaces present in the path are replaced by "%20".

If this is the case, then the path should be decoded wherever it is used, but it 
turns out the encoded path is taken literally, and consequently Spark writes 
tables to a different location than intended.
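
To make the mismatch concrete, here is a small sketch using only Python's standard library (not Spark itself). It takes the value from the reproduction in this report and compares the path obtained by using the string literally with the path obtained by percent-decoding it. The helper names `literal_path` and `decoded_path` are hypothetical, and this only illustrates the symptom; it does not show where inside Spark the decoding is skipped.

```python
from urllib.parse import unquote, urlparse

# Value returned by spark.conf.get('spark.sql.warehouse.dir') in this report.
reported = "file:/Users/user/cwd%20with%20space/spark-warehouse"


def literal_path(uri: str) -> str:
    """Path component taken as-is, with percent-escapes left untouched."""
    return urlparse(uri).path


def decoded_path(uri: str) -> str:
    """Path component with percent-escapes such as %20 decoded."""
    return unquote(urlparse(uri).path)


print(literal_path(reported))
print(decoded_path(reported))
```

The first value is the directory the table ends up in according to this report; the second is the directory that was intended.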

 

Here's a minimal snippet to reproduce:

```py
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.conf.get('spark.sql.warehouse.dir')
# 'file:/Users/user/cwd%20with%20space/spark-warehouse'

df = ...
df.write.saveAsTable('df')
# table will be saved at /Users/user/cwd%20with%20space/spark-warehouse
```

 

Interestingly, this doesn't happen if the path is specified manually when 
creating the session, even if the path is literally the same as what Spark 
would have taken by default.

 

```py
from pyspark.sql import SparkSession

spark = SparkSession.builder.config('spark.sql.warehouse.dir', 'spark-warehouse/').getOrCreate()

spark.conf.get('spark.sql.warehouse.dir')
# 'file:/Users/user/cwd with space/spark-warehouse'
```

 

The above works fine.
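
Given that, one possible workaround is to always set `spark.sql.warehouse.dir` explicitly and then check what the session reports. The following is only a sketch: `looks_percent_encoded` and `build_session` are hypothetical helper names, and the check assumes that a configured value which changes under percent-decoding indicates the problem described here (a path that legitimately contains a sequence like `%20` would be flagged too).

```python
from urllib.parse import unquote


def looks_percent_encoded(value: str) -> bool:
    """True if decoding percent-escapes would change the string."""
    return unquote(value) != value


def build_session(warehouse_dir: str = "spark-warehouse/"):
    """Create a session with the warehouse directory set explicitly."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.sql.warehouse.dir", warehouse_dir)
        .getOrCreate()
    )
    configured = spark.conf.get("spark.sql.warehouse.dir")
    if looks_percent_encoded(configured):
        # Assumption: an encoded value here means tables would be written
        # to the literal "%20" directory, as in the reproduction.
        raise RuntimeError(f"warehouse dir looks URL-encoded: {configured}")
    return spark
```

Calling `build_session()` from a directory whose path contains spaces should, per the behaviour described in this report, return a session whose warehouse path is not encoded.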

 

PS. plz forgive me if this is supposed to happen by design


