Asad Shaikh created SPARK-49825:
-----------------------------------
Summary: default value of `spark.sql.warehouse.dir` is not decoded
correctly when saving table
Key: SPARK-49825
URL: https://issues.apache.org/jira/browse/SPARK-49825
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.5.3
Environment: macOS 15.0
Reporter: Asad Shaikh
I haven't looked into how _general_ this problem is, but here's a very specific
scenario which I ran into last night.
When the `SparkSession` is created _without_ specifying the config
`spark.sql.warehouse.dir`, the default value is _cwd/spark-warehouse_,
and this path appears URL-encoded when printed via
`spark.conf.get('spark.sql.warehouse.dir')`.
So, for instance, if any spaces were present in the path, they will be replaced
by "%20".
If the path is stored encoded, it should be decoded wherever it is used as a
filesystem path, but it turns out the encoded path is taken literally, and
consequently Spark writes tables to a different location than intended.
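To make the difference concrete, here is a small sketch using only the Python standard library (not Spark's own code path). It shows what the path looks like when the percent-escapes are decoded versus taken literally. The helper name `warehouse_uri_to_path` is hypothetical and only for illustration.

```python
from urllib.parse import urlparse, unquote


def warehouse_uri_to_path(uri: str) -> str:
    """Hypothetical helper: turn a file: URI into a local path, decoding %XX escapes."""
    return unquote(urlparse(uri).path)


# Value reported by spark.conf.get('spark.sql.warehouse.dir') in the scenario above
encoded = "file:/Users/user/cwd%20with%20space/spark-warehouse"

# Path component taken literally (what the table location ends up being)
literal_path = urlparse(encoded).path

# Path component after decoding (what I would have expected)
decoded_path = warehouse_uri_to_path(encoded)

print(literal_path)
print(decoded_path)
```

The two results name different directories on disk, which matches the behaviour I'm seeing.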
Here's a minimal snippet to reproduce:
```py
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.get('spark.sql.warehouse.dir')
# 'file:/Users/user/cwd%20with%20space/spark-warehouse'
df = ...
df.write.saveAsTable('df')
# table will be saved at /Users/user/cwd%20with%20space/spark-warehouse
```
Interestingly, this doesn't happen if the path is specified manually when
creating the session, even if it resolves to exactly the same path Spark
would have used by default.
```py
from pyspark.sql import SparkSession

spark = SparkSession.builder.config('spark.sql.warehouse.dir', 'spark-warehouse/').getOrCreate()
spark.conf.get('spark.sql.warehouse.dir')
# 'file:/Users/user/cwd with space/spark-warehouse'
```
The above works fine.
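Based on that observation, a possible workaround (a sketch only, not a fix for the underlying bug) is to always pass the warehouse path explicitly. The helper names `explicit_warehouse_dir` and `build_session` are hypothetical, and I'm assuming an absolute path behaves the same as the relative `'spark-warehouse/'` I tested.

```python
import os


def explicit_warehouse_dir(base_dir=None):
    """Hypothetical helper: plain (non-encoded) warehouse path under base_dir or the cwd."""
    base = os.getcwd() if base_dir is None else base_dir
    return os.path.join(os.path.abspath(base), "spark-warehouse")


def build_session():
    """Create a session with the warehouse dir set explicitly instead of relying on the default."""
    # Imported lazily so the helper above can be used without pyspark installed
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .config("spark.sql.warehouse.dir", explicit_warehouse_dir())
        .getOrCreate()
    )
```

Calling `build_session()` in place of `SparkSession.builder.getOrCreate()` would be the intended usage.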
PS. plz forgive me if this is supposed to happen by design
--
This message was sent by Atlassian Jira
(v8.20.10#820010)