Anurag Kumar Dwivedi created SPARK-57295:
--------------------------------------------
Summary: [SQL] Database location validation is inconsistent for
whitespace-only LOCATION values
Key: SPARK-57295
URL: https://issues.apache.org/jira/browse/SPARK-57295
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0
Reporter: Anurag Kumar Dwivedi
h3. Description
h4. Problem
Spark validates empty database location URIs but does not consistently reject
whitespace-only location values.
For example:
{{CREATE DATABASE db_empty LOCATION '';}}
fails during Spark analysis with:
{{INVALID_EMPTY_LOCATION}}
However, whitespace-only locations such as:
{{CREATE DATABASE db_space LOCATION ' ';}}
{{CREATE DATABASE db_tab LOCATION '\t';}}
{{CREATE DATABASE db_newline LOCATION '\n';}}
are not rejected by Spark validation.
h4. Observed Behavior
h5. Non-HMS Catalog Path
When running without Hive support, Spark rejects only the empty string
({{''}}).
The following statements succeed:
{{CREATE DATABASE db_space LOCATION ' ';}}
{{CREATE DATABASE db_tab LOCATION '\t';}}
{{CREATE DATABASE db_newline LOCATION '\n';}}
and the databases are created successfully.
h5. HMS Catalog Path
When Hive support is enabled and the request reaches Hive Metastore, Spark
still does not reject whitespace-only locations during analysis.
Instead, Hive Metastore eventually detects the invalid location, and the
statement fails with Hive-specific exceptions ({{HiveException}}), depending
on the exact location value and metastore implementation.
As a result:
||Location Value||Non-HMS||HMS||
|{{''}}|Rejected by Spark ({{INVALID_EMPTY_LOCATION}})|Rejected|
|{{' '}}|Database created successfully|Fails in HMS|
|{{'\t'}}|Database created successfully|Fails in HMS|
|{{'\n'}}|Database created successfully|Fails in HMS|
This leads to inconsistent behavior for semantically empty location values.
h4. Root Cause
Current validation checks only for an empty string:
{{location.isEmpty}}
Whitespace-only values are therefore treated as valid locations and bypass
Spark-side validation.
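The gap can be illustrated with a small stand-alone Scala sketch. It is not Spark's actual validation code: the object and method names are hypothetical, and it assumes the SQL literals reach the check as real space, tab, and newline characters. {{isEmptyOnly}} mirrors the quoted check, while {{isBlank}} is one possible stricter predicate.

```scala
// Hypothetical demo object; the names here are illustrative, not Spark APIs.
object LocationCheckDemo {
  // Mirrors the check quoted above: true only for the zero-length string.
  def isEmptyOnly(location: String): Boolean = location.isEmpty

  // One possible stricter check: true when every character is whitespace
  // (vacuously true for the empty string).
  def isBlank(location: String): Boolean = location.forall(_.isWhitespace)

  def main(args: Array[String]): Unit = {
    // The four values from the report, plus one ordinary path for contrast.
    val samples = Seq("", " ", "\t", "\n", "/tmp/db")
    samples.foreach { loc =>
      println(s"length=${loc.length} isEmptyOnly=${isEmptyOnly(loc)} isBlank=${isBlank(loc)}")
    }
  }
}
```

Under these assumptions, only {{''}} satisfies {{isEmptyOnly}}, whereas all four semantically empty values satisfy {{isBlank}}.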
h4. Expected Behavior
Whitespace-only location values should be treated the same as empty strings and
be rejected during Spark analysis before reaching the catalog or metastore
layer.
The following statements should consistently fail:
{{CREATE DATABASE db_space LOCATION ' ';}}
{{CREATE DATABASE db_tab LOCATION '\t';}}
{{CREATE DATABASE db_newline LOCATION '\n';}}
with:
{{INVALID_EMPTY_LOCATION}}
regardless of whether Hive support is enabled.
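A minimal Scala sketch of the expected outcome follows. It is an assumption-laden illustration rather than a proposed patch: {{InvalidLocationException}} and {{validateLocation}} are hypothetical names, and only the {{INVALID_EMPTY_LOCATION}} error class comes from this report. Using {{trim}} is one possible choice; it strips leading and trailing characters up to U+0020, which covers space, tab, and newline.

```scala
// Hypothetical sketch; none of these names are Spark APIs.
object LocationValidationSketch {
  // Stand-in for Spark's real error type. Only the error class string
  // "INVALID_EMPTY_LOCATION" is taken from the report.
  final case class InvalidLocationException(errorClass: String, location: String)
    extends RuntimeException(s"[$errorClass] location='$location'")

  // Rejects empty and whitespace-only values before any catalog or
  // metastore call would happen; returns the location unchanged otherwise.
  def validateLocation(location: String): String = {
    if (location.trim.isEmpty) {
      throw InvalidLocationException("INVALID_EMPTY_LOCATION", location)
    }
    location
  }
}
```

With a check of this shape applied during analysis, the same error would surface for {{''}}, {{' '}}, {{'\t'}}, and {{'\n'}} on both the HMS and non-HMS paths.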
h4. Benefits
* Consistent behavior across HMS and non-HMS deployments.
* Consistent Spark SQL error classes.
* Earlier validation failure before metastore interaction.
* Improved cross-catalog behavior and user experience.