Anurag Kumar Dwivedi created SPARK-57295:
--------------------------------------------

             Summary: [SQL] Database location validation is inconsistent for 
whitespace-only LOCATION values
                 Key: SPARK-57295
                 URL: https://issues.apache.org/jira/browse/SPARK-57295
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Anurag Kumar Dwivedi


h3. Description
h4. Problem

Spark validates empty database location URIs but does not consistently reject 
whitespace-only location values.

For example:
{{CREATE DATABASE db_empty LOCATION '';}}

fails during Spark analysis with:
{{INVALID_EMPTY_LOCATION}}

However, whitespace-only locations such as:
{{CREATE DATABASE db_space LOCATION ' ';}}
{{CREATE DATABASE db_tab LOCATION '\t';}}
{{CREATE DATABASE db_newline LOCATION '\n';}}

are not rejected by Spark validation.
h4. Observed Behavior
h5. Non-HMS Catalog Path

When running without Hive support, Spark rejects only the empty string 
({{''}}).

The following statements succeed:
{{CREATE DATABASE db_space LOCATION ' ';}}
{{CREATE DATABASE db_tab LOCATION '\t';}}
{{CREATE DATABASE db_newline LOCATION '\n';}}

and the database is successfully created.
h5. HMS Catalog Path

When Hive support is enabled and the request reaches Hive Metastore, Spark 
still does not reject whitespace-only locations during analysis.

Instead, the invalid location is eventually detected by Hive Metastore and 
fails with Hive-specific exceptions ({{HiveException}}), depending on the exact 
location value and metastore implementation.

As a result:
||Location Value||Non-HMS||HMS||
|{{''}}|Rejected by Spark ({{INVALID_EMPTY_LOCATION}})|Rejected|
|{{' '}}|Database created successfully|Fails in HMS|
|{{'\t'}}|Database created successfully|Fails in HMS|
|{{'\n'}}|Database created successfully|Fails in HMS|

This leads to inconsistent behavior for semantically empty location values.
h4. Root Cause

Current validation checks only for an empty string:
{{location.isEmpty}}

Whitespace-only values are therefore treated as valid locations and bypass 
Spark-side validation.
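
The gap can be illustrated outside Spark with a small model of the two predicates. This is only a sketch in Python, not Spark's code: {{is_empty}} mirrors a bare emptiness check like {{location.isEmpty}}, {{is_blank}} is a hypothetical stricter check, and it assumes the SQL literals {{'\t'}} and {{'\n'}} reach validation as real tab and newline characters.

```python
# The four location values from the report, as plain strings.
# Assumption: '\t' and '\n' in the SQL literal arrive as real tab/newline.
LOCATIONS = ["", " ", "\t", "\n"]


def is_empty(location: str) -> bool:
    """Models a bare emptiness check such as `location.isEmpty`."""
    return len(location) == 0


def is_blank(location: str) -> bool:
    """Hypothetical stricter check: empty once surrounding whitespace is stripped."""
    return len(location.strip()) == 0


# Values each predicate would flag as invalid.
rejected_by_empty = [loc for loc in LOCATIONS if is_empty(loc)]
rejected_by_blank = [loc for loc in LOCATIONS if is_blank(loc)]

print("flagged by is_empty:", [repr(loc) for loc in rejected_by_empty])
print("flagged by is_blank:", [repr(loc) for loc in rejected_by_blank])
```

In this model only {{''}} is flagged by the emptiness check, while the blank check flags all four values and still accepts an ordinary path.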
h4. Expected Behavior

Whitespace-only location values should be treated the same as empty strings and 
be rejected during Spark analysis before reaching the catalog or metastore 
layer.

The following statements should consistently fail:
{{CREATE DATABASE db_space LOCATION ' ';}}
{{CREATE DATABASE db_tab LOCATION '\t';}}
{{CREATE DATABASE db_newline LOCATION '\n';}}

with:
{{INVALID_EMPTY_LOCATION}}

regardless of whether Hive support is enabled.
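
A minimal sketch of the requested behavior, written in Python as a catalog-independent model rather than as Spark's implementation. The names {{InvalidEmptyLocationError}}, {{validate_location}} and {{analysis_result}} are hypothetical; only the error class name {{INVALID_EMPTY_LOCATION}} comes from this report.

```python
class InvalidEmptyLocationError(ValueError):
    """Hypothetical stand-in for an error carrying INVALID_EMPTY_LOCATION."""

    error_class = "INVALID_EMPTY_LOCATION"


def validate_location(location: str) -> str:
    """Return the location unchanged, or raise if it is empty or whitespace-only."""
    if location.strip() == "":
        raise InvalidEmptyLocationError(
            f"location {location!r} is empty or whitespace-only"
        )
    return location


def analysis_result(location: str) -> str:
    """Report the outcome a caller would see for one LOCATION value."""
    try:
        validate_location(location)
        return "OK"
    except InvalidEmptyLocationError as err:
        return err.error_class


# The same check runs before any catalog is consulted, so the outcome
# does not depend on whether a metastore is involved.
results = {loc: analysis_result(loc) for loc in ["", " ", "\t", "\n", "/tmp/db_ok"]}
for loc, outcome in results.items():
    print(repr(loc), "->", outcome)
```

Because the check happens before any catalog call in this model, every semantically empty value maps to the same error class and a non-blank path passes through untouched.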
h4. Benefits
 * Consistent behavior across HMS and non-HMS deployments.

 * Consistent Spark SQL error classes.

 * Earlier validation failure before metastore interaction.

 * Improved cross-catalog behavior and user experience.


