[
https://issues.apache.org/jira/browse/SPARK-57295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-57295:
-----------------------------------
Labels: database namespace pull-request-available sql validation (was:
database namespace sql validation)
> [SQL] Database location validation is inconsistent for whitespace-only
> LOCATION values
> --------------------------------------------------------------------------------------
>
> Key: SPARK-57295
> URL: https://issues.apache.org/jira/browse/SPARK-57295
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Anurag Kumar Dwivedi
> Priority: Minor
> Labels: database, namespace, pull-request-available, sql,
> validation
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> h3. Description
> h4. Problem
> Spark validates empty database location URIs but does not consistently reject
> whitespace-only location values.
> For example:
> {{CREATE DATABASE db_empty LOCATION '';}}
> fails during Spark analysis with:
> {{INVALID_EMPTY_LOCATION}}
> However, whitespace-only locations such as:
> {{CREATE DATABASE db_space LOCATION ' ';}}
> {{CREATE DATABASE db_tab LOCATION '\t';}}
> {{CREATE DATABASE db_newline LOCATION '\n';}}
> are not rejected by Spark validation.
> h4. Observed Behavior
> h5. Non-HMS Catalog Path
> When running without Hive support, Spark only validates the empty string
> ({{''}}).
> The following statements succeed:
> {{CREATE DATABASE db_space LOCATION ' ';}}
> {{CREATE DATABASE db_tab LOCATION '\t';}}
> {{CREATE DATABASE db_newline LOCATION '\n';}}
> and the database is successfully created.
> h5. HMS Catalog Path
> When Hive support is enabled and the request reaches Hive Metastore, Spark
> still does not reject whitespace-only locations during analysis.
> Instead, the invalid location is eventually detected by Hive Metastore, and
> the statement fails with Hive-specific exceptions ({{HiveException}}) that
> depend on the exact location value and metastore implementation.
> As a result:
> ||Location Value||Non-HMS||HMS||
> |{{''}}|Rejected by Spark ({{INVALID_EMPTY_LOCATION}})|Rejected|
> |{{' '}}|Database created successfully|Fails in HMS|
> |{{'\t'}}|Database created successfully|Fails in HMS|
> |{{'\n'}}|Database created successfully|Fails in HMS|
> This leads to inconsistent behavior for semantically empty location values.
> h4. Root Cause
> Current validation checks only for an empty string:
> {{location.isEmpty}}
> Whitespace-only values are therefore treated as valid locations and bypass
> Spark-side validation.
> h4. Expected Behavior
> Whitespace-only location values should be treated the same as empty strings
> and be rejected during Spark analysis before reaching the catalog or
> metastore layer.
> The following statements should consistently fail:
> {{CREATE DATABASE db_space LOCATION ' ';}}
> {{CREATE DATABASE db_tab LOCATION '\t';}}
> {{CREATE DATABASE db_newline LOCATION '\n';}}
> with:
> {{INVALID_EMPTY_LOCATION}}
> regardless of whether Hive support is enabled.
> h4. Benefits
> * Consistent behavior across HMS and non-HMS deployments.
> * Consistent Spark SQL error classes.
> * Earlier validation failure before metastore interaction.
> * Improved cross-catalog behavior and user experience.
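
The gap described under "Root Cause" can be illustrated with a minimal Scala sketch. This is not Spark's actual validation code: the object and method names (LocationValidation, isEmptyOnly, isBlank) are hypothetical, and trimming before the emptiness check is only one possible way to treat whitespace-only values like the empty string.

```scala
// Hypothetical helper for illustration only; not taken from the Spark code base.
object LocationValidation {

  // Mirrors the check quoted in the issue: true only for the zero-length string.
  def isEmptyOnly(location: String): Boolean = location.isEmpty

  // Assumed stricter variant: true for the empty string and for values made up
  // solely of characters that String.trim strips (code points <= U+0020, which
  // covers space, tab, and newline).
  def isBlank(location: String): Boolean = location.trim.isEmpty

  def main(args: Array[String]): Unit = {
    // The first four values are the locations from the issue's examples;
    // "/tmp/db" is an arbitrary non-blank value added for contrast.
    val samples = Seq("", " ", "\t", "\n", "/tmp/db")
    samples.foreach { loc =>
      println(s"length=${loc.length} isEmptyOnly=${isEmptyOnly(loc)} isBlank=${isBlank(loc)}")
    }
  }
}
```

With a predicate like isBlank, the space, tab, and newline locations would be classified the same way as the empty string. Note that String.trim does not strip Unicode whitespace above U+0020 (for example a non-breaking space), so whether such values should also count as blank is left open here.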
--
This message was sent by Atlassian Jira
(v8.20.10#820010)