[
https://issues.apache.org/jira/browse/SPARK-21158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223667#comment-16223667
]
Wenchen Fan commented on SPARK-21158:
-------------------------------------
I think this is a reasonable feature request, i.e. making
{{Catalog.listTables}} case-preserving. However, it requires changing how Spark
SQL implements case sensitivity, which is a big change. I'd like to mark
this ticket as "later" because the benefit here is small and we may not have
time to do it soon. Any objections? cc [~smilegator] [~srowen]
> SparkSQL function SparkSession.Catalog.ListTables() does not handle spark
> setting for case-sensitivity
> ------------------------------------------------------------------------------------------------------
>
> Key: SPARK-21158
> URL: https://issues.apache.org/jira/browse/SPARK-21158
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Environment: Windows 10
> IntelliJ
> Scala
> Reporter: Kathryn McClintic
> Priority: Minor
> Labels: easyfix, features, sparksql, windows
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> When working with SQL table names in Spark SQL we have noticed some issues
> with case-sensitivity.
> If you set spark.sql.caseSensitive to true, Spark SQL stores the
> table names as they were provided. This is correct.
> If you set spark.sql.caseSensitive to false, Spark SQL stores the
> table names in lower case.
> Then, we use the function sqlContext.tableNames() to get all the tables in
> our DB. We check whether this list contains(<"string of table name">) to
> determine whether we have already created a table. If case sensitivity is
> turned off (false), this check should match the table name regardless of
> case.
> However, it matches only the lower-case version of the stored table name.
> Therefore, if you pass in a camel-case or upper-case table name, the check
> returns false when in fact the table does exist.
> The root cause of this issue is in the function
> SparkSession.Catalog.ListTables()
> For example:
> In your SQL context you have five tables, and you have chosen to set
> spark.sql.caseSensitive=false, so it stores your tables in lowercase:
> carnames
> carmodels
> carnamesandmodels
> users
> dealerlocations
> When running your pipeline, you want to see if you have already created the
> temp join table of 'carnamesandmodels'. However, you have stored it as a
> constant which reads: CarNamesAndModels for readability.
> So you can use the function
> sqlContext.tableNames().contains("CarNamesAndModels").
> This should return true because we know it's already created, but it
> currently returns false since CarNamesAndModels is not in lowercase.
> The responsibility for lowercasing the name passed into the .contains method
> should not be put on the Spark user. This should be done by Spark
> SQL if case sensitivity is turned off.
> Proposed solutions:
> - Setting case sensitive in the sql context should make the sql context
> be agnostic to case but not change the storage of the table
> - There should be a custom contains method for ListTables() which converts
> the tablename to be lowercase before checking
> - SparkSession.Catalog.ListTables() should return the list of tables in the
> input format instead of in all lowercase.
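
The second proposed solution (a case-aware contains check) can be sketched in plain Scala. This is only an illustration under stated assumptions: `TableLookup` and `containsTable` are hypothetical names, not part of Spark, and the `tables` sequence stands in for whatever sqlContext.tableNames() returns when spark.sql.caseSensitive=false.

```scala
object TableLookup {
  // Hypothetical helper, not a Spark API: compares names exactly when
  // caseSensitive is true, and ignoring case when it is false.
  def containsTable(tables: Seq[String], name: String, caseSensitive: Boolean): Boolean =
    if (caseSensitive) tables.contains(name)
    else tables.exists(_.equalsIgnoreCase(name))

  def main(args: Array[String]): Unit = {
    // Stand-in for the lowercased table list from the example in the report.
    val tables = Seq("carnames", "carmodels", "carnamesandmodels", "users", "dealerlocations")

    // Plain contains, as in the report: the mixed-case name is not found.
    println(tables.contains("CarNamesAndModels")) // prints false

    // Case-insensitive lookup finds the table.
    println(containsTable(tables, "CarNamesAndModels", caseSensitive = false)) // prints true

    // Case-sensitive lookup behaves like plain contains.
    println(containsTable(tables, "CarNamesAndModels", caseSensitive = true)) // prints false
  }
}
```

Until the catalog itself is made case-preserving, a user-side helper along these lines is one possible workaround; it does not change how Spark SQL stores the names.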