[jira] [Commented] (SPARK-21158) SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity

Wenchen Fan (JIRA) Sat, 28 Oct 2017 10:30:57 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-21158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223667#comment-16223667
 ]


Wenchen Fan commented on SPARK-21158:
-------------------------------------

I think this is a reasonable feature request, i.e. making 
{{Catalog.listTables}} case preserving. However, it would require changing how 
Spark SQL implements case sensitivity, which is a big change. I'd like to mark 
this ticket as "later" because the benefit is small and we may not have time 
to do it soon. Any objections? cc [~smilegator] [~srowen]

> SparkSQL function SparkSession.Catalog.ListTables() does not handle spark 
> setting for case-sensitivity
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21158
>                 URL: https://issues.apache.org/jira/browse/SPARK-21158
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Windows 10
> IntelliJ 
> Scala
>            Reporter: Kathryn McClintic
>            Priority: Minor
>              Labels: easyfix, features, sparksql, windows
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When working with SQL table names in Spark SQL, we have noticed some issues 
> with case sensitivity.
> If you set spark.sql.caseSensitive to true, Spark SQL stores the table names 
> exactly as they were provided. This is correct.
> If you set spark.sql.caseSensitive to false, Spark SQL stores the table 
> names in lower case.
> We then use the function sqlContext.tableNames() to get all the tables in 
> our DB, and check whether this list contains(<"string of table name">) to 
> determine whether we have already created a table. If case sensitivity is 
> turned off (false), this check should find the table name in the table list 
> regardless of case.
> However, it only matches the lower-case version of the stored table name. 
> Therefore, if you pass in a camel-case or upper-case table name, the check 
> returns false when in fact the table does exist.
> The root cause of this issue is in the function 
> SparkSession.Catalog.ListTables()
> For example:
> In your SQL context, you have five tables and have set 
> spark.sql.caseSensitive=false, so it stores your tables in lowercase: 
> carnames
> carmodels
> carnamesandmodels
> users
> dealerlocations
> When running your pipeline, you want to see if you have already created the 
> temp join table 'carnamesandmodels'. However, you have stored its name as a 
> constant that reads CarNamesAndModels for readability.
> So you call
> sqlContext.tableNames().contains("CarNamesAndModels").
> This should return true, because we know the table is already created, but it 
> currently returns false since CarNamesAndModels is not in lowercase.
> The responsibility for lowercasing the name passed into the .contains method 
> should not be put on the Spark user. This should be done by Spark SQL when 
> case sensitivity is turned off (false).
> Proposed solutions:
> - Setting case sensitivity in the SQL context should make the SQL context 
> case-agnostic without changing how table names are stored.
> - There should be a custom contains method for ListTables() that converts 
> the table name to lowercase before checking.
> - SparkSession.Catalog.ListTables() should return the list of tables in the 
> input format instead of in all lowercase.
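
For illustration, the second proposal can be sketched as a user-side helper. 
This is only a sketch, not Spark's implementation: TableLookup and 
containsTable are hypothetical names, and the caseSensitive flag is assumed to 
be passed in by the caller rather than read from the Spark configuration. The 
list of names stands in for the result of sqlContext.tableNames().

```scala
// Hypothetical helper, not part of Spark: a membership check on table names
// that honors a case-sensitivity flag supplied by the caller.
object TableLookup {

  /** Returns true if `name` occurs in `tableNames` under the given case rule. */
  def containsTable(tableNames: Seq[String], name: String, caseSensitive: Boolean): Boolean =
    if (caseSensitive) tableNames.contains(name)          // exact match only
    else tableNames.exists(_.equalsIgnoreCase(name))      // ignore case differences

  def main(args: Array[String]): Unit = {
    // Stand-in for sqlContext.tableNames().toSeq with spark.sql.caseSensitive=false.
    val stored = Seq("carnames", "carmodels", "carnamesandmodels", "users", "dealerlocations")

    // Case-insensitive lookup finds the table even though it is stored in lowercase.
    println(containsTable(stored, "CarNamesAndModels", caseSensitive = false))

    // Case-sensitive lookup reproduces the plain .contains behavior from the report.
    println(containsTable(stored, "CarNamesAndModels", caseSensitive = true))
  }
}
```

Comparing with equalsIgnoreCase rather than lowercasing the argument keeps the 
check valid whether the names are stored in lowercase or in their original 
case.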



