Dongjoon Hyun created SPARK-22329:
-------------------------------------
Summary: Use NEVER_INFER for
`spark.sql.hive.caseSensitiveInferenceMode` by default
Key: SPARK-22329
URL: https://issues.apache.org/jira/browse/SPARK-22329
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.0
Reporter: Dongjoon Hyun
Priority: Critical
In Spark 2.2.0, `spark.sql.hive.caseSensitiveInferenceMode` has a critical
issue.
- SPARK-19611 switched the default to `INFER_AND_SAVE` in 2.2.0 because Spark
2.1.0 broke some Hive tables backed by case-sensitive data files.
bq. This situation will occur for any Hive table that wasn't created by Spark
or that was created prior to Spark 2.1.0. If a user attempts to run a query
over such a table containing a case-sensitive field name in the query
projection or in the query filter, the query will return 0 results in every
case.
- However, SPARK-22306 reports that this also corrupts the Hive Metastore
schema by removing bucketing information (BUCKETING_COLS, SORT_COLS) and
changing the table owner.
- Since Spark 2.3.0 supports bucketing, BUCKETING_COLS and SORT_COLS look okay
there at least. However, we still need to investigate the owner-change issue,
and we cannot backport the bucketing patch into `branch-2.2`. We need more
testing before releasing 2.3.0.
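The breakage quoted above can be illustrated with a small, self-contained
sketch (the schemas and helper names below are hypothetical, not Spark code):
Hive lower-cases column names in the metastore, while Parquet keeps the
original case in the file footer, so an exact-match lookup against the on-disk
schema finds nothing, and the query returns 0 rows.

```scala
// Hypothetical sketch of the case-sensitivity mismatch, not Spark internals.
object CaseSensitivitySketch {
  // Hive stores column names lower-cased in the metastore.
  val metastoreSchema = Seq("fieldone", "fieldtwo")
  // The Parquet footer keeps the original, mixed-case names.
  val parquetSchema = Seq("fieldOne", "fieldTwo")

  // Exact (case-sensitive) resolution: misses the mixed-case Parquet column.
  def resolveExact(name: String): Option[String] =
    parquetSchema.find(_ == name)

  // Case-insensitive resolution, as schema inference effectively restores.
  def resolveInferred(name: String): Option[String] =
    parquetSchema.find(_.equalsIgnoreCase(name))
}
```

Resolving the metastore name `fieldone` exactly yields no match, while the
case-insensitive lookup recovers the on-disk column `fieldOne`.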
The Hive Metastore is a shared resource and Spark should not corrupt it by
default. This issue proposes to revert the default to `NEVER_INFER`, as in
Spark 2.1.0. Users who accept the risk can enable `INFER_AND_SAVE` themselves.
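Regardless of the release default, users can pin the mode explicitly. A
minimal `spark-defaults.conf` fragment (a sketch, assuming the standard three
modes of this option):

```
# Pin the inference mode rather than relying on the release default.
# Valid values: INFER_AND_SAVE, INFER_ONLY, NEVER_INFER.
spark.sql.hive.caseSensitiveInferenceMode  NEVER_INFER
```

`INFER_ONLY` infers the case-sensitive schema per query without writing it
back, avoiding metastore modification at the cost of repeated inference.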
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]