spark git commit: [SPARK-22279][SQL] Enable `convertMetastoreOrc` by default

wenchen Wed, 09 May 2018 22:37:11 -0700

Repository: spark
Updated Branches:
  refs/heads/master 62d01391f -> e3d434994



[SPARK-22279][SQL] Enable `convertMetastoreOrc` by default

## What changes were proposed in this pull request?

We previously reverted enabling `spark.sql.hive.convertMetastoreOrc` by default in 
https://github.com/apache/spark/pull/20536 because the conversion ignored the 
table-specific compression conf. That issue is now resolved via 
[SPARK-23355](https://github.com/apache/spark/commit/8aa1d7b0ede5115297541d29eab4ce5f4fe905cb).

## How was this patch tested?

Pass the Jenkins.

Author: Dongjoon Hyun <dongj...@apache.org>

Closes #21186 from dongjoon-hyun/SPARK-24112.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e3d43499
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e3d43499
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e3d43499

Branch: refs/heads/master
Commit: e3d434994733ae16e7e1424fb6de2d22b1a13f99
Parents: 62d0139
Author: Dongjoon Hyun <dongj...@apache.org>
Authored: Thu May 10 13:36:52 2018 +0800
Committer: Wenchen Fan <wenc...@databricks.com>
Committed: Thu May 10 13:36:52 2018 +0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md                                     | 3 ++-
 sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala | 3 +--
 2 files changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/e3d43499/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 3e8946e..3f79ed6 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1017,7 +1017,7 @@ the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also
   <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
   <tr>
     <td><code>spark.sql.orc.impl</code></td>
-    <td><code>hive</code></td>
+    <td><code>native</code></td>
     <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4. `hive` means the ORC library in Hive 1.2.1.</td>
   </tr>
   <tr>
@@ -1813,6 +1813,7 @@ working with timestamps in `pandas_udf`s to get the best 
performance, see
   - Since Spark 2.4, the type coercion rules can automatically promote the 
argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest 
common type, no matter how the input arguments order. In prior Spark versions, 
the promotion could fail in some specific orders (e.g., TimestampType, 
IntegerType and StringType) and throw an exception.
   - In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` 
respect the timezone in the input timestamp string, which breaks the assumption 
that the input timestamp is in a specific timezone. Therefore, these 2 
functions can return unexpected results. In version 2.4 and later, this problem 
has been fixed. `to_utc_timestamp` and `from_utc_timestamp` will return null if 
the input timestamp string contains timezone. As an example, 
`from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return `2000-10-10 
01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 
00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, will return 
`2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. For people who don't care 
about this problem and want to retain the previous behaivor to keep their query 
unchanged, you can set `spark.sql.function.rejectTimezoneInString` to false. 
This option will be removed in Spark 3.0 and should only be used as a temporary 
workaround.
   - In version 2.3 and earlier, Spark converts Parquet Hive tables by default 
but ignores table properties like `TBLPROPERTIES (parquet.compression 'NONE')`. 
This happens for ORC Hive table properties like `TBLPROPERTIES (orc.compress 
'NONE')` in case of `spark.sql.hive.convertMetastoreOrc=true`, too. Since Spark 
2.4, Spark respects Parquet/ORC specific table properties while converting 
Parquet/ORC Hive tables. As an example, `CREATE TABLE t(id int) STORED AS 
PARQUET TBLPROPERTIES (parquet.compression 'NONE')` would generate Snappy 
parquet files during insertion in Spark 2.3, and in Spark 2.4, the result would 
be uncompressed parquet files.
+  - Since Spark 2.0, Spark converts Parquet Hive tables by default for better 
performance. Since Spark 2.4, Spark converts ORC Hive tables by default, too. 
It means Spark uses its own ORC support by default instead of Hive SerDe. As an 
example, `CREATE TABLE t(id int) STORED AS ORC` would be handled with Hive 
SerDe in Spark 2.3, and in Spark 2.4, it would be converted into Spark's ORC 
data source table and ORC vectorization would be applied. To set `false` to 
`spark.sql.hive.convertMetastoreOrc` restores the previous behavior.
 
 ## Upgrading From Spark SQL 2.2 to 2.3
 

http://git-wip-us.apache.org/repos/asf/spark/blob/e3d43499/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
index 10c9603..bb134bb 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
@@ -105,11 +105,10 @@ private[spark] object HiveUtils extends Logging {
       .createWithDefault(false)
 
   val CONVERT_METASTORE_ORC = buildConf("spark.sql.hive.convertMetastoreOrc")
-    .internal()
     .doc("When set to true, the built-in ORC reader and writer are used to process " +
       "ORC tables created by using the HiveQL syntax, instead of Hive serde.")
     .booleanConf
-    .createWithDefault(false)
+    .createWithDefault(true)
 
   val HIVE_METASTORE_SHARED_PREFIXES = buildConf("spark.sql.hive.metastore.sharedPrefixes")
     .doc("A comma separated list of class prefixes that should be loaded using the classloader " +

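
The migration note in the diff above says that setting `spark.sql.hive.convertMetastoreOrc` to `false` restores the pre-2.4 behavior. A minimal sketch of doing that per session follows. It assumes a Spark 2.4 build with the `spark-hive` module on the classpath. The object name, application name, and local master are hypothetical and not part of this commit; the table `t` is the one used as an example in the migration note.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; not part of the commit.
object ConvertMetastoreOrcExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("convert-metastore-orc-example") // hypothetical name
      .master("local[*]")                       // assumption: local run
      .enableHiveSupport()                      // requires spark-hive on the classpath
      .getOrCreate()

    // Opt out of the new default so ORC Hive tables go through Hive SerDe,
    // as they did in Spark 2.3.
    spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")

    // Same table definition as the example in the migration note.
    spark.sql("CREATE TABLE IF NOT EXISTS t(id int) STORED AS ORC")

    // Print the physical plan to see which scan Spark chose for the table.
    spark.sql("SELECT * FROM t").explain()

    spark.stop()
  }
}
```

The same property can also be supplied when the application is launched, for example with `--conf spark.sql.hive.convertMetastoreOrc=false` on `spark-submit`.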
