Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20484#discussion_r166763629

    --- Diff: docs/sql-programming-guide.md ---
    @@ -1776,6 +1776,44 @@ working with timestamps in `pandas_udf`s to get the best performance, see
     ## Upgrading From Spark SQL 2.2 to 2.3
    + - Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files and Hive ORC tables. To do that, the following configurations are newly added or change their default values.
    +
    + - New configurations
    +
    + <table class="table">
    + <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
    + <tr>
    + <td><code>spark.sql.orc.impl</code></td>
    + <td><code>native</code></td>
    + <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4.1. `hive` means the ORC library in Hive 1.2.1 which is used prior to Spark 2.3.</td>
    + </tr>
    + <tr>
    + <td><code>spark.sql.orc.enableVectorizedReader</code></td>
    + <td><code>true</code></td>
    + <td>Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in <code>native</code> implementation. For <code>hive</code> implementation, this is ignored.</td>
    + </tr>
    + </table>
    +
    + - Changed configurations
    +
    + <table class="table">
    + <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
    + <tr>
    + <td><code>spark.sql.orc.filterPushdown</code></td>
    + <td><code>true</code></td>
    + <td>Enables filter pushdown for ORC files. It is <code>false</code> by default prior to Spark 2.3.</td>
    + </tr>
    + <tr>
    + <td><code>spark.sql.hive.convertMetastoreOrc</code></td>
    + <td><code>true</code></td>
    + <td>Enable the Spark's ORC support, which can be configured by <code>spark.sql.orc.impl</code>, instead of Hive SerDe when reading from and writing to Hive ORC tables. It is <code>false</code> by default prior to Spark 2.3.</td>
    + </tr>
    + </table>
    +
    + - Since Apache ORC 1.4.1 is a standalone library providing a subset of Hive ORC related configurations, see <a href="https://orc.apache.org/docs/hive-config.html">Hive Configuration</a> of Apache ORC project for a full list of supported ORC configurations.
    --- End diff --

    For user configurations, both the `spark.hadoop.` prefix and `hive-site.xml` work.
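
    To make the `spark.hadoop.` route concrete, here is a minimal sketch (not part of the PR) that builds `spark-submit --conf` arguments. The helper name `spark_submit_conf_args` is hypothetical, and `orc.stripe.size` is used only as an illustrative ORC key; the Hive Configuration page linked in the diff is the place to check which keys are actually supported. The three `spark.sql.*` values are the pre-2.3 behaviour as described in the tables above.

```python
# Settings from the quoted tables, set back to the values the diff
# describes as the behaviour prior to Spark 2.3.
PRE_2_3_ORC_SETTINGS = {
    "spark.sql.orc.impl": "hive",
    "spark.sql.orc.filterPushdown": "false",
    "spark.sql.hive.convertMetastoreOrc": "false",
}


def spark_submit_conf_args(spark_settings, orc_settings):
    """Build a list of `--conf key=value` arguments for spark-submit.

    Keys in spark_settings are passed through unchanged. Keys in
    orc_settings get the `spark.hadoop.` prefix mentioned in the comment.
    This helper is hypothetical and exists only for illustration.
    """
    merged = dict(spark_settings)
    for key, value in orc_settings.items():
        merged["spark.hadoop." + key] = value

    args = []
    for key in sorted(merged):  # sorted for a stable, readable order
        args += ["--conf", f"{key}={merged[key]}"]
    return args


if __name__ == "__main__":
    # `orc.stripe.size` is an assumed example key, not taken from the PR.
    example = spark_submit_conf_args(
        PRE_2_3_ORC_SETTINGS, {"orc.stripe.size": "67108864"}
    )
    print(" ".join(example))
```

    The same ORC keys could instead go into `hive-site.xml` without the prefix; the sketch only covers the command-line form.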