Repository: spark
Updated Branches:
  refs/heads/branch-1.2 35bc338c0 -> 9c9b4bd1e


[SPARK-4258][SQL][DOC] Documents spark.sql.parquet.filterPushdown

Documents `spark.sql.parquet.filterPushdown`, explaining why it's turned off by
default and when it's safe to turn on.

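For list readers, a minimal sketch of what the documented flag looks like in use against the Spark 1.2 API. The `SparkContext` value `sc` is an assumption (e.g. the Spark shell), not part of this commit; both forms below correspond to the guide's "Configuration" section:

```scala
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext `sc`, e.g. from the Spark shell.
val sqlContext = new SQLContext(sc)

// Enable Parquet filter pushdown (off by default in 1.2 because of PARQUET-136).
// Per the docs in this commit, only safe when your Parquet tables contain no
// nullable string or binary columns.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

// Equivalent `SET key=value` form mentioned in the Configuration section.
sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")
```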

Author: Cheng Lian <[email protected]>

Closes #3440 from liancheng/parquet-filter-pushdown-doc and squashes the 
following commits:

2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown

(cherry picked from commit 5db8dcaf494e0dffed4fc22f19b0334d95ab6bfb)
Signed-off-by: Michael Armbrust <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9c9b4bd1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9c9b4bd1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9c9b4bd1

Branch: refs/heads/branch-1.2
Commit: 9c9b4bd1e4ac40c4abf4b5d1113c3056732e2c25
Parents: 35bc338
Author: Cheng Lian <[email protected]>
Authored: Mon Dec 1 13:09:51 2014 -0800
Committer: Michael Armbrust <[email protected]>
Committed: Mon Dec 1 13:10:20 2014 -0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/9c9b4bd1/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 24a68bb..96a3209 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -146,7 +146,7 @@ describes the various methods for loading data into a SchemaRDD.
 
 Spark SQL supports two different methods for converting existing RDDs into SchemaRDDs.  The first
 method uses reflection to infer the schema of an RDD that contains specific types of objects.  This
-reflection based approach leads to more concise code and works well when you already know the schema 
+reflection based approach leads to more concise code and works well when you already know the schema
 while writing your Spark application.
 
 The second method for creating SchemaRDDs is through a programmatic interface that allows you to
@@ -566,7 +566,7 @@ for teenName in teenNames.collect():
 
 ### Configuration
 
-Configuration of Parquet can be done using the `setConf` method on SQLContext or by running 
+Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
 `SET key=value` commands using SQL.
 
 <table class="table">
@@ -575,8 +575,8 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   <td><code>spark.sql.parquet.binaryAsString</code></td>
   <td>false</td>
   <td>
-    Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do 
-    not differentiate between binary data and strings when writing out the Parquet schema.  This 
+    Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
+    not differentiate between binary data and strings when writing out the Parquet schema.  This
     flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
   </td>
 </tr>
@@ -591,11 +591,21 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   <td><code>spark.sql.parquet.compression.codec</code></td>
   <td>gzip</td>
   <td>
-    Sets the compression codec use when writing Parquet files. Acceptable values include: 
+    Sets the compression codec used when writing Parquet files. Acceptable values include:
     uncompressed, snappy, gzip, lzo.
   </td>
 </tr>
 <tr>
+  <td><code>spark.sql.parquet.filterPushdown</code></td>
+  <td>false</td>
+  <td>
+    Turn on Parquet filter pushdown optimization. This feature is turned off by default because of a known
+    bug in Parquet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>).
+    However, if your table doesn't contain any nullable string or binary columns, it's still safe to turn
+    this feature on.
+  </td>
+</tr>
+<tr>
   <td><code>spark.sql.hive.convertMetastoreParquet</code></td>
   <td>true</td>
   <td>
@@ -945,7 +955,7 @@ options.
 
 ## Migration Guide for Shark User
 
-### Scheduling 
+### Scheduling
 To set a [Fair Scheduler](job-scheduling.html#fair-scheduler-pools) pool for a JDBC client session,
 users can set the `spark.sql.thriftserver.scheduler.pool` variable:
 

