Repository: drill
Updated Branches:
  refs/heads/gh-pages 49e50be68 -> 3272b2e74


doc update for DRILL-3867


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/3272b2e7
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/3272b2e7
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/3272b2e7

Branch: refs/heads/gh-pages
Commit: 3272b2e74abb9e685518925d3f61ef4d06b84172
Parents: 49e50be
Author: Bridget Bevens <bbev...@maprtech.com>
Authored: Thu Aug 10 15:24:35 2017 -0700
Committer: Bridget Bevens <bbev...@maprtech.com>
Committed: Thu Aug 10 15:24:35 2017 -0700

----------------------------------------------------------------------
 .../025-optimizing-parquet-reading.md           | 41 ++++++++++++--------
 1 file changed, 24 insertions(+), 17 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/3272b2e7/_docs/performance-tuning/025-optimizing-parquet-reading.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/025-optimizing-parquet-reading.md b/_docs/performance-tuning/025-optimizing-parquet-reading.md
index f2d5141..65884af 100644
--- a/_docs/performance-tuning/025-optimizing-parquet-reading.md
+++ b/_docs/performance-tuning/025-optimizing-parquet-reading.md
@@ -1,41 +1,48 @@
 ---
 title: "Optimizing Parquet Metadata Reading"
-date: 2016-02-08 21:57:13 UTC
+date: 2017-08-10 22:24:37 UTC
 parent: "Performance Tuning"
 ---
 
-Parquet metadata caching is an optional feature in Drill 1.2 and later. When you use this feature, Drill generates a metadata cache file. Drill stores the metadata cache file in a directory you specify and its subdirectories. When you run a query on this directory or a subdirectory, Drill reads a single metadata cache file instead of retrieving metadata from multiple Parquet files during the query-planning phase.
+Parquet metadata caching is a feature that enables Drill to read a single metadata cache file instead of retrieving metadata from multiple Parquet files during the query-planning phase. 
+Parquet metadata caching is available for Parquet data in Drill 1.2 and later. To enable Parquet metadata caching, issue the REFRESH TABLE METADATA <path to table> command. When you run this command Drill generates a metadata cache file.  
 
-Parquet metadata caching is useful only with Parquet data, and does not benefit queries on Hive tables, HBase tables, or text files. 
+{% include startnote.html %}Parquet metadata caching does not benefit queries on Hive tables, HBase tables, or text files.{% include endnote.html %}  
+
+Drill stores the metadata cache file in the specified directory and subdirectories. When you run a query on this directory or subdirectories, Drill reads the metadata cache file instead of retrieving metadata from multiple Parquet files during the query-planning phase.     
+
+In Drill 1.11 and later, Drill stores the paths to the Parquet files as relative paths instead of absolute paths. You can move partitioned Parquet directories from one location in the distributed files system to another without issuing the REFRESH TABLE METADATA command to rebuild the Parquet metadata files; the metadata remains valid in the new location.   
+
+{% include startnote.html %}Reverting back to a previous version of Drill from 1.11 is not recommended because Drill will incorrectly interpret the Parquet metadata files created by Drill 1.11. Should this occur, remove the Parquet metadata files and run the refresh table metadata command to rebuild the files in the older format.{% include endnote.html %} 
+ 
 
 ## When to Use Parquet Metadata Caching
 
-The scenarios in which metadata caching is useful is when the planning time is a significant percentage of the total elapsed time of the query. If the query execution time is the dominant factor, which is typically observed with a large number of files, then metadata caching will have very little impact. To determine that query execution time is the dominant factor, run an EXPLAIN plan on your query of a large number of files, and compare its time to the total time of query execution. Use the comparison to determine whether metadata caching will be useful.
+Metadata caching is useful when planning time is a significant percentage of the total elapsed time of the query. If the query execution time is the dominant factor, which is typically observed with a large number of files, then metadata caching will have very little impact. To determine that query execution time is the dominant factor, run an EXPLAIN plan on your query of a large number of files, and compare its time to the total time of query execution. Use the comparison to determine whether metadata caching will be useful.
 
 When enabled, Drill always uses the Parquet metadata cache during the query-planning phase. To optimize reading Parquet metadata, make sure the metadata cache is up-to-date after making any changes, such as inserts, to the data in the cluster. The next section describes how to update the metadata cache.
 
 
-## How to Trigger Generation of the Parquet Metadata Cache File
+## Generating the Parquet Metadata Cache File
 
 The following command generates the Parquet metadata cache file in the `<path to table>` and its subdirectories.
 
-`REFRESH TABLE METADATA <path to table>`
+       REFRESH TABLE METADATA <path to table>
 
 You need to run this command on a directory, nested or flat, only once during the session. Only the first query gathers the metadata unless the Parquet data changes, for example, you delete some data. If you did not make changes to the Parquet data, subsequent queries encounter the up-to-date Parquet metadata files. There is no need for Drill to regenerate the metadata. If there are changes, the metadata needs updating, so Drill dynamically regenerates the Parquet metadata when you issue the next query.
 
-The elapsed time of the first query that triggers regeneration of metadata can be greater than that of subsequent queries that use that metadata. If this increase in the time of the first query is unacceptable, make sure the cache is up-to-date by running the REFRESH TABLE METADATA command.
+The elapsed time of the first query that triggers regeneration of metadata can be greater than that of subsequent queries that use that metadata. If this increase in the time of the first query is unacceptable, make sure the cache is up-to-date by running the REFRESH TABLE METADATA command, as shown in the following example:
+
 
-## Example of Generating Parquet Metadata
+       0: jdbc:drill:schema=dfs> REFRESH TABLE METADATA t1;
+       +-------+----------------------------------------------+
+       |  ok   |                   summary                    |
+       +-------+----------------------------------------------+
+       | true  | Successfully updated metadata for table t1.  |
+       +-------+----------------------------------------------+
+       1 row selected (0.445 seconds)  
+  
 
-```
-0: jdbc:drill:schema=dfs> REFRESH TABLE METADATA t1;
-+-------+----------------------------------------------+
-|  ok   |                   summary                    |
-+-------+----------------------------------------------+
-| true  | Successfully updated metadata for table t1.  |
-+-------+----------------------------------------------+
-1 row selected (0.445 seconds)
-```
 
 ## How Drill Generates and Uses Parquet Metadata
 
