[drill-site] branch asf-site updated: edit create schema and refresh table metadata docs

bridgetb Mon, 29 Apr 2019 13:53:34 -0700

This is an automated email from the ASF dual-hosted git repository.

bridgetb pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/drill-site.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new ca00b83  edit create schema and refresh table metadata docs
ca00b83 is described below

commit ca00b83e5b9c8ad246656cee29dafcd2b44bbdad
Author: Bridget Bevens <[email protected]>
AuthorDate: Mon Apr 29 13:53:10 2019 -0700

    edit create schema and refresh table metadata docs
---
 docs/create-or-replace-schema/index.html |   8 +-
 docs/refresh-table-metadata/index.html   | 415 +++++++++++++++----------------
 feed.xml                                 |   4 +-
 3 files changed, 211 insertions(+), 216 deletions(-)

diff --git a/docs/create-or-replace-schema/index.html b/docs/create-or-replace-schema/index.html
index 7ce7f74..b9eddf9 100644
--- a/docs/create-or-replace-schema/index.html
+++ b/docs/create-or-replace-schema/index.html
@@ -1316,13 +1316,13 @@
 
     </div>
 
-     Apr 25, 2019
+     Apr 29, 2019
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
     <div class="int_text" align="left">
       
-        <p>Starting in Drill 1.16, you can define a schema for text files 
using the CREATE OR REPLACE SCHEMA command. Running this command generates a 
hidden .drill.schema file in the table’s root directory. The .drill.schema file 
stores the schema definition in JSON format. Drill uses the schema file at 
runtime if the exec.storage.enable_v3_text_reader and 
store.table.use_schema_file options are enabled. Alternatively, you can create 
the schema file manually. When created manually, the  [...]
+        <p>Starting in Drill 1.16, you can define a schema for text files 
using the CREATE OR REPLACE SCHEMA command. Running this command generates a 
hidden <code>.drill.schema</code> file in the table’s root directory. The 
<code>.drill.schema</code> file stores the schema definition in JSON format. 
Drill uses the schema file at runtime if the 
<code>exec.storage.enable_v3_text_reader</code> and 
<code>store.table.use_schema_file</code> options are enabled. Alternatively, 
you can create t [...]
 
 <h2 id="syntax">Syntax</h2>
 
@@ -1503,7 +1503,7 @@ A property that sets how Drill handles blank column values. Accepts the followin
 
 <p>The schema mode determines the ordering of columns returned for wildcard 
(*) queries. The mode is set through the <code>drill.strict</code> property. 
You can set this property to true (strict) or false (not strict). If you do not 
indicate the mode, the default is false (not strict).  </p>
 
-<p><strong>Not Strict (Default)</strong>
+<p><strong>Not Strict (Default)</strong><br>
 Columns defined in the schema are projected in the defined order. Columns not 
defined in the schema are appended to the defined columns, as shown:  </p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">create or replace schema (id int, start_date date format &#39;yyyy-MM-dd&#39;) for table dfs.tmp.`text_table` properties (&#39;drill.strict&#39; = &#39;false&#39;);
 +------+-----------------------------------------+
@@ -1525,7 +1525,7 @@ select * from dfs.tmp.`text_table`;
 </code></pre></div>
 <p>Note that the “name” column, which was not included in the schema, was appended to the end of the table.</p>
 
-<p><strong>Strict</strong>
+<p><strong>Strict</strong><br>
 Setting the <code>drill.strict</code> property to “true” changes the schema mode to strict, which means that the reader ignores any columns NOT included in the schema. The query only returns the columns defined in the schema, as shown:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">create or replace schema (id int, start_date date format &#39;yyyy-MM-dd&#39;) for table dfs.tmp.`text_table` properties (&#39;drill.strict&#39; = &#39;true&#39;);
 +------+-----------------------------------------+
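
The not-strict and strict column orderings described in the schema-mode paragraphs above can be modeled in a few lines. This is a minimal Python sketch of the documented behavior only, not Drill's text reader. The function name project_wildcard and the list-of-names representation are hypothetical. The sketch also assumes that columns not defined in the schema keep the order in which they appear in the file.

```python
from typing import List


def project_wildcard(schema_cols: List[str], file_cols: List[str], strict: bool) -> List[str]:
    """Hypothetical model of the column order returned for SELECT * queries.

    schema_cols: columns defined by CREATE OR REPLACE SCHEMA, in defined order.
    file_cols:   columns the reader finds in the text file.
    strict:      value of the drill.strict table property.
    """
    # Schema-defined columns always come first, in the defined order.
    projected = list(schema_cols)
    if strict:
        # Strict: the reader ignores any column not defined in the schema.
        return projected
    # Not strict (default): undefined columns are appended after the defined
    # ones; keeping their file order is an assumption of this sketch.
    defined = set(schema_cols)
    projected.extend(c for c in file_cols if c not in defined)
    return projected


if __name__ == "__main__":
    file_cols = ["id", "name", "start_date"]  # hypothetical file layout
    schema = ["id", "start_date"]
    print(project_wildcard(schema, file_cols, strict=False))  # ['id', 'start_date', 'name']
    print(project_wildcard(schema, file_cols, strict=True))   # ['id', 'start_date']
```

With the schema (id, start_date), the not-strict result puts "name" last, which matches the documented example.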
diff --git a/docs/refresh-table-metadata/index.html b/docs/refresh-table-metadata/index.html
index 8b17a86..2a8b71f 100644
--- a/docs/refresh-table-metadata/index.html
+++ b/docs/refresh-table-metadata/index.html
@@ -1316,7 +1316,7 @@
 
     </div>
 
-     Apr 23, 2019
+     Apr 29, 2019
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1354,7 +1354,7 @@ Required. The name of the table or directory for which Drill will refresh metada
 <h3 id="metadata-storage">Metadata Storage</h3>
 
 <ul>
-<li>Drill traverses directories for Parquet files and gathers the metadata 
from the footer of the files. Drill stores the collected metadata in a metadata 
cache file, <code>.drill.parquet_metadata</code>, at each directory 
level.<br></li>
+<li>Drill traverses directories for Parquet files and gathers the metadata from the footer of the files. Drill stores the collected metadata in a metadata cache file, <code>.drill.parquet_file_metadata.v4</code>, and a summary file, <code>.drill.parquet_summary_metadata.v4</code>, at each directory level, and in a directories file, <code>.drill.parquet_metadata_directories</code>, in the table’s root directory.<br></li>
 <li>The metadata cache file stores metadata for files in that directory, as 
well as the metadata for the files in the subdirectories.<br></li>
 <li>For each row group in a Parquet file, the metadata cache file stores the 
column names in the row group and the column statistics, such as the min/max 
values and null count.<br></li>
 <li>If the Parquet data is updated, for example, if data is added to a file, Drill automatically refreshes the Parquet metadata when you issue the next query against the Parquet data.<br></li>
@@ -1404,217 +1404,212 @@ Sets the number of row groups that a table can have. You can increase the thresh
 
 <p>Currently, Drill does not support runtime rowgroup pruning. </p>
 
-<!--
-## Examples  
-These examples use a schema, `dfs.samples`, which points to the `/home` 
directory. The `/home` directory contains a subdirectory, `parquet`, which
-contains the `nation.parquet` and a subdirectory, `dir1` with the 
`region.parquet` file. You can access the `nation.parquet` and `region.parquet` 
Parquet files in the `sample-data` directory of your Drill installation.  
-
-    [root@doc23 dir1]# pwd
-    /home/parquet/dir1
-     
-    [root@doc23 parquet]# ls
-    dir1  nation.parquet
-     
-    [root@doc23 dir1]# ls
-    region.parquet  
-
-Change schemas to use `dfs.samples`:
- 
-    use dfs.samples;
-    +-------+------------------------------------------+
-    |  ok   |                 summary                 |
-    +-------+------------------------------------------+
-    | true  | Default schema changed to [dfs.samples]  |
-    +-------+------------------------------------------+  
-
-### Running REFRESH TABLE METADATA on a Directory  
-Running the REFRESH TABLE METADATA command on the `parquet` directory 
generates metadata cache files at each directory level.  
-
-    REFRESH TABLE METADATA parquet;  
-    +-------+---------------------------------------------------+
-    |  ok   |                   summary                     |
-    +-------+---------------------------------------------------+
-    | true  | Successfully updated metadata for table parquet.  |
-    +-------+---------------------------------------------------+  
-
-When looking at the `parquet` directory and `dir1` subdirectory, you can see 
that a metadata cache file was created at each level:
-
-    [root@doc23 parquet]# ls -la
-    drwxr-xr-x   2 root root   95 Mar 18 17:49 dir1
-    -rw-r--r--   1 root root 2642 Mar 18 17:52 .drill.parquet_metadata
-    -rw-r--r--   1 root root   32 Mar 18 17:52 ..drill.parquet_metadata.crc
-    -rwxr-xr-x   1 root root 1210 Mar 13 13:32 nation.parquet
-     
-    [root@doc23 dir1]# ls -la
-    -rw-r--r-- 1 root root 1235 Mar 18 17:52 .drill.parquet_metadata
-    -rw-r--r-- 1 root root   20 Mar 18 17:52 ..drill.parquet_metadata.crc
-    -rwxr-xr-x 1 root root  455 Mar 18 17:41 region.parquet  
-
-The following sections compare the content of the metadata cache file in  the 
`parquet` and `dir1` directories:  
-
-**Content of the metadata cache file in the directory named `parquet` that 
contains the nation.parquet file and subdirectory `dir1`.**  
-
-
-    [root@doc23 parquet]# cat .drill.parquet_metadata
-    {
-      "metadata_version" : "3.3",
-      "columnTypeInfo" : {
-        "`N_COMMENT`" : {
-          "name" : [ "N_COMMENT" ],
-          "primitiveType" : "BINARY",
-          "originalType" : "UTF8",
-          "precision" : 0,
-          "scale" : 0,
-          "repetitionLevel" : 0,
-          "definitionLevel" : 0
-        },
-        "`N_NATIONKEY`" : {
-          "name" : [ "N_NATIONKEY" ],
-          "primitiveType" : "INT64",
-          "originalType" : null,
-          "precision" : 0,
-          "scale" : 0,
-          "repetitionLevel" : 0,
-          "definitionLevel" : 0
-        },
-        "`R_REGIONKEY`" : {
-          "name" : [ "R_REGIONKEY" ],
-          "primitiveType" : "INT64",
-          "originalType" : null,
-          "precision" : 0,
-          "scale" : 0,
-          "repetitionLevel" : 0,
-          "definitionLevel" : 0
-        },
-        "`R_COMMENT`" : {
-          "name" : [ "R_COMMENT" ],
-          "primitiveType" : "BINARY",
-          "originalType" : "UTF8",
-          "precision" : 0,
-          "scale" : 0,
-          "repetitionLevel" : 0,
-          "definitionLevel" : 0
-        },
-        "`N_REGIONKEY`" : {
-          "name" : [ "N_REGIONKEY" ],
-          "primitiveType" : "INT64",
-          "originalType" : null,
-          "precision" : 0,
-          "scale" : 0,
-          "repetitionLevel" : 0,
-          "definitionLevel" : 0
-        },
-        "`R_NAME`" : {
-          "name" : [ "R_NAME" ],
-          "primitiveType" : "BINARY",
-          "originalType" : "UTF8",
-          "precision" : 0,
-          "scale" : 0,
-          "repetitionLevel" : 0,
-          "definitionLevel" : 0
-        },
-        "`N_NAME`" : {
-          "name" : [ "N_NAME" ],
-          "primitiveType" : "BINARY",
-          "originalType" : "UTF8",
-          "precision" : 0,
-          "scale" : 0,
-          "repetitionLevel" : 0,
-          "definitionLevel" : 0
-        }
+<h2 id="examples">Examples</h2>
+
+<p>These examples use a schema, <code>dfs.samples</code>, which points to the 
<code>/tmp</code> directory. The <code>/tmp</code> directory contains the 
following subdirectories and files used in the examples:  </p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">[root@doc23 parquet1]# pwd
+/tmp/parquet1
+
+[root@doc23 parquet1]# ls
+parquet
+
+[root@doc23 parquet1]# cd parquet
+
+[root@doc23 parquet]# ls
+nation.parquet  test
+
+[root@doc23 parquet]# cd test
+
+[root@doc23 test]# ls
+nation.parquet
+</code></pre></div>
+<p><strong>Note:</strong> You can access the sample 
<code>nation.parquet</code> file in the <code>sample-data</code> directory of 
your Drill installation.</p>
+
+<p>Switch to the <code>dfs.samples</code> schema:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">use dfs.samples;
++-------+------------------------------------------+
+|  ok   |                 summary                 |
++-------+------------------------------------------+
+| true  | Default schema changed to [dfs.samples]  |
++-------+------------------------------------------+  
+</code></pre></div>
+<h3 id="running-refresh-table-metadata-on-a-directory">Running REFRESH TABLE 
METADATA on a Directory</h3>
+
+<p>Running the REFRESH TABLE METADATA command on the “parquet1” directory 
generates metadata cache files at each directory level.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">apache drill (dfs.samples)&gt; REFRESH TABLE METADATA parquet1;
++------+---------------------------------------------------+
+|  ok  |                      summary                      |
++------+---------------------------------------------------+
+| true | Successfully updated metadata for table parquet1. |
++------+---------------------------------------------------+
+</code></pre></div>
+<p>When looking at the “parquet1” directory and its subdirectories, you can see that hidden metadata cache and summary files were created at each level:</p>
+
+<p><strong>Note:</strong> The CRC files are Cyclic Redundancy Check checksum files that verify the data integrity of the metadata files they are named after.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">[root@doc23 parquet1]# ls -la
+total 36
+drwxr-xr-x   3 root root  284 Apr 29 11:46 .
+drwxrwxrwt. 51 root root 8192 Apr 29 11:44 ..
+-rw-r--r--   1 root root 1037 Apr 29 11:46 .drill.parquet_file_metadata.v4
+-rw-r--r--   1 root root   20 Apr 29 11:46 ..drill.parquet_file_metadata.v4.crc
+-rw-r--r--   1 root root   51 Apr 29 11:46 .drill.parquet_metadata_directories
+-rw-r--r--   1 root root   12 Apr 29 11:46 ..drill.parquet_metadata_directories.crc
+-rw-r--r--   1 root root 1334 Apr 29 11:46 .drill.parquet_summary_metadata.v4
+-rw-r--r--   1 root root   20 Apr 29 11:46 ..drill.parquet_summary_metadata.v4.crc
+drwxr-xr-x   3 root root  212 Apr 29 11:30 parquet  
+
+[root@doc23 parquet1]# cd parquet
+[root@doc23 parquet]# ls -la
+total 20
+drwxr-xr-x 3 root root  212 Apr 29 11:30 .
+drwxr-xr-x 3 root root  284 Apr 29 11:46 ..
+-rw-r--r-- 1 root root 1021 Apr 29 11:46 .drill.parquet_file_metadata.v4
+-rw-r--r-- 1 root root   16 Apr 29 11:46 ..drill.parquet_file_metadata.v4.crc
+-rw-r--r-- 1 root root 1315 Apr 29 11:46 .drill.parquet_summary_metadata.v4
+-rw-r--r-- 1 root root   20 Apr 29 11:46 ..drill.parquet_summary_metadata.v4.crc
+-rwxr-xr-x 1 root root 1210 Apr 29 11:23 nation.parquet
+drwxr-xr-x 2 root root  200 Apr 29 11:46 test
+
+[root@doc23 test]# ls -la
+total 20
+drwxr-xr-x 2 root root  200 Apr 29 11:46 .
+drwxr-xr-x 3 root root  212 Apr 29 11:30 ..
+-rw-r--r-- 1 root root  517 Apr 29 11:46 .drill.parquet_file_metadata.v4
+-rw-r--r-- 1 root root   16 Apr 29 11:46 ..drill.parquet_file_metadata.v4.crc
+-rw-r--r-- 1 root root 1308 Apr 29 11:46 .drill.parquet_summary_metadata.v4
+-rw-r--r-- 1 root root   20 Apr 29 11:46 ..drill.parquet_summary_metadata.v4.crc
+-rwxr-xr-x 1 root root 1210 Apr 29 11:23 nation.parquet  
+</code></pre></div>
+<p>Looking at the <code>.drill.parquet_file_metadata.v4</code> file in the 
<code>/tmp/parquet1</code> directory, you can see that the file contains the 
paths to the Parquet files in the subdirectories, as well as metadata for those 
files: </p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">[root@doc23 parquet1]# cat .drill.parquet_file_metadata.v4
+{
+  &quot;files&quot; : [ {
+    &quot;path&quot; : &quot;parquet/test/nation.parquet&quot;,
+    &quot;length&quot; : 1210,
+    &quot;rowGroups&quot; : [ {
+      &quot;start&quot; : 4,
+      &quot;length&quot; : 944,
+      &quot;rowCount&quot; : 25,
+      &quot;hostAffinity&quot; : {
+        &quot;localhost&quot; : 1.0
       },
-      "files" : [ {
-        "path" : "dir1/region.parquet",
-        "length" : 455,
-        "rowGroups" : [ {
-          "start" : 4,
-          "length" : 250,
-          "rowCount" : 5,
-          "hostAffinity" : {
-            "localhost" : 1.0
-          },
-          "columns" : [ ]
-        } ]
+      &quot;columns&quot; : [ {
+        &quot;name&quot; : [ &quot;N_NATIONKEY&quot; ],
+        &quot;nulls&quot; : -1
+      }, {
+        &quot;name&quot; : [ &quot;N_NAME&quot; ],
+        &quot;nulls&quot; : -1
       }, {
-        "path" : "nation.parquet",
-        "length" : 1210,
-        "rowGroups" : [ {
-          "start" : 4,
-          "length" : 944,
-          "rowCount" : 25,
-          "hostAffinity" : {
-            "localhost" : 1.0
-          },
-          "columns" : [ ]
-        } ]
-      } ],
-      "directories" : [ "dir1" ],
-      "drillVersion" : "1.16.0-SNAPSHOT"  
-
-**Content of the directory named `dir1` that contains the `region.parquet` 
file and no subdirectories.**  
-
-    [root@doc23 dir1]# cat .drill.parquet_metadata
-    {
-      "metadata_version" : "3.3",
-      "columnTypeInfo" : {
-        "`R_REGIONKEY`" : {
-        "name" : [ "R_REGIONKEY" ],
-        "primitiveType" : "INT64",
-        "originalType" : null,
-        "precision" : 0,
-        "scale" : 0,
-        "repetitionLevel" : 0,
-        "definitionLevel" : 0
-        },
-        "`R_COMMENT`" : {
-        "name" : [ "R_COMMENT" ],
-        "primitiveType" : "BINARY",
-        "originalType" : "UTF8",
-        "precision" : 0,
-        "scale" : 0,
-        "repetitionLevel" : 0,
-        "definitionLevel" : 0
-        },
-        "`R_NAME`" : {
-        "name" : [ "R_NAME" ],
-        "primitiveType" : "BINARY",
-          "originalType" : "UTF8",
-        "precision" : 0,
-        "scale" : 0,
-        "repetitionLevel" : 0,
-        "definitionLevel" : 0
-        }
+        &quot;name&quot; : [ &quot;N_REGIONKEY&quot; ],
+        &quot;nulls&quot; : -1
+      }, {
+        &quot;name&quot; : [ &quot;N_COMMENT&quot; ],
+        &quot;nulls&quot; : -1
+      } ]
+    } ]
+  }, {
+    &quot;path&quot; : &quot;parquet/nation.parquet&quot;,
+    &quot;length&quot; : 1210,
+    &quot;rowGroups&quot; : [ {
+      &quot;start&quot; : 4,
+      &quot;length&quot; : 944,
+      &quot;rowCount&quot; : 25,
+      &quot;hostAffinity&quot; : {
+        &quot;localhost&quot; : 1.0
       },
-      "files" : [ {
-        "path" : "region.parquet",
-        "length" : 455,
-        "rowGroups" : [ {
-        "start" : 4,
-        "length" : 250,
-        "rowCount" : 5,
-        "hostAffinity" : {
-        "localhost" : 1.0
-        },
-        "columns" : [ ]
-        } ]
-      } ],
-      "directories" : [ ],
-      "drillVersion" : "1.16.0-SNAPSHOT"
-    }  
-
-### Verifying that the Planner is Using the Metadata Cache File 
-
-When the planner uses metadata cache files, the query plan includes the 
`usedMetadataFile` flag. You can access the query plan in the Drill Web UI, by 
clicking on the query in the Profiles page, or by running the EXPLAIN PLAN FOR 
command, as shown:
-
-    EXPLAIN PLAN FOR SELECT * FROM parquet;  
- 
-    | 00-00    Screen
-    00-01      Project(**=[$0])
-    00-02      Scan(table=[[dfs, samples, parquet]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/home/parquet]], 
selectionRoot=/home/parquet, numFiles=1, numRowGroups=2, usedMetadataFile=true, 
cacheFileRoot=/home/parquet, columns=[`**`]]])
-    |... 
-
---> 
-
+      &quot;columns&quot; : [ {
+        &quot;name&quot; : [ &quot;N_NATIONKEY&quot; ],
+        &quot;nulls&quot; : -1
+      }, {
+        &quot;name&quot; : [ &quot;N_NAME&quot; ],
+        &quot;nulls&quot; : -1
+      }, {
+        &quot;name&quot; : [ &quot;N_REGIONKEY&quot; ],
+        &quot;nulls&quot; : -1
+      }, {
+        &quot;name&quot; : [ &quot;N_COMMENT&quot; ],
+        &quot;nulls&quot; : -1
+      } ]
+    } ]
+  } ]
+</code></pre></div>
+<p>Looking at the <code>.drill.parquet_summary_metadata.v4</code> file in the <code>parquet1</code> directory, you can see the type information and total null count for each column in the files, the list of subdirectories, and which columns are marked as interesting (relevant when you specify columns in the REFRESH TABLE METADATA command):</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">[root@doc23 parquet1]# cat .drill.parquet_summary_metadata.v4
+{
+  &quot;columnTypeInfo&quot; : {
+    &quot;`N_COMMENT`&quot; : {
+      &quot;name&quot; : [ &quot;N_COMMENT&quot; ],
+      &quot;primitiveType&quot; : &quot;BINARY&quot;,
+      &quot;originalType&quot; : &quot;UTF8&quot;,
+      &quot;precision&quot; : 0,
+      &quot;scale&quot; : 0,
+      &quot;repetitionLevel&quot; : 0,
+      &quot;definitionLevel&quot; : 0,
+      &quot;totalNullCount&quot; : -1,
+      &quot;isInteresting&quot; : true
+    },
+    &quot;`N_NATIONKEY`&quot; : {
+      &quot;name&quot; : [ &quot;N_NATIONKEY&quot; ],
+      &quot;primitiveType&quot; : &quot;INT64&quot;,
+      &quot;originalType&quot; : null,
+      &quot;precision&quot; : 0,
+      &quot;scale&quot; : 0,
+      &quot;repetitionLevel&quot; : 0,
+      &quot;definitionLevel&quot; : 0,
+      &quot;totalNullCount&quot; : -1,
+      &quot;isInteresting&quot; : true
+    },
+    &quot;`N_REGIONKEY`&quot; : {
+      &quot;name&quot; : [ &quot;N_REGIONKEY&quot; ],
+      &quot;primitiveType&quot; : &quot;INT64&quot;,
+      &quot;originalType&quot; : null,
+      &quot;precision&quot; : 0,
+      &quot;scale&quot; : 0,
+      &quot;repetitionLevel&quot; : 0,
+      &quot;definitionLevel&quot; : 0,
+      &quot;totalNullCount&quot; : -1,
+      &quot;isInteresting&quot; : true
+    },
+    &quot;`N_NAME`&quot; : {
+      &quot;name&quot; : [ &quot;N_NAME&quot; ],
+      &quot;primitiveType&quot; : &quot;BINARY&quot;,
+      &quot;originalType&quot; : &quot;UTF8&quot;,
+      &quot;precision&quot; : 0,
+      &quot;scale&quot; : 0,
+      &quot;repetitionLevel&quot; : 0,
+      &quot;definitionLevel&quot; : 0,
+      &quot;totalNullCount&quot; : -1,
+      &quot;isInteresting&quot; : true
+    }
+  },
+  &quot;directories&quot; : [ &quot;parquet/test&quot;, &quot;parquet&quot; ],
+  &quot;drillVersion&quot; : &quot;1.16.0-SNAPSHOT&quot;,
+  &quot;totalRowCount&quot; : 50,
+  &quot;allColumnsInteresting&quot; : true,
+  &quot;metadata_version&quot; : &quot;4&quot;  
+</code></pre></div>
+<h3 
id="verifying-that-the-planner-is-using-the-metadata-cache-or-summary-files">Verifying
 that the Planner is Using the Metadata Cache or Summary Files</h3>
+
+<p>When the planner uses metadata cache files, the query plan includes the <code>usedMetadataFile</code> flag. You can access the query plan in the Drill Web UI by clicking the query on the Profiles page, or by running the EXPLAIN PLAN FOR command, as shown:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">apache drill (dfs.samples)&gt; explain plan for select * from parquet1;
++----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
+|                                       text                                       |                                       json                                       |
++----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
+| 00-00    Screen
+00-01      Project(**=[$0])
+00-02        Scan(table=[[dfs, samples, parquet1]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/parquet1]], 
selectionRoot=/tmp/parquet1, numFiles=1, numRowGroups=2, usedMetadataFile=true, 
cacheFileRoot=/tmp/parquet1, columns=[`**`]]])  
+ |   
+</code></pre></div>
+<p>When you run the EXPLAIN PLAN FOR command with a COUNT(*) query, as shown, you can see that the query planner uses the summary file and avoids reading the larger metadata cache file. The query plan includes the <code>usedMetadataSummaryFile</code> flag.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">apache drill (dfs.samples)&gt; explain plan for select count(*) from parquet1;
++----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
+|                                       text                                       |                                       json                                       |
++----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
+| 00-00    Screen
+00-01      Project(EXPR$0=[$0])
+00-02        DirectScan(groupscan=[files = 
[file:/tmp/parquet1/.drill.parquet_summary_metadata.v4], numFiles = 1, 
usedMetadataSummaryFile = true, DynamicPojoRecordReader{records = [[50]]}])
+ | 
+</code></pre></div>
     
       
         <div class="doc-nav">
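
A short script can check how the two v4 files in the refresh-table-metadata examples relate. The totalRowCount in .drill.parquet_summary_metadata.v4 (50) equals the sum of the rowCount values of the row groups listed in .drill.parquet_file_metadata.v4 (25 + 25). That sum is the value the COUNT(*) plan returns from the summary file. This is a minimal Python sketch that works on abbreviated copies of the JSON from the examples, keeping only the fields it reads. It is not how Drill itself reads these files. The helper names rows_from_file_metadata and rows_from_summary are hypothetical.

```python
import json

# Abbreviated copy of /tmp/parquet1/.drill.parquet_file_metadata.v4 from the
# example; only the fields this sketch reads are kept.
FILE_METADATA = """
{
  "files" : [
    { "path" : "parquet/test/nation.parquet",
      "rowGroups" : [ { "start" : 4, "length" : 944, "rowCount" : 25 } ] },
    { "path" : "parquet/nation.parquet",
      "rowGroups" : [ { "start" : 4, "length" : 944, "rowCount" : 25 } ] }
  ]
}
"""

# Abbreviated copy of /tmp/parquet1/.drill.parquet_summary_metadata.v4.
SUMMARY_METADATA = """
{
  "directories" : [ "parquet/test", "parquet" ],
  "totalRowCount" : 50,
  "allColumnsInteresting" : true,
  "metadata_version" : "4"
}
"""


def rows_from_file_metadata(text: str) -> int:
    """Sum rowCount over every row group of every file listed in the cache file."""
    meta = json.loads(text)
    return sum(rg["rowCount"] for f in meta["files"] for rg in f["rowGroups"])


def rows_from_summary(text: str) -> int:
    """Read the precomputed total row count from the summary file."""
    return json.loads(text)["totalRowCount"]


if __name__ == "__main__":
    detailed = rows_from_file_metadata(FILE_METADATA)
    summary = rows_from_summary(SUMMARY_METADATA)
    print(detailed, summary)  # 50 50
    assert detailed == summary
```

The summary file holds the total directly, so answering COUNT(*) from it avoids walking every row group in the larger cache file.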
diff --git a/feed.xml b/feed.xml
index aeb26d3..729a31a 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Fri, 26 Apr 2019 12:50:50 -0700</pubDate>
-    <lastBuildDate>Fri, 26 Apr 2019 12:50:50 -0700</lastBuildDate>
+    <pubDate>Mon, 29 Apr 2019 13:50:47 -0700</pubDate>
+    <lastBuildDate>Mon, 29 Apr 2019 13:50:47 -0700</lastBuildDate>
     <generator>Jekyll v2.5.2</generator>
     
       <item>
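
You can also check which hidden metadata files REFRESH TABLE METADATA left at each directory level by walking the table directory (compare the ls -la listings in the refresh-table-metadata examples above). This is a minimal Python sketch. The helper names drill_metadata_files and build_example_tree are hypothetical. The demo does not read a real Drill table: it recreates the example layout with empty placeholder files in a temporary directory. Files are matched by the .drill.parquet_ prefix seen in the listings. The ..*.crc checksum files start with two dots, so they do not match.

```python
import os
import tempfile
from typing import Dict, List

PREFIX = ".drill.parquet_"  # prefix shared by the metadata files in the listings


def drill_metadata_files(root: str) -> Dict[str, List[str]]:
    """Map each directory (relative to root) to its sorted Drill metadata files."""
    found: Dict[str, List[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # CRC companions start with "..", so they do not match PREFIX.
        names = sorted(n for n in filenames if n.startswith(PREFIX))
        if names:
            found[rel] = names
    return found


def build_example_tree(root: str) -> None:
    """Recreate the /tmp/parquet1 layout from the example with empty files."""
    layout = {
        ".": [".drill.parquet_file_metadata.v4",
              ".drill.parquet_summary_metadata.v4",
              ".drill.parquet_metadata_directories"],
        "parquet": [".drill.parquet_file_metadata.v4",
                    ".drill.parquet_summary_metadata.v4",
                    "..drill.parquet_file_metadata.v4.crc",
                    "nation.parquet"],
        os.path.join("parquet", "test"): [".drill.parquet_file_metadata.v4",
                                          ".drill.parquet_summary_metadata.v4",
                                          "nation.parquet"],
    }
    for rel, names in layout.items():
        d = os.path.join(root, rel)
        os.makedirs(d, exist_ok=True)
        for n in names:
            open(os.path.join(d, n), "w").close()  # empty placeholder file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_example_tree(tmp)
        for level, names in sorted(drill_metadata_files(tmp).items()):
            print(level, names)
```

In the recreated layout, only the root level has the .drill.parquet_metadata_directories file, which matches the listings in the example.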
