This is an automated email from the ASF dual-hosted git repository.

bridgetb pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/drill-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 73886f6  updates
73886f6 is described below

commit 73886f61995927b0e06e3af9030a3161f58c084f
Author: Bridget Bevens <[email protected]>
AuthorDate: Wed May 1 17:23:41 2019 -0700

    updates
---
 docs/analyze-table/index.html               | 12 +++--
 docs/configuring-the-drill-shell/index.html | 19 ++++++-
 docs/create-or-replace-schema/index.html    | 82 ++++++++++++++++++++---------
 feed.xml                                    |  4 +-
 4 files changed, 85 insertions(+), 32 deletions(-)

diff --git a/docs/analyze-table/index.html b/docs/analyze-table/index.html
index bf439a7..94d0c7c 100644
--- a/docs/analyze-table/index.html
+++ b/docs/analyze-table/index.html
@@ -1316,7 +1316,7 @@
 
     </div>
 
-     Apr 30, 2019
+     May 2, 2019
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1378,9 +1378,9 @@ An integer that specifies the percentage of data on which 
to compute statistics.
 <li>Rowcount (total number of entries in the table)<br></li>
 <li>Nonnullrowcount (total number of non-null entries in the table)<br></li>
 <li>NDV (total distinct values in the table)<br></li>
-<li>Avgwidth (average width of a column/average number of characters in a 
column)<br></li>
-<li>Majortype (data type of the column values)<br></li>
-<li>Histogram (represents the frequency distribution of values (numeric data) 
in a column) See Histograms.<br></li>
+<li>Avgwidth (average width, in bytes, of a column)<br></li>
+<li>Majortype (data type and data mode (OPTIONAL, REQUIRED, REPEATED) of the 
column values)<br></li>
+<li>Histogram (represents the frequency distribution of values (numeric data) 
in a column). See <a 
href="/docs/analyze-table/#histograms">Histograms</a>.<br></li>
 <li><p>When you look at the statistics file, statistics for each column 
display in the following format (c_nationkey is used as an example column):  
</p>
 <div class="highlight"><pre><code class="language-text" 
data-lang="text">{&quot;column&quot;:&quot;`c_nationkey`&quot;,&quot;majortype&quot;:{&quot;type&quot;:&quot;INT&quot;,&quot;mode&quot;:&quot;REQUIRED&quot;},&quot;schema&quot;:1.0,&quot;rowcount&quot;:1500.0,&quot;nonnullrowcount&quot;:1500.0,&quot;ndv&quot;:25,&quot;avgwidth&quot;:4.0,&quot;histogram&quot;:{&quot;category&quot;:&quot;numeric-equi-depth&quot;,&quot;numRowsPerBucket&quot;:150,&quot;buckets&quot;:[0.0,2.0,4.0,7.0,9.0
 [...]
 </code></pre></div></li>
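
For context, statistics blocks like the one above are produced by running 
ANALYZE TABLE against a directory-backed table. A minimal sketch, assuming the 
COMPUTE STATISTICS form documented earlier on this page and a hypothetical 
dfs.tmp.`customer` table holding the sample data shown above:

    ANALYZE TABLE dfs.tmp.`customer` COMPUTE STATISTICS;
    -- or compute statistics on a percentage of the data:
    ANALYZE TABLE dfs.tmp.`customer` COMPUTE STATISTICS SAMPLE 50 PERCENT;
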
@@ -1443,6 +1443,8 @@ Sample
 
 <h2 id="histograms">Histograms</h2>
 
+<p><strong>Note:</strong> Currently, histograms are supported for numeric 
columns only.  </p>
+
 <p>Histograms show the distribution of data to determine if data is skewed or 
normally distributed. Histogram statistics improve the selectivity estimates 
used by the optimizer to create the most efficient query plans possible. 
Histogram statistics are useful for range predicates to help determine how many 
rows belong to a particular range.   </p>
 
 <p>Running the ANALYZE TABLE statement generates equi-depth histogram 
statistics on each column in a table. Equi-depth histograms distribute distinct 
column values across buckets of varying widths, with all buckets having 
approximately the same number of rows. The fixed number of rows per bucket is 
predetermined by <code>ceil(number_rows/n)</code>, where <code>n</code> is the 
number of buckets. The number of distinct values in each bucket depends on the 
distribution of the values in a co [...]
@@ -1465,7 +1467,7 @@ Histogram statistics are generated for each column, as 
shown:  </p>
 
 
<p>&quot;histogram&quot;:{&quot;category&quot;:&quot;numeric-equi-depth&quot;,&quot;numRowsPerBucket&quot;:150,&quot;buckets&quot;:[0.0,2.0,4.0,7.0,9.0,12.0,15.199999999999978,17.0,19.0,22.0,24.0]</p>
 
-<p>In this example, there are 11 buckets. Each bucket contains 150 rows, which 
is an approximation of the number of rows (1500)/number of buckets (11). The 
list of numbers for the “buckets” property indicates boundaries for each 
bucket. Starting from 0, the first number (0.0) denotes the end of the first 
bucket and the start point of the second bucket. The second number (2.0) 
denotes end point for the second bucket and starting point for the third 
bucket, and so on.  </p>
+<p>In this example, there are 10 buckets. Each bucket contains 150 rows, which 
is calculated as the number of rows (1500)/number of buckets (10). The list of 
numbers for the “buckets” property indicates bucket boundaries, with the first 
bucket starting at 0.0 and ending at 2.0. The end of the first bucket is the 
start point for the second bucket, such that the second bucket starts at 2.0 
and ends at 4.0, and so on for the remainder of the buckets. </p>
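
As a rough illustration of how these buckets feed range-predicate estimates 
(the dfs.tmp.`customer` table name is hypothetical, and the optimizer's 
internal arithmetic may differ):

    -- buckets: [0.0, 2.0, 4.0, 7.0, ...], numRowsPerBucket = 150
    SELECT * FROM dfs.tmp.`customer` WHERE c_nationkey <= 4;
    -- the predicate spans the first two buckets (0.0-2.0 and 2.0-4.0),
    -- so the estimated row count is roughly 2 x 150 = 300 rows
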
 
 <h2 id="limitations">Limitations</h2>
 
diff --git a/docs/configuring-the-drill-shell/index.html 
b/docs/configuring-the-drill-shell/index.html
index 83ee6fb..af7036b 100644
--- a/docs/configuring-the-drill-shell/index.html
+++ b/docs/configuring-the-drill-shell/index.html
@@ -1314,7 +1314,7 @@
 
     </div>
 
-     Dec 30, 2018
+     May 2, 2019
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1322,6 +1322,23 @@
       
         <p>Drill uses SQLLine as the Drill shell. SQLLine is a pure-Java 
console-based utility for connecting to relational databases and running SQL 
commands. </p>
 
+<p>Starting in Drill 1.16, Drill uses SQLLine 1.7. This upgrade changes the 
default Drill prompt to <code>apache drill&gt;</code>. If you switch to a 
specific schema, for example <code>dfs.tmp</code>, the prompt includes the 
current schema, as shown:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">use dfs.tmp;
++------+-------------------------------------+
+|  ok  |               summary               |
++------+-------------------------------------+
+| true | Default schema changed to [dfs.tmp] |
++------+-------------------------------------+
+</code></pre></div>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">apache drill (dfs.tmp)&gt;
+</code></pre></div>
+<p>To return to the previous prompt display, modify 
<code>drill-sqlline-override.conf</code>, and set 
<code>drill.sqlline.prompt.with_schema</code> to false.</p>
+
+<p>Alternatively, you can define a custom prompt using the command <code>!set 
prompt &lt;new-prompt&gt;</code>, as shown:  </p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">apache drill (dfs.tmp)&gt; !set prompt good-drill
+good-drill   
+</code></pre></div>
+<p>You can use <code>prompt</code>, <code>promptScript</code>, or 
<code>rightPrompt</code> with the <code>!set</code> command. These properties 
can be overridden in <code>drill-sqlline-override.conf</code>.</p>
+
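
For reference, the override described above is a single HOCON-style entry in 
drill-sqlline-override.conf (a sketch showing only the key named in this 
section):

    # drill-sqlline-override.conf -- drop the schema suffix from the prompt
    drill.sqlline.prompt.with_schema: false
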
 <p>Starting in Drill 1.15, Drill uses SQLLine 1.6, which you can customize 
through the Drill <a 
href="/docs/configuring-the-drill-shell/#customizing-sqlline-in-the-drill-sqlline-override-conf-file">configuration
 file, drill-sqlline-override.conf</a>. Before installing and running Drill 
with SQLLine 1.6, delete the old SQLLine history file located in:  </p>
 
 <ul>
diff --git a/docs/create-or-replace-schema/index.html 
b/docs/create-or-replace-schema/index.html
index b9eddf9..11ffc01 100644
--- a/docs/create-or-replace-schema/index.html
+++ b/docs/create-or-replace-schema/index.html
@@ -1316,43 +1316,50 @@
 
     </div>
 
-     Apr 29, 2019
+     May 2, 2019
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
     <div class="int_text" align="left">
       
-        <p>Starting in Drill 1.16, you can define a schema for text files 
using the CREATE OR REPLACE SCHEMA command. Running this command generates a 
hidden <code>.drill.schema</code> file in the table’s root directory. The 
<code>.drill.schema</code> file stores the schema definition in JSON format. 
Drill uses the schema file at runtime if the 
<code>exec.storage.enable_v3_text_reader</code> and 
<code>store.table.use_schema_file</code> options are enabled. Alternatively, 
you can create t [...]
+        <p>Starting in Drill 1.16, you can define a schema for text files 
using the CREATE OR REPLACE SCHEMA command. Schema is only available for tables 
represented by a directory. To use this feature with a single file, put the 
file inside a directory, and use the directory name to query the table.</p>
+
+<p>In Drill 1.16, this feature is in preview status and disabled by default. 
You can enable this feature by setting the 
<code>exec.storage.enable_v3_text_reader</code> and 
<code>store.table.use_schema_file</code> system/session options to true. The 
feature is currently only available for text (CSV) files.</p>
+
+<p>Running this command generates a hidden <code>.drill.schema</code> file in 
the table’s root directory. The <code>.drill.schema</code> file stores the 
schema definition in JSON format. Alternatively, you can create the schema file 
manually. If created manually, the file content must comply with the structure 
recognized by Drill.  </p>
+
+<p>The end of this topic provides <a 
href="/docs/create-or-replace-schema/#examples">examples</a> that show how the 
feature is used. You may want to review this section before reading the 
reference material.  </p>
+
+<p>Please post your experience and suggestions to the &quot;<a 
href="[email protected]">user</a>&quot; mailing list.</p>
 
 <h2 id="syntax">Syntax</h2>
 
 <p>The CREATE OR REPLACE SCHEMA command supports the following syntax:</p>
 <div class="highlight"><pre><code class="language-text" 
data-lang="text">CREATE [OR REPLACE] SCHEMA
 [LOAD &#39;file:///path/to/file&#39;]
-[(column_name data_type nullability format default properties 
{prop=&#39;val&#39;, ...})]
-[FOR TABLE `table_name`]
-[PATH &#39;file:///schema_file_path/schema_file_name&#39;] 
-[PROPERTIES (&#39;key1&#39;=&#39;value1&#39;, &#39;key2&#39;=&#39;value2&#39;, 
...)]  
+[(column_name data_type [nullability] [format] [default] [properties 
{prop=&#39;val&#39;, ...}])]
+[FOR TABLE `table_name` | PATH 
&#39;file:///schema_file_path/schema_file_name&#39;] 
+[PROPERTIES (&#39;key1&#39;=&#39;value1&#39;, &#39;key2&#39;=&#39;value2&#39;, 
...)]   
 </code></pre></div>
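
A concrete instance of this syntax might look like the following sketch; the 
column names, format, default, and the dfs.tmp.`text_table` target are 
illustrative, drawn from the examples later in this topic:

    CREATE OR REPLACE SCHEMA
    (id INT NOT NULL DEFAULT '0',
     start_date DATE FORMAT 'yyyy-MM-dd',
     `comment` VARCHAR)
    FOR TABLE dfs.tmp.`text_table`;
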
 <h2 id="parameters">Parameters</h2>
 
 <p><em>OR REPLACE</em><br>
-Existing schema is dropped and replaced with the new schema. Only supported 
when using FOR TABLE. Not supported when using PATH because it prevents 
malicious deletion of any file. You must manually delete any schema file 
created in a custom location. </p>
+Existing schema is dropped and replaced with the new schema. Only supported 
when using FOR TABLE. Not supported when using PATH, because this restriction 
prevents malicious deletion of any file. Instead, you must manually delete any 
schema file created in a custom PATH location.  </p>
 
 <p><em>LOAD</em><br>
Loads raw schema (list of column names with their attributes) from a file. You 
must indicate the path to the file after the LOAD keyword. Note that columns 
can be listed inline or provided through the LOAD clause; at least one of the 
two is required for successful schema creation.</p>
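
For instance, a LOAD-based statement could look like the following sketch; the 
path and file name are hypothetical, and the file must contain the column list 
in the structure Drill expects:

    CREATE OR REPLACE SCHEMA
    LOAD 'file:///tmp/schema_definition.txt'
    FOR TABLE dfs.tmp.`text_table`;
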
 
 <p><em>column_name</em><br>
-Name of the column for which schema is created. Case-insensitive. </p>
+Name of the column for which schema is created. Case-insensitive. The name 
must match the name in the data file or data source. You cannot rename columns 
using the schema mechanism. </p>
 
 <p><em>data_type</em><br>
 Data type defined for the column. See Supported Data Types. </p>
 
 <p><em>format</em><br>
-Sets the format for date and time data types when converting from string.</p>
+Sets the format for date and time data types when converting from string. See 
<a 
href="/docs/create-or-replace-schema/#format-for-date-time-conversion">Format 
for Date, Time Conversion</a>.</p>
 
 <p><em>default</em><br>
-Sets a default value for non-nullable columns, such that queries return the 
default value instead of null. </p>
+Used for non-nullable columns. The default value is returned by queries when 
the column is missing from a data file. The default value is a string enclosed 
in single quotes, like &#39;10&#39;. If you provide a format, the value must be 
valid for that format. </p>
 
 <p><em>properties</em><br>
 Keyword to include optional properties. See Related Options below.  </p>
@@ -1374,7 +1381,15 @@ List of properties as key-value pairs in  parenthesis.  
</p>
 
 <h2 id="related-options">Related Options</h2>
 
-<p>You must enable the following options for Drill to use the schema created 
during query execution:</p>
+<p>In Drill 1.16, you must enable the following options for Drill to use the 
schema created during query execution: </p>
+
+<p><strong>exec.storage.enable_v3_text_reader</strong><br>
+Enables the preview &quot;version 3&quot; of the text (CSV) file reader. The 
V3 text reader is the only reader in Drill 1.16 that supports file schemas.  
</p>
+
+<p><strong>store.table.use_schema_file</strong><br>
+Enables the use of the schema file mechanism.</p>
+
+<p>You can enable these options, as shown:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">set 
`exec.storage.enable_v3_text_reader` = true;
 +------+---------------------------------------------+
 |  ok  |                   summary                   |
@@ -1391,25 +1406,41 @@ set `store.table.use_schema_file` = true;
 </code></pre></div>
 <h2 id="related-properties">Related Properties</h2>
 
-<p>When you create a schema, you can set the following properties within the 
CREATE [OR REPLACE] SCHEMA command:   </p>
+<p>Drill normally uses &quot;schema on read&quot; to load data from your 
tables. The schema mechanism allows you to perform some data cleanup and 
transformations on your data. You can:  </p>
+
+<ul>
+<li>Identify which columns to load, ignoring columns that are not needed by 
users.<br></li>
+<li>Handle schema evolution by providing names and default values for columns 
that may be missing from some of your older (or newer) data files.<br></li>
+<li>Convert text fields to numbers, dates, times or timestamps without having 
to add a CAST to every query (or define a view).<br></li>
+</ul>
+
+<p>When you create a schema, you can set the following column properties 
within the CREATE [OR REPLACE] SCHEMA command: </p>
+
+<p>(<code>drill.format, drill.default, drill.blank-as</code>)  </p>
+
+<p>You can also set the following table properties:</p>
+
+<p>(<code>drill.strict</code>)  </p>
+
+<p>The following sections describe the properties that aid in data clean up 
and transformation:   </p>
 
 <p><strong>drill.strict</strong><br>
-A table property that determines the ordering of columns returned for wildcard 
(*) queries. Accepts a value of true or false. See Schema Mode (Column Order). 
</p>
+Table property that determines the set of columns returned for a wildcard (*) 
query. With <code>drill.strict=false</code> (the default), a wildcard includes 
all columns in the table, whether or not they are listed in the schema. With 
<code>drill.strict=true</code>, a wildcard includes only the columns defined in 
the schema, and in the order defined by the schema. See <a 
href="/docs/create-or-replace-schema/#schema-mode-column-order">Schema Mode 
(Column Order)</a>. </p>
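
Per the PROPERTIES clause in the syntax above, strict mode can be requested 
when the schema is created (a sketch; the columns and table name are 
illustrative):

    CREATE OR REPLACE SCHEMA
    (id INT, `name` VARCHAR)
    FOR TABLE dfs.tmp.`text_table`
    PROPERTIES ('drill.strict'='true');
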
 
 <p><strong>drill.format</strong><br>
-A column property that ensures proper conversion when converting string values 
to date and time data types. See Format for Date, Time Conversion.</p>
+Same as the format parameter. (The format parameter is stored as a property in 
the schema file.) See the format parameter for the supported formats. Also, see 
<a 
href="/docs/create-or-replace-schema/#format-for-date-time-conversion">Format 
for Date, Time Conversion</a>.</p>
 
 <p><strong>drill.default</strong><br>
-A column property that sets non-nullable columns to a “default” value when 
creating the schema. See Column Modes (Nullable and Non-Nullable Columns).  </p>
+Same as the default parameter. (The default parameter is stored as a property 
in the schema file.) See <a 
href="/docs/create-or-replace-schema/#column-modes-nullable-and-non-nullable-columns">Column
 Modes (Nullable and Non-Nullable Columns)</a>.  </p>
 
 <p><strong>drill.blank-as</strong><br>
 A property that sets how Drill handles blank column values. Accepts the 
following values:<br>
 - <strong>null</strong>: If the column is nullable, treat the blank as null. 
If non-nullable, leave the blank unchanged.<br>
 - <strong>0</strong>: Replace blanks with the value &quot;0&quot; for numeric 
types.<br>
-- <strong>skip</strong>: Skip blank values. This sets the column to its 
default value: NULL for nullable columns, the default value for non-nullable 
columns.<br>
-- If left empty, blanks have no special meaning. A blank is parsed as any 
other string, which typically produces an error.  </p>
+- <strong>skip</strong>: Skip blank values. This sets the column to its 
default value: NULL for nullable columns, or the default value for non-nullable 
columns.<br>
+- If left empty, blanks have no special meaning. A blank is parsed as any 
other string, which typically produces an error for text columns converted to a 
numeric type.   </p>
 
-<p>See Handling Policy for Blank Column Values.  </p>
+<p>See <a 
href="/docs/create-or-replace-schema/#handling-policy-for-blank-column-values">Handling
 Policy for Blank Column Values</a>.  </p>
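
A column-level use of this property might look like the following sketch, 
following the properties {...} form in the syntax above; the sales column and 
table name are illustrative:

    CREATE OR REPLACE SCHEMA
    (sales INT NOT NULL PROPERTIES {'drill.blank-as' = '0'})
    FOR TABLE dfs.tmp.`text_table`;
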
 
 <h3 id="setting-properties">Setting Properties</h3>
 
@@ -1463,10 +1494,10 @@ A property that sets how Drill handles blank column 
values. Accepts the followin
 <h2 id="related-commands">Related Commands</h2>
 <div class="highlight"><pre><code class="language-text" data-lang="text">DROP 
SCHEMA [IF EXISTS] FOR TABLE `table_name`
 </code></pre></div>
-<p>See Dropping Schema for a Table in the Examples section at the end of this 
topic. </p>
+<p>See <a 
href="/docs/create-or-replace-schema/#dropping-schema-for-a-table">Dropping 
Schema for a Table</a>. </p>
 <div class="highlight"><pre><code class="language-text" 
data-lang="text">DESCRIBE SCHEMA FOR TABLE `table_name`
 </code></pre></div>
-<p>See Describing Schema for a Table in the Examples section at the end of 
this topic.   </p>
+<p>See <a 
href="/docs/create-or-replace-schema/#describing-schema-for-a-table">Describing 
Schema for a Table</a>.   </p>
 
 <h2 id="supported-data-types">Supported Data Types</h2>
 
@@ -1495,13 +1526,14 @@ A property that sets how Drill handles blank column 
values. Accepts the followin
 <li>Schema provisioning only works with tables defined as directories because 
Drill must have a place to store the schema file. The directory can contain one 
or more files.<br></li>
 <li>Text files must have headers. The default extension for delimited text 
files with headers is <code>.csvh</code>. Note that the column names that 
appear in the headers must match the column definitions in the schema.<br></li>
 <li>You do not have to enumerate all columns in a file when creating a schema. 
You can indicate the columns of interest only.<br></li>
-<li>Columns in the defined schema do not have to be in the same order as in 
the data file. However, the names must match. The case can differ, for example 
“name” and “NAME” are acceptable.<br></li>
+<li>Columns in the defined schema do not have to be in the same order as in 
the data file.<br></li>
+<li>Column names must match. The case can differ; for example, “name” and 
“NAME” are acceptable.<br></li>
 <li>Queries on columns with data types that cannot be converted fail with a 
<code>DATA_READ_ERROR</code>.<br></li>
 </ul>
 
 <h3 id="schema-mode-column-order">Schema Mode (Column Order)</h3>
 
-<p>The schema mode determines the ordering of columns returned for wildcard 
(*) queries. The mode is set through the <code>drill.strict</code> property. 
You can set this property to true (strict) or false (not strict). If you do not 
indicate the mode, the default is false (not strict).  </p>
+<p>The schema mode determines the set of columns returned for wildcard (*) 
queries and the  ordering of those columns. The mode is set through the 
<code>drill.strict</code> property. You can set this property to true (strict) 
or false (not strict). If you do not indicate the mode, the default is false 
(not strict).  </p>
 
 <p><strong>Not Strict (Default)</strong><br>
 Columns defined in the schema are projected in the defined order. Columns not 
defined in the schema are appended to the defined columns, as shown:  </p>
@@ -1549,6 +1581,8 @@ select * from dfs.tmp.`text_table`;
 
 <h2 id="including-additional-columns-in-the-schema">Including Additional 
Columns in the Schema</h2>
 
+<p>The ability to include additional columns in the schema enables schema 
evolution, which is useful when some columns appear only in newer (or older) 
files. </p>
+
 <p>When you create a schema, you can include columns that do not exist in the 
table and these columns will be projected. This feature ensures that queries 
return the correct results whether the files have a specific column or not. 
Note that schema mode does not affect the behavior of this feature.</p>
 
 <p>For example, the “comment” column is not in the text_table, but added when 
creating the schema:  </p>
@@ -1569,7 +1603,7 @@ select * from dfs.tmp.`text_table`;
 | 3    | 2016-01-01 |  null   | Pebbles |
 | 4    | null       |  null   | Barney  |
 | null | null       |  null   | Dino    |
-+------+------------+---------+---------+  
++------+------------+---------+---------+    
 </code></pre></div>
 <h2 id="column-modes-nullable-and-non-nullable-columns">Column Modes (Nullable 
and Non-Nullable Columns)</h2>
 
@@ -1752,7 +1786,7 @@ select * from dfs.tmp.`text_blank`;
 
 <h2 id="limitations">Limitations</h2>
 
-<p>None</p>
+<p>This feature is currently in the alpha phase (preview, experimental) for 
Drill 1.16 and only applies to text (CSV) files in this release. You must 
enable this feature through the <code>exec.storage.enable_v3_text_reader</code> 
and <code>store.table.use_schema_file</code> system/session options.</p>
 
 <h2 id="examples">Examples</h2>
 
diff --git a/feed.xml b/feed.xml
index 8622503..9d2201f 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Tue, 30 Apr 2019 17:02:41 -0700</pubDate>
-    <lastBuildDate>Tue, 30 Apr 2019 17:02:41 -0700</lastBuildDate>
+    <pubDate>Wed, 01 May 2019 17:17:31 -0700</pubDate>
+    <lastBuildDate>Wed, 01 May 2019 17:17:31 -0700</lastBuildDate>
     <generator>Jekyll v2.5.2</generator>
     
       <item>
