[24/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

jbapple Wed, 12 Apr 2017 11:26:02 -0700

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_live_summary.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_live_summary.html 
b/docs/build/html/topics/impala_live_summary.html
new file mode 100644
index 0000000..cb41693
--- /dev/null
+++ b/docs/build/html/topics/impala_live_summary.html
@@ -0,0 +1,177 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_query_options.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="live_summary"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>LIVE_SUMMARY Query Option (Impala 2.3 or higher 
only)</title></head><body id="live_summary"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">LIVE_SUMMARY Query Option 
(<span class="keyword">Impala 2.3</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      For queries submitted through the <span class="keyword 
cmdname">impala-shell</span> command,
+      displays the same output as the <code class="ph codeph">SUMMARY</code> 
command,
+      with the measurements updated in real time as the query progresses.
+      When the query finishes, the final <code class="ph 
codeph">SUMMARY</code> output remains
+      visible in the <span class="keyword cmdname">impala-shell</span> console 
output.
+    </p>
+
+    <p class="p">
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 
and 0, or <code class="ph codeph">true</code> and <code class="ph 
codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph 
codeph">false</code> (shown as 0 in output of <code class="ph 
codeph">SET</code> statement)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Command-line equivalent:</strong>
+      </p>
+    <p class="p">
+      You can enable this query option within <span class="keyword 
cmdname">impala-shell</span>
+      by starting the shell with the <code class="ph 
codeph">--live_summary</code>
+      command-line option.
+      You can still turn this setting off and on again within the shell 
through the
+      <code class="ph codeph">SET</code> command.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+    <p class="p">
+      The live summary output can be useful for evaluating long-running 
queries,
+      to evaluate which phase of execution takes up the most time, or if some 
hosts
+      take much longer than others for certain operations, dragging overall 
performance down.
+      By making the information available in real time, this feature lets you 
decide what
+      action to take even before you cancel a query that is taking much longer 
than normal.
+    </p>
+    <p class="p">
+      For example, you might see the HDFS scan phase taking a long time, and 
therefore revisit
+      performance-related aspects of your schema design such as constructing a 
partitioned table,
+      switching to the Parquet file format, running the <code class="ph 
codeph">COMPUTE STATS</code> statement
+      for the table, and so on.
+      Or you might see a wide variation between the average and maximum times 
for all hosts to
+      perform some phase of the query, and therefore investigate if one 
particular host
+      needed more memory or was experiencing a network problem.
+    </p>
+    <p class="p">
+        The output from this query option is printed to standard error. The 
output is only displayed in interactive mode,
+        that is, not when the <code class="ph codeph">-q</code> or <code 
class="ph codeph">-f</code> options are used.
+      </p>
+    <p class="p">
+      For a simple and concise way of tracking the progress of an interactive 
query, see
+      <a class="xref" 
href="impala_live_progress.html#live_progress">LIVE_PROGRESS Query Option 
(Impala 2.3 or higher only)</a>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+    <p class="p">
+        The <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph 
codeph">LIVE_SUMMARY</code> query options
+        currently do not produce any output during <code class="ph 
codeph">COMPUTE STATS</code> operations.
+      </p>
+    <div class="p">
+        Because the <code class="ph codeph">LIVE_PROGRESS</code> and <code 
class="ph codeph">LIVE_SUMMARY</code> query options
+        are available only within the <span class="keyword 
cmdname">impala-shell</span> interpreter:
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              You cannot change these query options through the SQL <code 
class="ph codeph">SET</code>
+              statement using the JDBC or ODBC interfaces. The <code class="ph 
codeph">SET</code>
+              command in <span class="keyword cmdname">impala-shell</span> 
recognizes these names as
+              shell-only options.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              Be careful when using <span class="keyword 
cmdname">impala-shell</span> on a pre-<span class="keyword">Impala 2.3</span>
+              system to connect to a system running <span 
class="keyword">Impala 2.3</span> or higher.
+              The older <span class="keyword cmdname">impala-shell</span> does 
not recognize these
+              query option names. Upgrade <span class="keyword 
cmdname">impala-shell</span> on the
+              systems where you intend to use these query options.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              Likewise, the <span class="keyword cmdname">impala-shell</span> 
command relies on
+              some information only available in <span class="keyword">Impala 
2.3</span> and higher
+              to prepare live progress reports and query summaries. The
+              <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph 
codeph">LIVE_SUMMARY</code>
+              query options have no effect when <span class="keyword 
cmdname">impala-shell</span> connects
+              to a cluster running an older version of Impala.
+            </p>
+          </li>
+        </ul>
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 
2.3.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example shows a series of <code class="ph 
codeph">LIVE_SUMMARY</code> reports that
+      are displayed during the course of a query, showing how the numbers 
increase to
+      show the progress of different phases of the distributed query. When you 
do the same
+      in <span class="keyword cmdname">impala-shell</span>, only a single 
report is displayed at any one time,
+      with each update overwriting the previous numbers.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set live_summary=true;
+LIVE_SUMMARY set to true
+[localhost:21000] &gt; select count(*) from customer t1 cross join customer t2;
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | 
Peak Mem | Est. Peak Mem | Detail                |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 
0 B      | -1 B          | FINALIZE              |
+| 05:EXCHANGE         | 0      | 0ns      | 0ns      | 0       | 1          | 
0 B      | -1 B          | UNPARTITIONED         |
+| 03:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 
0 B      | 10.00 MB      |                       |
+| 02:NESTED LOOP JOIN | 0      | 0ns      | 0ns      | 0       | 22.50B     | 
0 B      | 0 B           | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE      | 0      | 0ns      | 0ns      | 0       | 150.00K    | 
0 B      | 0 B           | BROADCAST             |
+| |  01:SCAN HDFS     | 1      | 503.57ms | 503.57ms | 150.00K | 150.00K    | 
24.09 MB | 64.00 MB      | tpch.customer t2      |
+| 00:SCAN HDFS        | 0      | 0ns      | 0ns      | 0       | 150.00K    | 
0 B      | 64.00 MB      | tpch.customer t1      |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | 
Peak Mem | Est. Peak Mem | Detail                |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 
0 B      | -1 B          | FINALIZE              |
+| 05:EXCHANGE         | 0      | 0ns      | 0ns      | 0       | 1          | 
0 B      | -1 B          | UNPARTITIONED         |
+| 03:AGGREGATE        | 1      | 0ns      | 0ns      | 0       | 1          | 
20.00 KB | 10.00 MB      |                       |
+| 02:NESTED LOOP JOIN | 1      | 17.62s   | 17.62s   | 81.14M  | 22.50B     | 
3.23 MB  | 0 B           | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE      | 1      | 26.29ms  | 26.29ms  | 150.00K | 150.00K    | 
0 B      | 0 B           | BROADCAST             |
+| |  01:SCAN HDFS     | 1      | 503.57ms | 503.57ms | 150.00K | 150.00K    | 
24.09 MB | 64.00 MB      | tpch.customer t2      |
+| 00:SCAN HDFS        | 1      | 247.53ms | 247.53ms | 1.02K   | 150.00K    | 
24.39 MB | 64.00 MB      | tpch.customer t1      |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | 
Peak Mem | Est. Peak Mem | Detail                |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 
0 B      | -1 B          | FINALIZE              |
+| 05:EXCHANGE         | 0      | 0ns      | 0ns      | 0       | 1          | 
0 B      | -1 B          | UNPARTITIONED         |
+| 03:AGGREGATE        | 1      | 0ns      | 0ns      | 0       | 1          | 
20.00 KB | 10.00 MB      |                       |
+| 02:NESTED LOOP JOIN | 1      | 61.85s   | 61.85s   | 283.43M | 22.50B     | 
3.23 MB  | 0 B           | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE      | 1      | 26.29ms  | 26.29ms  | 150.00K | 150.00K    | 
0 B      | 0 B           | BROADCAST             |
+| |  01:SCAN HDFS     | 1      | 503.57ms | 503.57ms | 150.00K | 150.00K    | 
24.09 MB | 64.00 MB      | tpch.customer t2      |
+| 00:SCAN HDFS        | 1      | 247.59ms | 247.59ms | 2.05K   | 150.00K    | 
24.39 MB | 64.00 MB      | tpch.customer t1      |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
+</code></pre>
+
+
+
+
+    <p class="p">
+        To see how the <code class="ph codeph">LIVE_PROGRESS</code> and <code 
class="ph codeph">LIVE_SUMMARY</code> query options
+        work in real time, see <a class="xref" 
href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk"; target="_blank">this 
animated demo</a>.
+      </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_query_options.html">Query Options for the SET 
Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_load_data.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_load_data.html 
b/docs/build/html/topics/impala_load_data.html
new file mode 100644
index 0000000..e49408b
--- /dev/null
+++ b/docs/build/html/topics/impala_load_data.html
@@ -0,0 +1,306 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_langref_sql.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="load_data"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>LOAD DATA Statement</title></head><body 
id="load_data"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">LOAD DATA Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">LOAD DATA</code> statement streamlines the 
ETL process for an internal Impala table by moving a
+      data file or all the data files in a directory from an HDFS location 
into the Impala data directory for that
+      table.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>LOAD DATA INPATH '<var class="keyword 
varname">hdfs_file_or_directory_path</var>' [OVERWRITE] INTO TABLE <var 
class="keyword varname">tablename</var>
+  [PARTITION (<var class="keyword varname">partcol1</var>=<var class="keyword 
varname">val1</var>, <var class="keyword varname">partcol2</var>=<var 
class="keyword varname">val2</var> ...)]</code></pre>
+
+    <p class="p">
+      When the <code class="ph codeph">LOAD DATA</code> statement operates on 
a partitioned table,
+      it always operates on one partition at a time. Specify the <code 
class="ph codeph">PARTITION</code> clauses
+      and list all the partition key columns, with a constant value specified 
for each.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DML (but still affected 
by
+        <a class="xref" 
href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query 
option)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <ul class="ul">
+      <li class="li">
+        The loaded data files are moved, not copied, into the Impala data 
directory.
+      </li>
+
+      <li class="li">
+        You can specify the HDFS path of a single file to be moved, or the 
HDFS path of a directory to move all the
+        files inside that directory. You cannot specify any sort of wildcard 
to take only some of the files from a
+        directory. When loading a directory full of data files, keep all the 
data files at the top level, with no
+        nested directories underneath.
+      </li>
+
+      <li class="li">
+        Currently, the Impala <code class="ph codeph">LOAD DATA</code> 
statement only imports files from HDFS, not from the local
+        filesystem. It does not support the <code class="ph 
codeph">LOCAL</code> keyword of the Hive <code class="ph codeph">LOAD 
DATA</code>
+        statement. You must specify a path, not an <code class="ph 
codeph">hdfs://</code> URI.
+      </li>
+
+      <li class="li">
+        In the interest of speed, only limited error checking is done. If the 
loaded files have the wrong file
+        format, different columns than the destination table, or other kind of 
mismatch, Impala does not raise any
+        error for the <code class="ph codeph">LOAD DATA</code> statement. 
Querying the table afterward could produce a runtime
+        error or unexpected results. Currently, the only checking the <code 
class="ph codeph">LOAD DATA</code> statement does is
+        to avoid mixing together uncompressed and LZO-compressed text files in 
the same table.
+      </li>
+
+      <li class="li">
+        When you specify an HDFS directory name as the <code class="ph 
codeph">LOAD DATA</code> argument, any hidden files in
+        that directory (files whose names start with a <code class="ph 
codeph">.</code>) are not moved to the Impala data
+        directory.
+      </li>
+
+      <li class="li">
+        The operation fails if the source directory contains any non-hidden 
directories.
+        Prior to <span class="keyword">Impala 2.5</span> if the source 
directory contained any subdirectory, even a hidden one such as
+        <span class="ph filepath">_impala_insert_staging</span>, the <code 
class="ph codeph">LOAD DATA</code> statement would fail.
+        In <span class="keyword">Impala 2.5</span> and higher, <code class="ph 
codeph">LOAD DATA</code> ignores hidden subdirectories in the
+        source directory, and only fails if any of the subdirectories are 
non-hidden.
+      </li>
+
+      <li class="li">
+        The loaded data files retain their original names in the new location, 
unless a name conflicts with an
+        existing data file, in which case the name of the new file is modified 
slightly to be unique. (The
+        name-mangling is a slight difference from the Hive <code class="ph 
codeph">LOAD DATA</code> statement, which replaces
+        identically named files.)
+      </li>
+
+      <li class="li">
+        By providing an easy way to transport files from known locations in 
HDFS into the Impala data directory
+        structure, the <code class="ph codeph">LOAD DATA</code> statement lets 
you avoid memorizing the locations and layout of
+        HDFS directory tree containing the Impala databases and tables. (For a 
quick way to check the location of
+        the data files for an Impala table, issue the statement <code 
class="ph codeph">DESCRIBE FORMATTED
+        <var class="keyword varname">table_name</var></code>.)
+      </li>
+
+      <li class="li">
+        The <code class="ph codeph">PARTITION</code> clause is especially 
convenient for ingesting new data for a partitioned
+        table. As you receive new data for a time period, geographic region, 
or other division that corresponds to
+        one or more partitioning columns, you can load that data straight into 
the appropriate Impala data
+        directory, which might be nested several levels down if the table is 
partitioned by multiple columns. When
+        the table is partitioned, you must specify constant values for all the 
partitioning columns.
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      Because Impala currently cannot create Parquet data files containing 
complex types
+      (<code class="ph codeph">ARRAY</code>, <code class="ph 
codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>), the
+      <code class="ph codeph">LOAD DATA</code> statement is especially 
important when working with
+      tables containing complex type columns. You create the Parquet data 
files outside
+      Impala, then use either <code class="ph codeph">LOAD DATA</code>, an 
external table, or HDFS-level
+      file operations followed by <code class="ph codeph">REFRESH</code> to 
associate the data files with
+      the corresponding table.
+      See <a class="xref" 
href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or 
higher only)</a> for details about using complex types.
+    </p>
+
+    <p class="p">
+        If you connect to different Impala nodes within an <span 
class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph 
codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has 
been received by all the Impala
+        nodes. See <a class="xref" 
href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query 
Option</a> for details.
+      </p>
+
+    <div class="note important note_important"><span class="note__title 
importanttitle">Important:</span> 
+        After adding or replacing data in a table used in performance-critical 
queries, issue a <code class="ph codeph">COMPUTE
+        STATS</code> statement to make sure all statistics are up-to-date. 
Consider updating statistics for a
+        table after any <code class="ph codeph">INSERT</code>, <code class="ph 
codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+        SELECT</code> statement in Impala, or after loading data through Hive 
and doing a <code class="ph codeph">REFRESH
+        <var class="keyword varname">table_name</var></code> in Impala. This 
technique is especially important for tables that
+        are very large, used in join queries, or both.
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      First, we use a trivial Python script to write different numbers of 
strings (one per line) into files stored
+      in the <code class="ph codeph">doc_demo</code> HDFS user account. 
(Substitute the path for your own HDFS user account when
+      doing <span class="keyword cmdname">hdfs dfs</span> operations like 
these.)
+    </p>
+
+<pre class="pre codeblock"><code>$ random_strings.py 1000 | hdfs dfs -put - 
/user/doc_demo/thousand_strings.txt
+$ random_strings.py 100 | hdfs dfs -put - /user/doc_demo/hundred_strings.txt
+$ random_strings.py 10 | hdfs dfs -put - 
/user/doc_demo/ten_strings.txt</code></pre>
+
+    <p class="p">
+      Next, we create a table and load an initial set of data into it. 
Remember, unless you specify a
+      <code class="ph codeph">STORED AS</code> clause, Impala tables default 
to <code class="ph codeph">TEXTFILE</code> format with Ctrl-A (hex
+      01) as the field delimiter. This example uses a single-column table, so 
the delimiter is not significant. For
+      large-scale ETL jobs, you would typically use binary format data files 
such as Parquet or Avro, and load them
+      into Impala tables that use the corresponding file format.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table t1 (s 
string);
+[localhost:21000] &gt; load data inpath '/user/doc_demo/thousand_strings.txt' 
into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.61s
+[kilo2-202-961.cs1cloud.internal:21000] &gt; select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0  |
++------+
+| 1000 |
++------+
+Returned 1 row(s) in 0.67s
+[localhost:21000] &gt; load data inpath '/user/doc_demo/thousand_strings.txt' 
into table t1;
+ERROR: AnalysisException: INPATH location 
'/user/doc_demo/thousand_strings.txt' does not exist. </code></pre>
+
+    <p class="p">
+      As indicated by the message at the end of the previous example, the data 
file was moved from its original
+      location. The following example illustrates how the data file was moved 
into the Impala data directory for
+      the destination table, keeping its original filename:
+    </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -ls 
/user/hive/warehouse/load_data_testing.db/t1
+Found 1 items
+-rw-r--r--   1 doc_demo doc_demo      13926 2013-06-26 15:40 
/user/hive/warehouse/load_data_testing.db/t1/thousand_strings.txt</code></pre>
+
+    <p class="p">
+      The following example demonstrates the difference between the <code 
class="ph codeph">INTO TABLE</code> and
+      <code class="ph codeph">OVERWRITE TABLE</code> clauses. The table 
already contains 1000 rows. After issuing the
+      <code class="ph codeph">LOAD DATA</code> statement with the <code 
class="ph codeph">INTO TABLE</code> clause, the table contains 100 more
+      rows, for a total of 1100. After issuing the <code class="ph 
codeph">LOAD DATA</code> statement with the <code class="ph codeph">OVERWRITE
+      INTO TABLE</code> clause, the former contents are gone, and now the 
table only contains the 10 rows from
+      the just-loaded data file.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; load data inpath 
'/user/doc_demo/hundred_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.24s
+[localhost:21000] &gt; select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0  |
++------+
+| 1100 |
++------+
+Returned 1 row(s) in 0.55s
+[localhost:21000] &gt; load data inpath '/user/doc_demo/ten_strings.txt' 
overwrite into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] &gt; select count(*) from t1;
+Query finished, fetching results ...
++-----+
+| _c0 |
++-----+
+| 10  |
++-----+
+Returned 1 row(s) in 0.62s</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+    <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, the Impala DML 
statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD 
DATA</code>,
+        and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write 
data into a table or partition that resides in the
+        Amazon Simple Storage Service (S3).
+        The syntax of the DML statements is the same as for any other tables, 
because the S3 location for tables and
+        partitions is specified by an <code class="ph codeph">s3a://</code> 
prefix in the
+        <code class="ph codeph">LOCATION</code> attribute of
+        <code class="ph codeph">CREATE TABLE</code> or <code class="ph 
codeph">ALTER TABLE</code> statements.
+        If you bring data into S3 using the normal S3 transfer mechanisms 
instead of Impala DML statements,
+        issue a <code class="ph codeph">REFRESH</code> statement for the table 
before using Impala to query the S3 data.
+      </p>
+    <p class="p">
+        Because of differences between S3 and traditional filesystems, DML 
operations
+        for S3 tables can take longer than for tables on HDFS. For example, 
both the
+        <code class="ph codeph">LOAD DATA</code> statement and the final stage 
of the <code class="ph codeph">INSERT</code>
+        and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements 
involve moving files from one directory
+        to another. (In the case of <code class="ph codeph">INSERT</code> and 
<code class="ph codeph">CREATE TABLE AS SELECT</code>,
+        the files are moved from a temporary staging directory to the final 
destination directory.)
+        Because S3 does not support a <span class="q">"rename"</span> 
operation for existing objects, in these cases Impala
+        actually copies the data files from one location to another and then 
removes the original files.
+        In <span class="keyword">Impala 2.6</span>, the <code class="ph 
codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+        to speed up <code class="ph codeph">INSERT</code> statements for S3 
tables and partitions, with the tradeoff
+        that a problem during statement execution could leave data in an 
inconsistent state.
+        It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> 
or <code class="ph codeph">LOAD DATA</code> statements.
+        See <a class="xref" 
href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING
 Query Option (Impala 2.6 or higher only)</a> for details.
+      </p>
+    <p class="p">See <a class="xref" href="impala_s3.html#s3">Using Impala 
with the Amazon S3 Filesystem</a> for details about reading and writing S3 data 
with Impala.</p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon 
runs under,
+      typically the <code class="ph codeph">impala</code> user, must have read 
and write
+      permissions for the files in the source directory, and write
+      permission for the destination directory.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        The <code class="ph codeph">LOAD DATA</code> statement cannot be used 
with Kudu tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong>
+      </p>
+    <p class="p">
+        The <code class="ph codeph">LOAD DATA</code> statement cannot be used 
with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      The <code class="ph codeph">LOAD DATA</code> statement is an alternative 
to the
+      <code class="ph codeph"><a class="xref" 
href="impala_insert.html#insert">INSERT</a></code> statement.
+      Use <code class="ph codeph">LOAD DATA</code>
+      when you have the data files in HDFS but outside of any Impala table.
+    </p>
+    <p class="p">
+      The <code class="ph codeph">LOAD DATA</code> statement is also an 
alternative
+      to the <code class="ph codeph">CREATE EXTERNAL TABLE</code> statement. 
Use
+      <code class="ph codeph">LOAD DATA</code> when it is appropriate to move 
the
+      data files under Impala control rather than querying them
+      from their original location. See <a class="xref" 
href="impala_tables.html#external_tables">External Tables</a>
+      for information about working with external tables.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_langref_sql.html">Impala SQL 
Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_logging.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_logging.html 
b/docs/build/html/topics/impala_logging.html
new file mode 100644
index 0000000..049f370
--- /dev/null
+++ b/docs/build/html/topics/impala_logging.html
@@ -0,0 +1,416 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta 
name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta 
name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta 
name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta 
name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta 
name="version" content="Impala 2.8.x"><meta name="version" content="Impala 
2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="version" content="Impala 2.8.x"><meta name="version" content="Impala 
2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" 
content="Impa
 la 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="logging"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Using Impala Logging</title></head><body 
id="logging"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using Impala Logging</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The Impala logs record information about:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Any errors Impala encountered. If Impala experienced a serious error 
during startup, you must diagnose and
+        troubleshoot that problem before you can do anything further with 
Impala.
+      </li>
+
+      <li class="li">
+        How Impala is configured.
+      </li>
+
+      <li class="li">
+        Jobs Impala has completed.
+      </li>
+    </ul>
+
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+      <p class="p">
+        Formerly, the logs contained the query profile for each query, showing 
low-level details of how the work is
+        distributed among nodes and how intermediate and final results are 
transmitted across the network. To save
+        space, those query profiles are now stored in zlib-compressed files in
+        <span class="ph filepath">/var/log/impala/profiles</span>. You can 
access them through the Impala web user interface.
+        For example, at <code class="ph codeph">http://<var class="keyword 
varname">impalad-node-hostname</var>:25000/queries</code>, each query
+        is followed by a <code class="ph codeph">Profile</code> link leading 
to a page showing extensive analytical data for the
+        query execution.
+      </p>
+
+      <p class="p">
+        The auditing feature introduced in Impala 1.1.1 produces a separate 
set of audit log files when
+        enabled. See <a class="xref" 
href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details.
+      </p>
+
+      <p class="p">
+        The lineage feature introduced in Impala 2.2.0 produces a separate 
lineage log file when
+        enabled. See <a class="xref" 
href="impala_lineage.html#lineage">Viewing Lineage Information for Impala 
Data</a> for details.
+      </p>
+    </div>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" 
id="logging__logs_details">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Locations and Names of 
Impala Log Files</h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          By default, the log files are under the directory <span class="ph 
filepath">/var/log/impala</span>.
+          To change log file locations, modify the defaults file described in
+          <a class="xref" href="impala_processes.html#processes">Starting 
Impala</a>.
+        </li>
+
+        <li class="li">
+          The significant files for the <code class="ph codeph">impalad</code> 
process are <span class="ph filepath">impalad.INFO</span>,
+          <span class="ph filepath">impalad.WARNING</span>, and <span 
class="ph filepath">impalad.ERROR</span>. You might also see a file
+          <span class="ph filepath">impalad.FATAL</span>, although this is 
only present in rare conditions.
+        </li>
+
+        <li class="li">
+          The significant files for the <code class="ph 
codeph">statestored</code> process are
+          <span class="ph filepath">statestored.INFO</span>, <span class="ph 
filepath">statestored.WARNING</span>, and
+          <span class="ph filepath">statestored.ERROR</span>. You might also 
see a file <span class="ph filepath">statestored.FATAL</span>,
+          although this is only present in rare conditions.
+        </li>
+
+        <li class="li">
+          The significant files for the <code class="ph 
codeph">catalogd</code> process are <span class="ph 
filepath">catalogd.INFO</span>,
+          <span class="ph filepath">catalogd.WARNING</span>, and <span 
class="ph filepath">catalogd.ERROR</span>. You might also see a file
+          <span class="ph filepath">catalogd.FATAL</span>, although this is 
only present in rare conditions.
+        </li>
+
+        <li class="li">
+          Examine the <code class="ph codeph">.INFO</code> files to see 
configuration settings for the processes.
+        </li>
+
+        <li class="li">
+          Examine the <code class="ph codeph">.WARNING</code> files to see all 
kinds of problem information, including such
+          things as suboptimal settings and also serious runtime errors.
+        </li>
+
+        <li class="li">
+          Examine the <code class="ph codeph">.ERROR</code> and/or <code 
class="ph codeph">.FATAL</code> files to see only the most serious
+          errors, if the processes crash, or queries fail to complete. These 
messages are also in the
+          <code class="ph codeph">.WARNING</code> file.
+        </li>
+
+        <li class="li">
+          A new set of log files is produced each time the associated daemon 
is restarted. These log files have
+          long names including a timestamp. The <code class="ph 
codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and
+          <code class="ph codeph">.ERROR</code> files are physically 
represented as symbolic links to the latest applicable log
+          files.
+        </li>
+
+        <li class="li">
+          The init script for the <code class="ph codeph">impala-server</code> 
service also produces a consolidated log file
+          <code class="ph codeph">/var/logs/impalad/impala-server.log</code>, 
with all the same information as the
+          corresponding<code class="ph codeph">.INFO</code>, <code class="ph 
codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code> files.
+        </li>
+
+        <li class="li">
+          The init script for the <code class="ph 
codeph">impala-state-store</code> service also produces a consolidated log file
+          <code class="ph 
codeph">/var/logs/impalad/impala-state-store.log</code>, with all the same 
information as the
+          corresponding<code class="ph codeph">.INFO</code>, <code class="ph 
codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code> files.
+        </li>
+      </ul>
+
+      <p class="p">
+        Impala stores information using the <code class="ph 
codeph">glog_v</code> logging system. You will see some messages
+        referring to C++ file names. Logging is affected by:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The <code class="ph codeph">GLOG_v</code> environment variable 
specifies which types of messages are logged. See
+          <a class="xref" href="#log_levels">Setting Logging Levels</a> for 
details.
+        </li>
+
+        <li class="li">
+          The <code class="ph codeph">-logbuflevel</code> startup flag for the 
<span class="keyword cmdname">impalad</span> daemon specifies how
+          often the log information is written to disk. The default is 0, 
meaning that the log is immediately
+          flushed to disk when Impala outputs an important messages such as a 
warning or an error, but less
+          important messages such as informational ones are buffered in memory 
rather than being flushed to disk
+          immediately.
+        </li>
+      </ul>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" 
id="logging__logs_managing">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Managing Impala Logs</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Review Impala log files on each host, when you have traced an issue 
back to a specific system.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" 
id="logging__logs_rotate">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Rotating Impala Logs</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala periodically switches the physical files representing the 
current log files, after which it is safe
+        to remove the old files if they are no longer needed.
+      </p>
+
+      <p class="p">
+        Impala can automatically remove older unneeded log files, a feature 
known as <dfn class="term">log rotation</dfn>.
+
+      </p>
+
+      <p class="p">
+        In Impala 2.2 and higher, the <code class="ph 
codeph">-max_log_files</code> configuration option specifies how many log
+        files to keep at each severity level. You can specify an appropriate 
setting for each Impala-related daemon
+        (<span class="keyword cmdname">impalad</span>, <span class="keyword 
cmdname">statestored</span>, and <span class="keyword 
cmdname">catalogd</span>). The default
+        value is 10, meaning that Impala preserves the latest 10 log files for 
each severity level
+        (<code class="ph codeph">INFO</code>, <code class="ph 
codeph">WARNING</code>, <code class="ph codeph">ERROR</code>, and <code 
class="ph codeph">FATAL</code>).
+        Impala checks to see if any old logs need to be removed based on the 
interval specified in the
+        <code class="ph codeph">logbufsecs</code> setting, every 5 seconds by 
default.
+      </p>
+
+
+
+      <p class="p">
+        A value of 0 preserves all log files, in which case you would set up 
set up manual log rotation using your
+        Linux tool or technique of choice. A value of 1 preserves only the 
very latest log file.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" 
id="logging__logs_debug">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Reviewing Impala Logs</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        By default, the Impala log is stored at <code class="ph 
codeph">/var/logs/impalad/</code>. The most comprehensive log,
+        showing informational, warning, and error messages, is in the file 
name <span class="ph filepath">impalad.INFO</span>.
+        View log file contents by using the web interface or by examining the 
contents of the log file. (When you
+        examine the logs through the file system, you can troubleshoot 
problems by reading the
+        <span class="ph filepath">impalad.WARNING</span> and/or <span 
class="ph filepath">impalad.ERROR</span> files, which contain the
+        subsets of messages indicating potential problems.)
+      </p>
+
+      <p class="p">
+        On a machine named <code class="ph codeph">impala.example.com</code> 
with default settings, you could view the Impala
+        logs on that machine by using a browser to access <code class="ph 
codeph">http://impala.example.com:25000/logs</code>.
+      </p>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+        <p class="p">
+          The web interface limits the amount of logging information 
displayed. To view every log entry, access the
+          log files directly through the file system.
+        </p>
+      </div>
+
+      <p class="p">
+        You can view the contents of the <code class="ph 
codeph">impalad.INFO</code> log file in the file system. With the
+        default configuration settings, the start of the log file appears as 
follows:
+      </p>
+
+<pre class="pre codeblock"><code>[user@example impalad]$ pwd
+/var/log/impalad
+[user@example impalad]$ more impalad.INFO
+Log file created at: 2013/01/07 08:42:12
+Running on machine: impala.example.com
+Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
+I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 
9d7fadca0461ab40b9e9df8cdb47107ec6b27cff)
+Built on Fri, 21 Dec 2012 12:55:19 PST
+I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com
+I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug 
webserver):
+--dump_ir=false
+--module_output=
+--be_port=22000
+--classpath=
+--hostname=impala.example.com</code></pre>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+        The preceding example shows only a small part of the log file. Impala 
log files are often several megabytes
+        in size.
+      </div>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" 
id="logging__log_format">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Understanding Impala Log 
Contents</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The logs store information about Impala startup options. This 
information appears once for each time Impala
+        is started and may include:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Machine name.
+        </li>
+
+        <li class="li">
+          Impala version number.
+        </li>
+
+        <li class="li">
+          Flags used to start Impala.
+        </li>
+
+        <li class="li">
+          CPU information.
+        </li>
+
+        <li class="li">
+          The number of available disks.
+        </li>
+      </ul>
+
+      <p class="p">
+        There is information about each job Impala has run. Because each 
Impala job creates an additional set of
+        data about queries, the amount of job specific data may be very large. 
Logs may contained detailed
+        information on jobs. These detailed log entries may include:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The composition of the query.
+        </li>
+
+        <li class="li">
+          The degree of data locality.
+        </li>
+
+        <li class="li">
+          Statistics on data throughput and response times.
+        </li>
+      </ul>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" 
id="logging__log_levels">
+
+    <h2 class="title topictitle2" id="ariaid-title7">Setting Logging 
Levels</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala uses the GLOG system, which supports three logging levels. You 
can adjust logging levels
+        by exporting variable settings. To change logging settings manually, 
use a command
+        similar to the following on each node before starting <code class="ph 
codeph">impalad</code>:
+      </p>
+
+<pre class="pre codeblock"><code>export GLOG_v=1</code></pre>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+        For performance reasons, do not enable the most verbose logging level 
of 3 unless there is
+        no other alternative for troubleshooting.
+      </div>
+
+      <p class="p">
+        For more information on how to configure GLOG, including how to set 
variable logging levels for different
+        system components, see
+        <a class="xref" 
href="http://google-glog.googlecode.com/svn/trunk/doc/glog.html"; 
target="_blank">How
+        To Use Google Logging Library (glog)</a>.
+      </p>
+
+      <section class="section" id="log_levels__loglevels_details"><h3 
class="title sectiontitle">Understanding What is Logged at Different Logging 
Levels</h3>
+
+        
+
+        <p class="p">
+          As logging levels increase, the categories of information logged are 
cumulative. For example, GLOG_v=2
+          records everything GLOG_v=1 records, as well as additional 
information.
+        </p>
+
+        <p class="p">
+          Increasing logging levels imposes performance overhead and increases 
log size. Where practical, use
+          GLOG_v=1 for most cases: this level has minimal performance impact 
but still captures useful
+          troubleshooting information.
+        </p>
+
+        <p class="p">
+          Additional information logged at each level is as follows:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            GLOG_v=1 - The default level. Logs information about each 
connection and query that is initiated to an
+            <code class="ph codeph">impalad</code> instance, including runtime 
profiles.
+          </li>
+
+          <li class="li">
+            GLOG_v=2 - Everything from the previous level plus information for 
each RPC initiated. This level also
+            records query execution progress information, including details on 
each file that is read.
+          </li>
+
+          <li class="li">
+            GLOG_v=3 - Everything from the previous level plus logging of 
every row that is read. This level is
+            only applicable for the most serious troubleshooting and tuning 
scenarios, because it can produce
+            exceptionally large and detailed log files, potentially leading to 
its own set of performance and
+            capacity problems.
+          </li>
+        </ul>
+
+      </section>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" 
id="logging__redaction">
+
+    <h2 class="title topictitle2" id="ariaid-title8">Redacting Sensitive 
Information from Impala Log Files</h2>
+    
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        <dfn class="term">Log redaction</dfn> is a security feature that 
prevents sensitive information from being displayed in
+        locations used by administrators for monitoring and troubleshooting, 
such as log files and the Impala debug web
+        user interface. You configure regular expressions that match sensitive 
types of information processed by your
+        system, such as credit card numbers or tax IDs, and literals matching 
these patterns are obfuscated wherever
+        they would normally be recorded in log files or displayed in 
administration or debugging user interfaces.
+      </p>
+
+      <p class="p">
+        In a security context, the log redaction feature is complementary to 
the Sentry authorization framework.
+        Sentry prevents unauthorized users from being able to directly access 
table data. Redaction prevents
+        administrators or support personnel from seeing the smaller amounts of 
sensitive or personally identifying
+        information (PII) that might appear in queries issued by those 
authorized users.
+      </p>
+
+      <p class="p">
+        See <span class="xref">the documentation for your Apache Hadoop 
distribution</span> for details about how to enable this feature and set
+        up the regular expressions to detect and redact sensitive information 
within SQL statement text.
+      </p>
+
+    </div>
+
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_map.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_map.html 
b/docs/build/html/topics/impala_map.html
new file mode 100644
index 0000000..31d91f0
--- /dev/null
+++ b/docs/build/html/topics/impala_map.html
@@ -0,0 +1,331 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_datatypes.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="map"><link 
rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAP Complex 
Type (Impala 2.3 or higher only)</title></head><body id="map"><main 
role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+    <h1 class="title topictitle1" id="ariaid-title1">MAP Complex Type (<span 
class="keyword">Impala 2.3</span> or higher only)</h1>
+
+    
+
+    <div class="body conbody">
+
+      <p class="p">
+        A complex data type representing an arbitrary set of key-value pairs.
+        The key part is a scalar type, while the value part can be a scalar or
+        another complex type (<code class="ph codeph">ARRAY</code>, <code 
class="ph codeph">STRUCT</code>,
+        or <code class="ph codeph">MAP</code>).
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword 
varname">column_name</var> MAP &lt; <var class="keyword 
varname">primitive_type</var>, <var class="keyword varname">type</var> &gt;
+
+type ::= <var class="keyword varname">primitive_type</var> | <var 
class="keyword varname">complex_type</var>
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Because complex types are often used in combination,
+        for example an <code class="ph codeph">ARRAY</code> of <code class="ph 
codeph">STRUCT</code>
+        elements, if you are unfamiliar with the Impala complex types,
+        start with <a class="xref" 
href="../shared/../topics/impala_complex_types.html#complex_types">Complex 
Types (Impala 2.3 or higher only)</a> for
+        background information and usage examples.
+      </p>
+
+      <p class="p">
+        The <code class="ph codeph">MAP</code> complex data type represents a 
set of key-value pairs.
+        Each element of the map is indexed by a primitive type such as <code 
class="ph codeph">BIGINT</code> or
+        <code class="ph codeph">STRING</code>, letting you define sequences 
that are not continuous or categories with arbitrary names.
+        You might find it convenient for modelling data produced in other 
languages, such as a
+        Python dictionary or Java HashMap, where a single scalar value serves 
as the lookup key.
+      </p>
+
+      <p class="p">
+        In a big data context, the keys in a map column might represent a 
numeric sequence of events during a
+        manufacturing process, or <code class="ph codeph">TIMESTAMP</code> 
values corresponding to sensor observations.
+        The map itself is inherently unordered, so you choose whether to make 
the key values significant
+        (such as a recorded <code class="ph codeph">TIMESTAMP</code>) or 
synthetic (such as a random global universal ID).
+      </p>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+        Behind the scenes, the <code class="ph codeph">MAP</code> type is 
implemented in a similar way as the
+        <code class="ph codeph">ARRAY</code> type. Impala does not enforce any 
uniqueness constraint on the
+        <code class="ph codeph">KEY</code> values, and the <code class="ph 
codeph">KEY</code> values are processed by
+        looping through the elements of the <code class="ph codeph">MAP</code> 
rather than by a constant-time lookup.
+        Therefore, this type is primarily for ease of understanding when 
importing data and
+        algorithms from non-SQL contexts, rather than optimizing the 
performance of key lookups.
+      </div>
+
+      <p class="p">
+        You can pass a multi-part qualified name to <code class="ph 
codeph">DESCRIBE</code>
+        to specify an <code class="ph codeph">ARRAY</code>, <code class="ph 
codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+        column and visualize its structure as if it were a table.
+        For example, if table <code class="ph codeph">T1</code> contains an 
<code class="ph codeph">ARRAY</code> column
+        <code class="ph codeph">A1</code>, you could issue the statement <code 
class="ph codeph">DESCRIBE t1.a1</code>.
+        If table <code class="ph codeph">T1</code> contained a <code class="ph 
codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+        and a field <code class="ph codeph">F1</code> within the <code 
class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+        you could issue the statement <code class="ph codeph">DESCRIBE 
t1.s1.f1</code>.
+        An <code class="ph codeph">ARRAY</code> is shown as a two-column 
table, with
+        <code class="ph codeph">ITEM</code> and <code class="ph 
codeph">POS</code> columns.
+        A <code class="ph codeph">STRUCT</code> is shown as a table with each 
field
+        representing a column in the table.
+        A <code class="ph codeph">MAP</code> is shown as a two-column table, 
with
+        <code class="ph codeph">KEY</code> and <code class="ph 
codeph">VALUE</code> columns.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 
2.3.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Columns with this data type can only be used in tables or 
partitions with the Parquet file format.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Columns with this data type cannot be used as partition key 
columns in a partitioned table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">COMPUTE STATS</code> statement does 
not produce any statistics for columns of this data type.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p" id="map__d6e2889">
+            The maximum length of the column definition for any complex type, 
including declarations for any nested types,
+            is 4000 characters.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            See <a class="xref" 
href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations
 and Restrictions for Complex Types</a> for a full list of limitations
+            and associated guidelines about complex type columns.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+      <p class="p">
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, 
<code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, 
<code class="ph codeph">VARCHAR</code>,
+        <code class="ph codeph">ARRAY</code>, <code class="ph 
codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used 
with Kudu tables.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+      Many of the complex type examples refer to tables
+      such as <code class="ph codeph">CUSTOMER</code> and <code class="ph 
codeph">REGION</code>
+      adapted from the tables used in the TPC-H benchmark.
+      See <a class="xref" 
href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample
 Schema and Data for Experimenting with Impala Complex Types</a>
+      for the table definitions.
+      </div>
+
+      <p class="p">
+        The following example shows a table with various kinds of <code 
class="ph codeph">MAP</code> columns,
+        both at the top level and nested within other complex types.
+        Each row represents information about a specific country, with complex 
type fields
+        of various levels of nesting to represent different information 
associated
+        with the country: factual measurements such as area and population,
+        notable people in different categories, geographic features such as
+        cities, points of interest within each city, and mountains with 
associated facts.
+        Practice the <code class="ph codeph">CREATE TABLE</code> and query 
notation for complex type columns
+        using empty tables, until you can visualize a complex data structure 
and construct corresponding SQL statements reliably.
+      </p>
+
+<pre class="pre codeblock"><code>create TABLE map_demo
+(
+  country_id BIGINT,
+
+-- Numeric facts about each country, looked up by name.
+-- For example, 'Area':1000, 'Population':999999.
+-- Using a MAP instead of a STRUCT because there could be
+-- a different set of facts for each country.
+  metrics MAP &lt;STRING, BIGINT&gt;,
+
+-- MAP whose value part is an ARRAY.
+-- For example, the key 'Famous Politicians' could represent an array of 10 
elements,
+-- while the key 'Famous Actors' could represent an array of 20 elements.
+  notables MAP &lt;STRING, ARRAY &lt;STRING&gt;&gt;,
+
+-- MAP that is a field within a STRUCT.
+-- (The STRUCT is inside another ARRAY, because it is rare
+-- for a STRUCT to be a top-level column.)
+-- For example, city #1 might have points of interest with key 'Zoo',
+-- representing an array of 3 different zoos.
+-- City #2 might have completely different kinds of points of interest.
+-- Because the set of field names is potentially large, and most entries could 
be blank,
+-- a MAP makes more sense than a STRUCT to represent such a sparse data 
structure.
+  cities ARRAY &lt; STRUCT &lt;
+    name: STRING,
+    points_of_interest: MAP &lt;STRING, ARRAY &lt;STRING&gt;&gt;
+  &gt;&gt;,
+
+-- MAP that is an element within an ARRAY. The MAP is inside a STRUCT field to 
associate
+-- the mountain name with all the facts about the mountain.
+-- The "key" of the map (the first STRING field) represents the name of some 
fact whose value
+-- can be expressed as an integer, such as 'Height', 'Year First Climbed', and 
so on.
+  mountains ARRAY &lt; STRUCT &lt; name: STRING, facts: MAP &lt;STRING, INT 
&gt; &gt; &gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+<pre class="pre codeblock"><code>DESCRIBE map_demo;
++------------+------------------------------------------------+
+| name       | type                                           |
++------------+------------------------------------------------+
+| country_id | bigint                                         |
+| metrics    | map&lt;string,bigint&gt;                             |
+| notables   | map&lt;string,array&lt;string&gt;&gt;                      |
+| cities     | array&lt;struct&lt;                                  |
+|            |   name:string,                                 |
+|            |   points_of_interest:map&lt;string,array&lt;string&gt;&gt; |
+|            | &gt;&gt;                                             |
+| mountains  | array&lt;struct&lt;                                  |
+|            |   name:string,                                 |
+|            |   facts:map&lt;string,int&gt;                        |
+|            | &gt;&gt;                                             |
++------------+------------------------------------------------+
+
+DESCRIBE map_demo.metrics;
++-------+--------+
+| name  | type   |
++-------+--------+
+| key   | string |
+| value | bigint |
++-------+--------+
+
+DESCRIBE map_demo.notables;
++-------+---------------+
+| name  | type          |
++-------+---------------+
+| key   | string        |
+| value | array&lt;string&gt; |
++-------+---------------+
+
+DESCRIBE map_demo.notables.value;
++------+--------+
+| name | type   |
++------+--------+
+| item | string |
+| pos  | bigint |
++------+--------+
+
+DESCRIBE map_demo.cities;
++------+------------------------------------------------+
+| name | type                                           |
++------+------------------------------------------------+
+| item | struct&lt;                                        |
+|      |   name:string,                                 |
+|      |   points_of_interest:map&lt;string,array&lt;string&gt;&gt; |
+|      | &gt;                                              |
+| pos  | bigint                                         |
++------+------------------------------------------------+
+
+DESCRIBE map_demo.cities.item.points_of_interest;
++-------+---------------+
+| name  | type          |
++-------+---------------+
+| key   | string        |
+| value | array&lt;string&gt; |
++-------+---------------+
+
+DESCRIBE map_demo.cities.item.points_of_interest.value;
++------+--------+
+| name | type   |
++------+--------+
+| item | string |
+| pos  | bigint |
++------+--------+
+
+DESCRIBE map_demo.mountains;
++------+-------------------------+
+| name | type                    |
++------+-------------------------+
+| item | struct&lt;                 |
+|      |   name:string,          |
+|      |   facts:map&lt;string,int&gt; |
+|      | &gt;                       |
+| pos  | bigint                  |
++------+-------------------------+
+
+DESCRIBE map_demo.mountains.item.facts;
++-------+--------+
+| name  | type   |
++-------+--------+
+| key   | string |
+| value | int    |
++-------+--------+
+
+</code></pre>
+
+      <p class="p">
+        The following example shows a table that uses a variety of data types 
for the <code class="ph codeph">MAP</code>
+        <span class="q">"key"</span> field. Typically, you use <code class="ph 
codeph">BIGINT</code> or <code class="ph codeph">STRING</code> to use
+        numeric or character-based keys without worrying about exceeding any 
size or length constraints.
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE map_demo_obscure
+(
+  id BIGINT,
+  m1 MAP &lt;INT, INT&gt;,
+  m2 MAP &lt;SMALLINT, INT&gt;,
+  m3 MAP &lt;TINYINT, INT&gt;,
+  m4 MAP &lt;TIMESTAMP, INT&gt;,
+  m5 MAP &lt;BOOLEAN, INT&gt;,
+  m6 MAP &lt;CHAR(5), INT&gt;,
+  m7 MAP &lt;VARCHAR(25), INT&gt;,
+  m8 MAP &lt;FLOAT, INT&gt;,
+  m9 MAP &lt;DOUBLE, INT&gt;,
+  m10 MAP &lt;DECIMAL(12,2), INT&gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+<pre class="pre codeblock"><code>CREATE TABLE celebrities (name STRING, 
birth_year MAP &lt; STRING, SMALLINT &gt;) STORED AS PARQUET;
+-- A typical row might represent values with 2 different birth years, such as:
+-- ("Joe Movie Star", { "real": 1972, "claimed": 1977 })
+
+CREATE TABLE countries (name STRING, famous_leaders MAP &lt; INT, STRING &gt;) 
STORED AS PARQUET;
+-- A typical row might represent values with different leaders, with key 
values corresponding to their numeric sequence, such as:
+-- ("United States", { 1: "George Washington", 3: "Thomas Jefferson", 16: 
"Abraham Lincoln" })
+
+CREATE TABLE airlines (name STRING, special_meals MAP &lt; STRING, MAP &lt; 
STRING, STRING &gt; &gt;) STORED AS PARQUET;
+-- A typical row might represent values with multiple kinds of meals, each 
with several components:
+-- ("Elegant Airlines",
+--   {
+--     "vegetarian": { "breakfast": "pancakes", "snack": "cookies", "dinner": 
"rice pilaf" },
+--     "gluten free": { "breakfast": "oatmeal", "snack": "fruit", "dinner": 
"chicken" }
+--   } )
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_complex_types.html#complex_types">Complex 
Types (Impala 2.3 or higher only)</a>,
+        <a class="xref" href="impala_array.html#array">ARRAY Complex Type 
(Impala 2.3 or higher only)</a>,
+        <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type 
(Impala 2.3 or higher only)</a>
+        
+      </p>
+
+    </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_datatypes.html">Data 
Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

[24/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Reply via email to