[47/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

jbapple Wed, 12 Apr 2017 11:25:32 -0700

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_appx_median.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_appx_median.html 
b/docs/build/html/topics/impala_appx_median.html
new file mode 100644
index 0000000..1883f2c
--- /dev/null
+++ b/docs/build/html/topics/impala_appx_median.html
@@ -0,0 +1,127 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_aggregate_functions.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="appx_median"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>APPX_MEDIAN Function</title></head><body 
id="appx_median"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">APPX_MEDIAN Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns a value that is approximately the 
median (midpoint) of values in the set
+      of input values.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>APPX_MEDIAN([DISTINCT | ALL] <var 
class="keyword varname">expression</var>)
+</code></pre>
+
+    <p class="p">
+      This function works with any input type, because the only requirement is 
that the type supports less-than and
+      greater-than comparison operators.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Because the return value represents the estimated midpoint, it might not 
reflect the precise midpoint value,
+      especially if the cardinality of the input values is very high. If the 
cardinality is low (up to
+      approximately 20,000), the result is more accurate because the sampling 
considers all or almost all of the
+      different values.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input type, except for
+        <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+        arguments, which produce a <code class="ph codeph">STRING</code> result.
+      </p>
+
+    <p class="p">
+      The return value is always the same as one of the input values, not an 
<span class="q">"in-between"</span> value produced by
+      averaging.
+    </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+        This function cannot be used in an analytic context. That is, the 
<code class="ph codeph">OVER()</code> clause is not allowed at all with this 
function.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example uses a table of a million random floating-point 
numbers ranging up to approximately
+      50,000. The average is approximately 25,000. Because of the random 
distribution, we would expect the median
+      to be close to this same number. Computing the precise median is a more 
intensive operation than computing
+      the average, because it requires keeping track of every distinct value 
and how many times each occurs. The
+      <code class="ph codeph">APPX_MEDIAN()</code> function uses a sampling 
algorithm to return an approximate result, which in
+      this case is close to the expected value. To make sure that the value is 
not substantially out of range due
+      to a skewed distribution, subsequent queries confirm that there are 
approximately 500,000 values higher than
+      the <code class="ph codeph">APPX_MEDIAN()</code> value, and 
approximately 500,000 values lower than the
+      <code class="ph codeph">APPX_MEDIAN()</code> value.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select min(x), max(x), 
avg(x) from million_numbers;
++-------------------+-------------------+-------------------+
+| min(x)            | max(x)            | avg(x)            |
++-------------------+-------------------+-------------------+
+| 4.725693727250069 | 49994.56852674231 | 24945.38563793553 |
++-------------------+-------------------+-------------------+
+[localhost:21000] &gt; select appx_median(x) from million_numbers;
++----------------+
+| appx_median(x) |
++----------------+
+| 24721.6        |
++----------------+
+[localhost:21000] &gt; select count(x) as higher from million_numbers where x 
&gt; (select appx_median(x) from million_numbers);
++--------+
+| higher |
++--------+
+| 502013 |
++--------+
+[localhost:21000] &gt; select count(x) as lower from million_numbers where x 
&lt; (select appx_median(x) from million_numbers);
++--------+
+| lower  |
++--------+
+| 497987 |
++--------+
+</code></pre>
+
+    <p class="p">
+      The following example computes the approximate median using a subset of 
the values from the table, and then
+      confirms that the result is a reasonable estimate for the midpoint.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select appx_median(x) 
from million_numbers where x between 1000 and 5000;
++-------------------+
+| appx_median(x)    |
++-------------------+
+| 3013.107787358159 |
++-------------------+
+[localhost:21000] &gt; select count(x) as higher from million_numbers where x 
between 1000 and 5000 and x &gt; 3013.107787358159;
++--------+
+| higher |
++--------+
+| 37692  |
++--------+
+[localhost:21000] &gt; select count(x) as lower from million_numbers where x 
between 1000 and 5000 and x &lt; 3013.107787358159;
++-------+
+| lower |
++-------+
+| 37089 |
++-------+
+</code></pre>
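+
+    <p class="p">
+      Because <code class="ph codeph">APPX_MEDIAN()</code> is an aggregate function that accepts any
+      type supporting comparison operators, it can also be combined with
+      <code class="ph codeph">GROUP BY</code> or applied to a non-numeric column. The following is a
+      minimal sketch, not output from a real session: the <code class="ph codeph">SALES</code> table
+      and its columns are hypothetical, and results are omitted because they depend on the data and
+      on the sampling.
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table, for illustration only:
+--   CREATE TABLE sales (region STRING, customer_name STRING, amount DOUBLE);
+
+-- Approximate median of a numeric column, computed separately for each group.
+SELECT region, APPX_MEDIAN(amount) AS median_amount
+  FROM sales
+  GROUP BY region;
+
+-- A STRING argument. The result is one of the input strings,
+-- approximately midway through the values in sort order.
+SELECT APPX_MEDIAN(customer_name) AS median_name
+  FROM sales;
+</code></pre>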
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_aggregate_functions.html">Impala Aggregate 
Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_array.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_array.html 
b/docs/build/html/topics/impala_array.html
new file mode 100644
index 0000000..45c9a42
--- /dev/null
+++ b/docs/build/html/topics/impala_array.html
@@ -0,0 +1,321 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_datatypes.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="array"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>ARRAY Complex Type (Impala 2.3 or higher 
only)</title></head><body id="array"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">ARRAY Complex Type (<span 
class="keyword">Impala 2.3</span> or higher only)</h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A complex data type that can represent an arbitrary number of ordered 
elements.
+      The elements can be scalars or another complex type (<code class="ph 
codeph">ARRAY</code>,
+      <code class="ph codeph">STRUCT</code>, or <code class="ph 
codeph">MAP</code>).
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+
+
+<pre class="pre codeblock"><code><var class="keyword 
varname">column_name</var> ARRAY &lt; <var class="keyword varname">type</var> 
&gt;
+
+type ::= <var class="keyword varname">primitive_type</var> | <var 
class="keyword varname">complex_type</var>
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Because complex types are often used in combination,
+        for example an <code class="ph codeph">ARRAY</code> of <code class="ph 
codeph">STRUCT</code>
+        elements, if you are unfamiliar with the Impala complex types,
+        start with <a class="xref" 
href="../shared/../topics/impala_complex_types.html#complex_types">Complex 
Types (Impala 2.3 or higher only)</a> for
+        background information and usage examples.
+      </p>
+
+      <p class="p">
+        The elements of the array have no names. You refer to the value of the 
array item using the
+        <code class="ph codeph">ITEM</code> pseudocolumn, or its position in 
the array with the <code class="ph codeph">POS</code>
+        pseudocolumn. See <a class="xref" 
href="impala_complex_types.html#item">ITEM and POS Pseudocolumns</a> for 
information about
+        these pseudocolumns.
+      </p>
+
+
+
+    <p class="p">
+      Each row can have a different number of elements (including none) in the 
array for that row.
+    </p>
+
+
+
+      <p class="p">
+        When an array contains items of scalar types, you can use aggregation 
functions on the array elements without using join notation. For
+        example, you can find the <code class="ph codeph">COUNT()</code>, 
<code class="ph codeph">AVG()</code>, <code class="ph codeph">SUM()</code>, and 
so on of numeric array
+        elements, or the <code class="ph codeph">MAX()</code> and <code 
class="ph codeph">MIN()</code> of any scalar array elements by referring to
+        <code class="ph codeph"><var class="keyword 
varname">table_name</var>.<var class="keyword 
varname">array_column</var></code> in the <code class="ph codeph">FROM</code> 
clause of the query. When
+        you need to cross-reference values from the array with scalar values 
from the same row, such as by including a <code class="ph codeph">GROUP
+        BY</code> clause to produce a separate aggregated result for each row, 
then the join clause is required.
+      </p>
+
+      <p class="p">
+        A common usage pattern with complex types is to have an array as the 
top-level type for the column:
+        an array of structs, an array of maps, or an array of arrays.
+        For example, you can model a denormalized table by creating a column 
that is an <code class="ph codeph">ARRAY</code>
+        of <code class="ph codeph">STRUCT</code> elements; each item in the 
array represents a row from a table that would
+        normally be used in a join query. This kind of data structure lets you 
essentially denormalize tables by
+        associating multiple rows from one table with the matching row in 
another table.
+      </p>
+
+      <p class="p">
+        You typically do not create more than one top-level <code class="ph 
codeph">ARRAY</code> column, because if there is
+        some relationship between the elements of multiple arrays, it is 
convenient to model the data as
+        an array of another complex type element (either <code class="ph 
codeph">STRUCT</code> or <code class="ph codeph">MAP</code>).
+      </p>
+
+      <p class="p">
+        You can pass a multi-part qualified name to <code class="ph 
codeph">DESCRIBE</code>
+        to specify an <code class="ph codeph">ARRAY</code>, <code class="ph 
codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+        column and visualize its structure as if it were a table.
+        For example, if table <code class="ph codeph">T1</code> contains an 
<code class="ph codeph">ARRAY</code> column
+        <code class="ph codeph">A1</code>, you could issue the statement <code 
class="ph codeph">DESCRIBE t1.a1</code>.
+        If table <code class="ph codeph">T1</code> contained a <code class="ph 
codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+        and a field <code class="ph codeph">F1</code> within the <code 
class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+        you could issue the statement <code class="ph codeph">DESCRIBE 
t1.s1.f1</code>.
+        An <code class="ph codeph">ARRAY</code> is shown as a two-column 
table, with
+        <code class="ph codeph">ITEM</code> and <code class="ph 
codeph">POS</code> columns.
+        A <code class="ph codeph">STRUCT</code> is shown as a table with each 
field
+        representing a column in the table.
+        A <code class="ph codeph">MAP</code> is shown as a two-column table, 
with
+        <code class="ph codeph">KEY</code> and <code class="ph 
codeph">VALUE</code> columns.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 
2.3.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Columns with this data type can only be used in tables or 
partitions with the Parquet file format.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Columns with this data type cannot be used as partition key 
columns in a partitioned table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">COMPUTE STATS</code> statement does 
not produce any statistics for columns of this data type.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p" id="array__d6e2889">
+            The maximum length of the column definition for any complex type, 
including declarations for any nested types,
+            is 4000 characters.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            See <a class="xref" 
href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations
 and Restrictions for Complex Types</a> for a full list of limitations
+            and associated guidelines about complex type columns.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+      <p class="p">
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, 
<code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, 
<code class="ph codeph">VARCHAR</code>,
+        <code class="ph codeph">ARRAY</code>, <code class="ph 
codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used 
with Kudu tables.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+      Many of the complex type examples refer to tables
+      such as <code class="ph codeph">CUSTOMER</code> and <code class="ph 
codeph">REGION</code>
+      adapted from the tables used in the TPC-H benchmark.
+      See <a class="xref" 
href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample
 Schema and Data for Experimenting with Impala Complex Types</a>
+      for the table definitions.
+      </div>
+
+      <p class="p">
+        The following example shows how to construct a table with various 
kinds of <code class="ph codeph">ARRAY</code> columns,
+        both at the top level and nested within other complex types.
+        Whenever the <code class="ph codeph">ARRAY</code> elements are scalar values, such as in the <code class="ph codeph">PETS</code>
+        column or the <code class="ph codeph">CHILDREN</code> field, you can see that future expansion is limited.
+        For example, you could not easily evolve the schema to record the kind 
of pet or the child's birthday alongside the name.
+        Therefore, it is more common to use an <code class="ph 
codeph">ARRAY</code> whose elements are of <code class="ph 
codeph">STRUCT</code> type,
+        to associate multiple fields with each array element.
+      </p>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+        Practice the <code class="ph codeph">CREATE TABLE</code> and query 
notation for complex type columns
+        using empty tables, until you can visualize a complex data structure 
and construct corresponding SQL statements reliably.
+      </div>
+
+
+
+<pre class="pre codeblock"><code>CREATE TABLE array_demo
+(
+  id BIGINT,
+  name STRING,
+-- An ARRAY of scalar type as a top-level column.
+  pets ARRAY &lt;STRING&gt;,
+
+-- An ARRAY with elements of complex type (STRUCT).
+  places_lived ARRAY &lt; STRUCT &lt;
+    place: STRING,
+    start_year: INT
+  &gt;&gt;,
+
+-- An ARRAY as a field (CHILDREN) within a STRUCT.
+-- (The STRUCT is inside another ARRAY, because it is rare
+-- for a STRUCT to be a top-level column.)
+  marriages ARRAY &lt; STRUCT &lt;
+    spouse: STRING,
+    children: ARRAY &lt;STRING&gt;
+  &gt;&gt;,
+
+-- An ARRAY as the value part of a MAP.
+-- The first MAP field (the key) would be a value such as
+-- 'Parent' or 'Grandparent', and the corresponding array would
+-- represent 2 parents, 4 grandparents, and so on.
+  ancestors MAP &lt; STRING, ARRAY &lt;STRING&gt; &gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+    <p class="p">
+      The following example shows how to examine the structure of a table 
containing one or more <code class="ph codeph">ARRAY</code> columns by using the
+      <code class="ph codeph">DESCRIBE</code> statement. You can visualize 
each <code class="ph codeph">ARRAY</code> as its own two-column table, with 
columns
+      <code class="ph codeph">ITEM</code> and <code class="ph 
codeph">POS</code>.
+    </p>
+
+
+
+<pre class="pre codeblock"><code>DESCRIBE array_demo;
++--------------+---------------------------+
+| name         | type                      |
++--------------+---------------------------+
+| id           | bigint                    |
+| name         | string                    |
+| pets         | array&lt;string&gt;             |
+| marriages    | array&lt;struct&lt;             |
+|              |   spouse:string,          |
+|              |   children:array&lt;string&gt;  |
+|              | &gt;&gt;                        |
+| places_lived | array&lt;struct&lt;             |
+|              |   place:string,           |
+|              |   start_year:int          |
+|              | &gt;&gt;                        |
+| ancestors    | map&lt;string,array&lt;string&gt;&gt; |
++--------------+---------------------------+
+
+DESCRIBE array_demo.pets;
++------+--------+
+| name | type   |
++------+--------+
+| item | string |
+| pos  | bigint |
++------+--------+
+
+DESCRIBE array_demo.marriages;
++------+--------------------------+
+| name | type                     |
++------+--------------------------+
+| item | struct&lt;                  |
+|      |   spouse:string,         |
+|      |   children:array&lt;string&gt; |
+|      | &gt;                        |
+| pos  | bigint                   |
++------+--------------------------+
+
+DESCRIBE array_demo.places_lived;
++------+------------------+
+| name | type             |
++------+------------------+
+| item | struct&lt;          |
+|      |   place:string,  |
+|      |   start_year:int |
+|      | &gt;                |
+| pos  | bigint           |
++------+------------------+
+
+DESCRIBE array_demo.ancestors;
++-------+---------------+
+| name  | type          |
++-------+---------------+
+| key   | string        |
+| value | array&lt;string&gt; |
++-------+---------------+
+
+</code></pre>
+
+    <p class="p">
+      The following example shows queries involving <code class="ph 
codeph">ARRAY</code> columns containing elements of scalar or complex types. You
+      <span class="q">"unpack"</span> each <code class="ph 
codeph">ARRAY</code> column by referring to it in a join query, as if it were a 
separate table with
+      <code class="ph codeph">ITEM</code> and <code class="ph 
codeph">POS</code> columns. If the array element is a scalar type, you refer to 
its value using the
+      <code class="ph codeph">ITEM</code> pseudocolumn. If the array element 
is a <code class="ph codeph">STRUCT</code>, you refer to the <code class="ph 
codeph">STRUCT</code> fields
+      using dot notation and the field names. If the array element is another 
<code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>, 
you use
+      another level of join to unpack the nested collection elements.
+    </p>
+
+
+
+<pre class="pre codeblock"><code>-- Array of scalar values.
+-- Each array element represents a single string, plus we know its position in 
the array.
+SELECT id, name, pets.pos, pets.item FROM array_demo, array_demo.pets;
+
+-- Array of structs.
+-- Now each array element has named fields, possibly of different types.
+-- You can consider an ARRAY of STRUCT to represent a table inside another 
table.
+SELECT id, name, places_lived.pos, places_lived.item.place, 
places_lived.item.start_year
+FROM array_demo, array_demo.places_lived;
+
+-- The .ITEM name is optional for array elements that are structs.
+-- The following query is equivalent to the previous one, with .ITEM
+-- removed from the column references.
+SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year
+  FROM array_demo, array_demo.places_lived;
+
+-- To filter specific items from the array, do comparisons against the .POS or 
.ITEM
+-- pseudocolumns, or names of struct fields, in the WHERE clause.
+SELECT id, name, pets.item FROM array_demo, array_demo.pets
+  WHERE pets.pos in (0, 1, 3);
+
+SELECT id, name, pets.item FROM array_demo, array_demo.pets
+  WHERE pets.item LIKE 'Mr. %';
+
+SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year
+  FROM array_demo, array_demo.places_lived
+  WHERE places_lived.place LIKE '%California%';
+
+</code></pre>
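+
+    <p class="p">
+      The usage notes state that aggregation functions can be applied to arrays of scalar elements
+      without join notation, while a separate aggregated result for each row requires the join
+      clause. The following sketch contrasts the two forms using the
+      <code class="ph codeph">ARRAY_DEMO</code> table defined earlier. It has not been run against a
+      live cluster, and the column aliases are illustrative.
+    </p>
+
+<pre class="pre codeblock"><code>-- Aggregate over every element of every PETS array in the table.
+-- No join notation is needed, because the array itself is named in the FROM clause.
+SELECT COUNT(item) AS total_pets, MIN(item) AS lowest_name, MAX(item) AS highest_name
+  FROM array_demo.pets;
+
+-- A separate aggregated result for each row. Join notation is required
+-- so that ID and NAME can be cross-referenced with the array elements.
+SELECT id, name, COUNT(pets.item) AS num_pets
+  FROM array_demo, array_demo.pets
+  GROUP BY id, name;
+</code></pre>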
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_complex_types.html#complex_types">Complex 
Types (Impala 2.3 or higher only)</a>,
+
+      <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type 
(Impala 2.3 or higher only)</a>, <a class="xref" href="impala_map.html#map">MAP 
Complex Type (Impala 2.3 or higher only)</a>
+    </p>
+
+  </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_datatypes.html">Data 
Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_auditing.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_auditing.html 
b/docs/build/html/topics/impala_auditing.html
new file mode 100644
index 0000000..bcd6d9f
--- /dev/null
+++ b/docs/build/html/topics/impala_auditing.html
@@ -0,0 +1,222 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_security.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="auditing"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Auditing Impala Operations</title></head><body 
id="auditing"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Auditing Impala 
Operations</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      To monitor how Impala data is being used within your organization, to ensure
+      that your Impala authorization and authentication policies are effective, and
+      to detect attempts at intrusion or unauthorized access to Impala
+      data, you can use the auditing feature in Impala 1.2.1 and higher:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Enable auditing by including the option
+        <code class="ph codeph">-audit_event_log_dir=<var class="keyword 
varname">directory_path</var></code>
+        in your <span class="keyword cmdname">impalad</span> startup options.
+        The log directory must be a local directory on the
+        server, not an HDFS directory.
+      </li>
+
+      <li class="li">
+        Decide how many queries will be represented in each log file. By 
default,
+        Impala starts a new log file every 5000 queries. To specify a 
different number,
+        <span class="ph">include
+        the option <code class="ph codeph">-max_audit_event_log_file_size=<var 
class="keyword varname">number_of_queries</var></code>
+        in the <span class="keyword cmdname">impalad</span> startup 
options</span>.
+      </li>
+
+      <li class="li"> 
+        Use a cluster manager with governance capabilities to filter, 
visualize,
+        and produce reports based on the audit logs collected
+        from all the hosts in the cluster. 
+      </li>
+    </ul>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_security.html">Impala 
Security</a></div></div></nav><article class="topic concept nested1" 
aria-labelledby="ariaid-title2" id="auditing__auditing_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Durability and 
Performance Considerations for Impala Auditing</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The auditing feature only imposes performance overhead while auditing 
is enabled.
+      </p>
+
+      <p class="p">
+        Because any Impala host can process a query, enable auditing on all 
hosts where the
+        <span class="ph"><span class="keyword cmdname">impalad</span> 
daemon</span>
+         runs. Each host stores its own log
+        files, in a directory in the local filesystem. The log data is 
periodically flushed to disk (through an
+        <code class="ph codeph">fsync()</code> system call) to avoid loss of 
audit data in case of a crash.
+      </p>
+
+      <p class="p"> 
+        The runtime overhead of auditing applies to whichever host serves as 
the coordinator
+        for the query, that is, the host you connect to when you issue the 
query. This might
+        be the same host for all queries, or different applications or users 
might connect to
+        and issue queries through different hosts. 
+      </p>
+
+      <p class="p"> 
+        To avoid excessive I/O overhead on busy coordinator hosts, Impala 
syncs the audit log
+        data (using the <code class="ph codeph">fsync()</code> system call) 
periodically rather than after
+        every query. Currently, the <code class="ph codeph">fsync()</code> 
calls are issued at a fixed
+        interval, every 5 seconds. 
+      </p>
+
+      <p class="p">
+        By default, Impala avoids losing any audit log data in the case of an 
error during a logging operation
+        (such as a disk full error), by immediately shutting down
+        <span class="keyword cmdname">impalad</span> on the host where the 
auditing problem occurred.
+        <span class="ph">You can override this setting by specifying the option
+        <code class="ph codeph">-abort_on_failed_audit_event=false</code> in 
the <span class="keyword cmdname">impalad</span> startup options.</span>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" 
id="auditing__auditing_format">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Format of the Audit Log 
Files</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p"> 
+        The audit log files represent the query information in JSON format, 
one query per line.
+        Typically, rather than looking at the log files themselves, you should 
use cluster-management
+        software to consolidate the log data from all Impala hosts and filter 
and visualize the results
+        in useful ways. (If you do examine the raw log data, you might run the 
files through
+        a JSON pretty-printer first.) 
+     </p>
+
+      <p class="p">
+        All the information about schema objects accessed by the query is 
encoded in a single nested record on the
+        same line. For example, the audit log for an <code class="ph 
codeph">INSERT ... SELECT</code> statement records that a
+        select operation occurs on the source table and an insert operation 
occurs on the destination table. The
+        audit log for a query against a view records the base table accessed 
by the view, or multiple base tables
+        in the case of a view that includes a join query. Every Impala 
operation that corresponds to a SQL
+        statement is recorded in the audit logs, whether the operation 
succeeds or fails. Impala records more
+        information for a successful operation than for a failed one, because 
an unauthorized query is stopped
+        immediately, before all the query planning is completed.
+      </p>
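+
+      <p class="p">
+        As a rough illustration of the statements described above, the following sketch shows an
+        <code class="ph codeph">INSERT ... SELECT</code> statement and a query against a view. All
+        table and view names are hypothetical, and the comments only restate which operations the
+        audit log is described as recording; the layout of the resulting JSON record is not shown.
+      </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical tables SALES_STAGING and SALES_ARCHIVE.
+-- One audit record: a select operation on SALES_STAGING (the source table)
+-- and an insert operation on SALES_ARCHIVE (the destination table).
+INSERT INTO sales_archive SELECT * FROM sales_staging;
+
+-- Hypothetical view SALES_BY_REGION_V, defined as a join of SALES_ARCHIVE and REGIONS.
+-- The audit record for this query lists the base tables accessed by the view.
+SELECT COUNT(*) FROM sales_by_region_v;
+</code></pre>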
+
+
+
+      <p class="p">
+        The information logged for each query includes:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Client session state:
+          <ul class="ul">
+            <li class="li">
+              Session ID
+            </li>
+
+            <li class="li">
+              User name
+            </li>
+
+            <li class="li">
+              Network address of the client connection
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          SQL statement details:
+          <ul class="ul">
+            <li class="li">
+              Query ID
+            </li>
+
+            <li class="li">
+              Statement Type - DML, DDL, and so on
+            </li>
+
+            <li class="li">
+              SQL statement text
+            </li>
+
+            <li class="li">
+              Execution start time, in local time
+            </li>
+
+            <li class="li">
+              Execution Status - Details on any errors that were encountered
+            </li>
+
+            <li class="li">
+              Target Catalog Objects:
+              <ul class="ul">
+                <li class="li">
+                  Object Type - Table, View, or Database
+                </li>
+
+                <li class="li">
+                  Fully qualified object name
+                </li>
+
+                <li class="li">
+                  Privilege - How the object is being used (<code class="ph 
codeph">SELECT</code>, <code class="ph codeph">INSERT</code>,
+                  <code class="ph codeph">CREATE</code>, and so on)
+                </li>
+              </ul>
+            </li>
+          </ul>
+        </li>
+      </ul>
+
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" 
id="auditing__auditing_exceptions">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Which Operations Are 
Audited</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The kinds of SQL queries represented in the audit log are:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Queries that are prevented due to lack of authorization.
+        </li>
+
+        <li class="li">
+          Queries that Impala can parse and analyze to determine that they are 
authorized. The audit data is
+          recorded immediately after Impala finishes its analysis, before the 
query is actually executed.
+        </li>
+      </ul>
+
+      <p class="p">
+        The audit log does not contain entries for queries that could not be 
parsed and analyzed. For example, a
+        query that fails due to a syntax error is not recorded in the audit 
log. The audit log also does not
+        contain queries that fail because they refer to a table that does not 
exist, provided that you would have been authorized to
+        access the table had it existed.
+      </p>
+
+      <p class="p">
+        Certain statements in the <span class="keyword 
cmdname">impala-shell</span> interpreter, such as <code class="ph 
codeph">CONNECT</code>,
+        <code class="ph codeph">SUMMARY</code>, <code class="ph 
codeph">PROFILE</code>, <code class="ph codeph">SET</code>, and
+        <code class="ph codeph">QUIT</code>, do not correspond to actual SQL 
queries, and these statements are not reflected in
+        the audit log.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_authentication.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_authentication.html 
b/docs/build/html/topics/impala_authentication.html
new file mode 100644
index 0000000..504f6c7
--- /dev/null
+++ b/docs/build/html/topics/impala_authentication.html
@@ -0,0 +1,37 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_security.html"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_kerberos.html"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_ldap.html"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_mixed_security.html"><meta name="DC.Relation" 
scheme="URI" content="../topics/impala_delegation.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="authentication"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Impala Authentication</title></head><body id="authentication"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Authentication</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Authentication is the mechanism to ensure that only specified hosts and 
users can connect to Impala. It also
+      verifies that when clients connect to Impala, they are connected to a 
legitimate server. This feature
+      prevents spoofing such as <dfn class="term">impersonation</dfn> (setting 
up a phony client system with the same account
+      and group names as a legitimate user) and <dfn 
class="term">man-in-the-middle attacks</dfn> (intercepting application
+      requests before they reach Impala and eavesdropping on sensitive 
information in the requests or the results).
+    </p>
+
+    <p class="p">
+      Impala supports authentication using either Kerberos or LDAP.
+    </p>
+
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+      Regardless of the authentication mechanism used, Impala always creates 
HDFS directories and data files
+      owned by the same user (typically <code class="ph 
codeph">impala</code>). To implement user-level access to different
+      databases, tables, columns, partitions, and so on, use the Sentry 
authorization feature, as explained in
+      <a class="xref" 
href="../shared/../topics/impala_authorization.html#authorization">Enabling 
Sentry Authorization for Impala</a>.
+    </div>
+
+    <p class="p toc"></p>
+
+    <p class="p">
+      After you finish setting up authentication, move on to 
authorization, which specifies the
+      databases, tables, HDFS directories, and so on that particular users 
can access when they connect through
+      Impala. See <a class="xref" 
href="impala_authorization.html#authorization">Enabling Sentry Authorization 
for Impala</a> for details.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li 
class="link ulchildlink"><strong><a 
href="../topics/impala_kerberos.html">Enabling Kerberos Authentication for 
Impala</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_ldap.html">Enabling LDAP Authentication for 
Impala</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_mixed_security.html">Using Multiple Authentication 
Methods with Impala</a></strong><br></li><li class="link 
ulchildlink"><strong><a href="../topics/impala_delegation.html">Configuring 
Impala Delegation for Hue and BI Tools</a></strong><br></li></ul><div 
class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a 
class="link" href="../topics/impala_security.html">Impala 
Security</a></div></div></nav></article></main></body></html>
\ No newline at end of file
