http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_appx_median.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_appx_median.html b/docs/build/html/topics/impala_appx_median.html new file mode 100644 index 0000000..1883f2c --- /dev/null +++ b/docs/build/html/topics/impala_appx_median.html @@ -0,0 +1,127 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="appx_median"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>APPX_MEDIAN Function</title></head><body id="appx_median"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">APPX_MEDIAN Function</h1> + + + + <div class="body conbody"> + + <p class="p"> + + An aggregate function that returns a value that is approximately the median (midpoint) of values in the set + of input values. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>APPX_MEDIAN([DISTINCT | ALL] <var class="keyword varname">expression</var>) +</code></pre> + + <p class="p"> + This function works with any input type, because the only requirement is that the type supports less-than and + greater-than comparison operators. 
+ </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + Because the return value represents the estimated midpoint, it might not reflect the precise midpoint value, + especially if the cardinality of the input values is very high. If the cardinality is low (up to + approximately 20,000), the result is more accurate because the sampling considers all or almost all of the + different values. + </p> + + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> + arguments which produce a <code class="ph codeph">STRING</code> result + </p> + + <p class="p"> + The return value is always the same as one of the input values, not an <span class="q">"in-between"</span> value produced by + averaging. + </p> + + + + <p class="p"> + <strong class="ph b">Restrictions:</strong> + </p> + + <p class="p"> + This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + The following example uses a table of a million random floating-point numbers ranging up to approximately + 50,000. The average is approximately 25,000. Because of the random distribution, we would expect the median + to be close to this same number. Computing the precise median is a more intensive operation than computing + the average, because it requires keeping track of every distinct value and how many times each occurs. The + <code class="ph codeph">APPX_MEDIAN()</code> function uses a sampling algorithm to return an approximate result, which in + this case is close to the expected value. 
To make sure that the value is not substantially out of range due + to a skewed distribution, subsequent queries confirm that there are approximately 500,000 values higher than + the <code class="ph codeph">APPX_MEDIAN()</code> value, and approximately 500,000 values lower than the + <code class="ph codeph">APPX_MEDIAN()</code> value. + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > select min(x), max(x), avg(x) from million_numbers; ++-------------------+-------------------+-------------------+ +| min(x) | max(x) | avg(x) | ++-------------------+-------------------+-------------------+ +| 4.725693727250069 | 49994.56852674231 | 24945.38563793553 | ++-------------------+-------------------+-------------------+ +[localhost:21000] > select appx_median(x) from million_numbers; ++----------------+ +| appx_median(x) | ++----------------+ +| 24721.6 | ++----------------+ +[localhost:21000] > select count(x) as higher from million_numbers where x > (select appx_median(x) from million_numbers); ++--------+ +| higher | ++--------+ +| 502013 | ++--------+ +[localhost:21000] > select count(x) as lower from million_numbers where x < (select appx_median(x) from million_numbers); ++--------+ +| lower | ++--------+ +| 497987 | ++--------+ +</code></pre> + + <p class="p"> + The following example computes the approximate median using a subset of the values from the table, and then + confirms that the result is a reasonable estimate for the midpoint. 
+ </p> + +<pre class="pre codeblock"><code>[localhost:21000] > select appx_median(x) from million_numbers where x between 1000 and 5000; ++-------------------+ +| appx_median(x) | ++-------------------+ +| 3013.107787358159 | ++-------------------+ +[localhost:21000] > select count(x) as higher from million_numbers where x between 1000 and 5000 and x > 3013.107787358159; ++--------+ +| higher | ++--------+ +| 37692 | ++--------+ +[localhost:21000] > select count(x) as lower from million_numbers where x between 1000 and 5000 and x < 3013.107787358159; ++-------+ +| lower | ++-------+ +| 37089 | ++-------+ +</code></pre> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file
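The APPX_MEDIAN topic above describes a sampling-based estimate that always returns one of the input values, and checks it by counting the values above and below the result. The following is a minimal Python sketch of that idea, not Impala's actual implementation: the helper names `approx_median` and `split_counts`, the fixed random seed, and the 20,000-element sample size (chosen only to echo the cardinality figure in the usage notes) are all assumptions made for illustration.

```python
import random


def approx_median(values, sample_size=20_000, seed=0):
    """Estimate the median from a uniform random sample of the input.

    Illustrative only: this is NOT Impala's algorithm. The result is always
    an element of `values`, never an average of two elements.
    """
    values = list(values)
    if not values:
        return None
    rng = random.Random(seed)
    # With few values the "sample" is the whole input, so the result is exact
    # (for an even count, the upper of the two middle values is returned).
    if len(values) > sample_size:
        sample = rng.sample(values, sample_size)
    else:
        sample = values
    ordered = sorted(sample)
    return ordered[len(ordered) // 2]


def split_counts(values, pivot):
    """Count how many values fall strictly below and strictly above pivot."""
    lower = sum(1 for v in values if v < pivot)
    higher = sum(1 for v in values if v > pivot)
    return lower, higher


# Hypothetical stand-in for the million_numbers table (smaller, for speed).
data_rng = random.Random(42)
numbers = [data_rng.uniform(0, 50_000) for _ in range(200_000)]

estimate = approx_median(numbers)
lower, higher = split_counts(numbers, estimate)
print(estimate, lower, higher)
```

As in the SQL examples, the two counts should come out roughly equal; a large imbalance would suggest that the sample was not representative of the data.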
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_array.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_array.html b/docs/build/html/topics/impala_array.html new file mode 100644 index 0000000..45c9a42 --- /dev/null +++ b/docs/build/html/topics/impala_array.html @@ -0,0 +1,321 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="array"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ARRAY Complex Type (Impala 2.3 or higher only)</title></head><body id="array"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">ARRAY Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1> + + + + <div class="body conbody"> + + <p class="p"> + A complex data type that can represent an arbitrary number of ordered elements. + The elements can be scalars or another complex type (<code class="ph codeph">ARRAY</code>, + <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>). 
+ </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + + + +<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> ARRAY < <var class="keyword varname">type</var> > + +type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var> +</code></pre> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + Because complex types are often used in combination, + for example an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> + elements, if you are unfamiliar with the Impala complex types, + start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for + background information and usage examples. + </p> + + <p class="p"> + The elements of the array have no names. You refer to the value of the array item using the + <code class="ph codeph">ITEM</code> pseudocolumn, or its position in the array with the <code class="ph codeph">POS</code> + pseudocolumn. See <a class="xref" href="impala_complex_types.html#item">ITEM and POS Pseudocolumns</a> for information about + these pseudocolumns. + </p> + + + + <p class="p"> + Each row can have a different number of elements (including none) in the array for that row. + </p> + + + + <p class="p"> + When an array contains items of scalar types, you can use aggregation functions on the array elements without using join notation. 
For + example, you can find the <code class="ph codeph">COUNT()</code>, <code class="ph codeph">AVG()</code>, <code class="ph codeph">SUM()</code>, and so on of numeric array + elements, or the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> of any scalar array elements by referring to + <code class="ph codeph"><var class="keyword varname">table_name</var>.<var class="keyword varname">array_column</var></code> in the <code class="ph codeph">FROM</code> clause of the query. When + you need to cross-reference values from the array with scalar values from the same row, such as by including a <code class="ph codeph">GROUP + BY</code> clause to produce a separate aggregated result for each row, then the join clause is required. + </p> + + <p class="p"> + A common usage pattern with complex types is to have an array as the top-level type for the column: + an array of structs, an array of maps, or an array of arrays. + For example, you can model a denormalized table by creating a column that is an <code class="ph codeph">ARRAY</code> + of <code class="ph codeph">STRUCT</code> elements; each item in the array represents a row from a table that would + normally be used in a join query. This kind of data structure lets you essentially denormalize tables by + associating multiple rows from one table with the matching row in another table. + </p> + + <p class="p"> + You typically do not create more than one top-level <code class="ph codeph">ARRAY</code> column, because if there is + some relationship between the elements of multiple arrays, it is convenient to model the data as + an array of another complex type element (either <code class="ph codeph">STRUCT</code> or <code class="ph codeph">MAP</code>). 
+ </p> + + <p class="p"> + You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code> + to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code> + column and visualize its structure as if it were a table. + For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column + <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>. + If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>, + and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>, + you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>. + An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with + <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns. + A <code class="ph codeph">STRUCT</code> is shown as a table with each field + representing a column in the table. + A <code class="ph codeph">MAP</code> is shown as a two-column table, with + <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns. + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + + <p class="p"> + <strong class="ph b">Restrictions:</strong> + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Columns with this data type can only be used in tables or partitions with the Parquet file format. + </p> + </li> + <li class="li"> + <p class="p"> + Columns with this data type cannot be used as partition key columns in a partitioned table. + </p> + </li> + <li class="li"> + <p class="p"> + The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type. 
+ </p> + </li> + <li class="li"> + <p class="p" id="array__d6e2889"> + The maximum length of the column definition for any complex type, including declarations for any nested types, + is 4000 characters. + </p> + </li> + <li class="li"> + <p class="p"> + See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations + and associated guidelines about complex type columns. + </p> + </li> + </ul> + + <p class="p"> + <strong class="ph b">Kudu considerations:</strong> + </p> + <p class="p"> + Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>, + <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Many of the complex type examples refer to tables + such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code> + adapted from the tables used in the TPC-H benchmark. + See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a> + for the table definitions. + </div> + + <p class="p"> + The following example shows how to construct a table with various kinds of <code class="ph codeph">ARRAY</code> columns, + both at the top level and nested within other complex types. + Whenever the <code class="ph codeph">ARRAY</code> consists of a scalar value, such as in the <code class="ph codeph">PETS</code> + column or the <code class="ph codeph">CHILDREN</code> field, you can see that future expansion is limited. 
+ For example, you could not easily evolve the schema to record the kind of pet or the child's birthday alongside the name. + Therefore, it is more common to use an <code class="ph codeph">ARRAY</code> whose elements are of <code class="ph codeph">STRUCT</code> type, + to associate multiple fields with each array element. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns + using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably. + </div> + + + +<pre class="pre codeblock"><code>CREATE TABLE array_demo +( + id BIGINT, + name STRING, +-- An ARRAY of scalar type as a top-level column. + pets ARRAY <STRING>, + +-- An ARRAY with elements of complex type (STRUCT). + places_lived ARRAY < STRUCT < + place: STRING, + start_year: INT + >>, + +-- An ARRAY as a field (CHILDREN) within a STRUCT. +-- (The STRUCT is inside another ARRAY, because it is rare +-- for a STRUCT to be a top-level column.) + marriages ARRAY < STRUCT < + spouse: STRING, + children: ARRAY <STRING> + >>, + +-- An ARRAY as the value part of a MAP. +-- The first MAP field (the key) would be a value such as +-- 'Parent' or 'Grandparent', and the corresponding array would +-- represent 2 parents, 4 grandparents, and so on. + ancestors MAP < STRING, ARRAY <STRING> > +) +STORED AS PARQUET; + +</code></pre> + + <p class="p"> + The following example shows how to examine the structure of a table containing one or more <code class="ph codeph">ARRAY</code> columns by using the + <code class="ph codeph">DESCRIBE</code> statement. You can visualize each <code class="ph codeph">ARRAY</code> as its own two-column table, with columns + <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>. 
+ </p> + + + +<pre class="pre codeblock"><code>DESCRIBE array_demo; ++--------------+---------------------------+ +| name | type | ++--------------+---------------------------+ +| id | bigint | +| name | string | +| pets | array<string> | +| marriages | array<struct< | +| | spouse:string, | +| | children:array<string> | +| | >> | +| places_lived | array<struct< | +| | place:string, | +| | start_year:int | +| | >> | +| ancestors | map<string,array<string>> | ++--------------+---------------------------+ + +DESCRIBE array_demo.pets; ++------+--------+ +| name | type | ++------+--------+ +| item | string | +| pos | bigint | ++------+--------+ + +DESCRIBE array_demo.marriages; ++------+--------------------------+ +| name | type | ++------+--------------------------+ +| item | struct< | +| | spouse:string, | +| | children:array<string> | +| | > | +| pos | bigint | ++------+--------------------------+ + +DESCRIBE array_demo.places_lived; ++------+------------------+ +| name | type | ++------+------------------+ +| item | struct< | +| | place:string, | +| | start_year:int | +| | > | +| pos | bigint | ++------+------------------+ + +DESCRIBE array_demo.ancestors; ++-------+---------------+ +| name | type | ++-------+---------------+ +| key | string | +| value | array<string> | ++-------+---------------+ + +</code></pre> + + <p class="p"> + The following example shows queries involving <code class="ph codeph">ARRAY</code> columns containing elements of scalar or complex types. You + <span class="q">"unpack"</span> each <code class="ph codeph">ARRAY</code> column by referring to it in a join query, as if it were a separate table with + <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns. If the array element is a scalar type, you refer to its value using the + <code class="ph codeph">ITEM</code> pseudocolumn. 
If the array element is a <code class="ph codeph">STRUCT</code>, you refer to the <code class="ph codeph">STRUCT</code> fields + using dot notation and the field names. If the array element is another <code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>, you use + another level of join to unpack the nested collection elements. + </p> + + + +<pre class="pre codeblock"><code>-- Array of scalar values. +-- Each array element represents a single string, plus we know its position in the array. +SELECT id, name, pets.pos, pets.item FROM array_demo, array_demo.pets; + +-- Array of structs. +-- Now each array element has named fields, possibly of different types. +-- You can consider an ARRAY of STRUCT to represent a table inside another table. +SELECT id, name, places_lived.pos, places_lived.item.place, places_lived.item.start_year +FROM array_demo, array_demo.places_lived; + +-- The .ITEM name is optional for array elements that are structs. +-- The following query is equivalent to the previous one, with .ITEM +-- removed from the column references. +SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year + FROM array_demo, array_demo.places_lived; + +-- To filter specific items from the array, do comparisons against the .POS or .ITEM +-- pseudocolumns, or names of struct fields, in the WHERE clause. +SELECT id, name, pets.item FROM array_demo, array_demo.pets + WHERE pets.pos in (0, 1, 3); + +SELECT id, name, pets.item FROM array_demo, array_demo.pets + WHERE pets.item LIKE 'Mr. 
%'; + +SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year + FROM array_demo, array_demo.places_lived +WHERE places_lived.place like '%California%'; + +</code></pre> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>, + + <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>, <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a> + </p> + + </div> + +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_auditing.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_auditing.html b/docs/build/html/topics/impala_auditing.html new file mode 100644 index 0000000..bcd6d9f --- /dev/null +++ b/docs/build/html/topics/impala_auditing.html @@ -0,0 +1,222 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="auditing"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Auditing Impala Operations</title></head><body id="auditing"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Auditing Impala Operations</h1> + + + + <div class="body conbody"> + + <p class="p"> + To monitor how Impala data is being used within your organization, ensure + that your Impala authorization and authentication policies are effective. + To detect attempts at intrusion or unauthorized access to Impala + data, you can use the auditing feature in Impala 1.2.1 and higher: + </p> + + <ul class="ul"> + <li class="li"> + Enable auditing by including the option + <code class="ph codeph">-audit_event_log_dir=<var class="keyword varname">directory_path</var></code> + in your <span class="keyword cmdname">impalad</span> startup options. + The log directory must be a local directory on the + server, not an HDFS directory. + </li> + + <li class="li"> + Decide how many queries will be represented in each log file. By default, + Impala starts a new log file every 5000 queries. To specify a different number, + <span class="ph">include + the option <code class="ph codeph">-max_audit_event_log_file_size=<var class="keyword varname">number_of_queries</var></code> + in the <span class="keyword cmdname">impalad</span> startup options</span>. + </li> + + <li class="li"> + Use a cluster manager with governance capabilities to filter, visualize, + and produce reports based on the audit logs collected + from all the hosts in the cluster. 
+ </li> + </ul> + + <p class="p toc inpage"></p> + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="auditing__auditing_performance"> + + <h2 class="title topictitle2" id="ariaid-title2">Durability and Performance Considerations for Impala Auditing</h2> + + + <div class="body conbody"> + + <p class="p"> + The auditing feature only imposes performance overhead while auditing is enabled. + </p> + + <p class="p"> + Because any Impala host can process a query, enable auditing on all hosts where the + <span class="ph"><span class="keyword cmdname">impalad</span> daemon</span> + runs. Each host stores its own log + files, in a directory in the local filesystem. The log data is periodically flushed to disk (through an + <code class="ph codeph">fsync()</code> system call) to avoid loss of audit data in case of a crash. + </p> + + <p class="p"> + The runtime overhead of auditing applies to whichever host serves as the coordinator + for the query, that is, the host you connect to when you issue the query. This might + be the same host for all queries, or different applications or users might connect to + and issue queries through different hosts. + </p> + + <p class="p"> + To avoid excessive I/O overhead on busy coordinator hosts, Impala syncs the audit log + data (using the <code class="ph codeph">fsync()</code> system call) periodically rather than after + every query. Currently, the <code class="ph codeph">fsync()</code> calls are issued at a fixed + interval, every 5 seconds. 
+ </p> + + <p class="p"> + By default, Impala avoids losing any audit log data in the case of an error during a logging operation + (such as a disk full error), by immediately shutting down + <span class="keyword cmdname">impalad</span> on the host where the auditing problem occurred. + <span class="ph">You can override this setting by specifying the option + <code class="ph codeph">-abort_on_failed_audit_event=false</code> in the <span class="keyword cmdname">impalad</span> startup options.</span> + </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="auditing__auditing_format"> + + <h2 class="title topictitle2" id="ariaid-title3">Format of the Audit Log Files</h2> + + + <div class="body conbody"> + + <p class="p"> + The audit log files represent the query information in JSON format, one query per line. + Typically, rather than looking at the log files themselves, you should use cluster-management + software to consolidate the log data from all Impala hosts and filter and visualize the results + in useful ways. (If you do examine the raw log data, you might run the files through + a JSON pretty-printer first.) + </p> + + <p class="p"> + All the information about schema objects accessed by the query is encoded in a single nested record on the + same line. For example, the audit log for an <code class="ph codeph">INSERT ... SELECT</code> statement records that a + select operation occurs on the source table and an insert operation occurs on the destination table. The + audit log for a query against a view records the base table accessed by the view, or multiple base tables + in the case of a view that includes a join query. Every Impala operation that corresponds to a SQL + statement is recorded in the audit logs, whether the operation succeeds or fails. 
Impala records more + information for a successful operation than for a failed one, because an unauthorized query is stopped + immediately, before all the query planning is completed. + </p> + + + + <p class="p"> + The information logged for each query includes: + </p> + + <ul class="ul"> + <li class="li"> + Client session state: + <ul class="ul"> + <li class="li"> + Session ID + </li> + + <li class="li"> + User name + </li> + + <li class="li"> + Network address of the client connection + </li> + </ul> + </li> + + <li class="li"> + SQL statement details: + <ul class="ul"> + <li class="li"> + Query ID + </li> + + <li class="li"> + Statement Type - DML, DDL, and so on + </li> + + <li class="li"> + SQL statement text + </li> + + <li class="li"> + Execution start time, in local time + </li> + + <li class="li"> + Execution Status - Details on any errors that were encountered + </li> + + <li class="li"> + Target Catalog Objects: + <ul class="ul"> + <li class="li"> + Object Type - Table, View, or Database + </li> + + <li class="li"> + Fully qualified object name + </li> + + <li class="li"> + Privilege - How the object is being used (<code class="ph codeph">SELECT</code>, <code class="ph codeph">INSERT</code>, + <code class="ph codeph">CREATE</code>, and so on) + </li> + </ul> + </li> + </ul> + </li> + </ul> + + + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="auditing__auditing_exceptions"> + + <h2 class="title topictitle2" id="ariaid-title4">Which Operations Are Audited</h2> + + <div class="body conbody"> + + <p class="p"> + The kinds of SQL queries represented in the audit log are: + </p> + + <ul class="ul"> + <li class="li"> + Queries that are prevented due to lack of authorization. + </li> + + <li class="li"> + Queries that Impala can analyze and parse to determine that they are authorized. The audit data is + recorded immediately after Impala finishes its analysis, before the query is actually executed. 
+ </li> + </ul> + + <p class="p"> + The audit log does not contain entries for queries that could not be parsed and analyzed. For example, a + query that fails due to a syntax error is not recorded in the audit log. The audit log also does not + contain queries that fail due to a reference to a table that does not exist, if you would be authorized to + access the table if it did exist. + </p> + + <p class="p"> + Certain statements in the <span class="keyword cmdname">impala-shell</span> interpreter, such as <code class="ph codeph">CONNECT</code>, + <code class="ph codeph">SUMMARY</code>, <code class="ph codeph">PROFILE</code>, <code class="ph codeph">SET</code>, and + <code class="ph codeph">QUIT</code>, do not correspond to actual SQL queries, and these statements are not reflected in + the audit log. + </p> + </div> + </article> +</article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_authentication.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_authentication.html b/docs/build/html/topics/impala_authentication.html new file mode 100644 index 0000000..504f6c7 --- /dev/null +++ b/docs/build/html/topics/impala_authentication.html @@ -0,0 +1,37 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_kerberos.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ldap.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mixed_security.html"><meta name="DC.Relation" 
scheme="URI" content="../topics/impala_delegation.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="authentication"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Authentication</title></head><body id="authentication"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Impala Authentication</h1> + + + <div class="body conbody"> + + <p class="p"> + Authentication is the mechanism to ensure that only specified hosts and users can connect to Impala. It also + verifies that when clients connect to Impala, they are connected to a legitimate server. This feature + prevents spoofing such as <dfn class="term">impersonation</dfn> (setting up a phony client system with the same account + and group names as a legitimate user) and <dfn class="term">man-in-the-middle attacks</dfn> (intercepting application + requests before they reach Impala and eavesdropping on sensitive information in the requests or the results). + </p> + + <p class="p"> + Impala supports authentication using either Kerberos or LDAP. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files + owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different + databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in + <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>. 
+ </div> + + <p class="p toc"></p> + + <p class="p"> + Once you are finished setting up authentication, move on to authorization, which involves specifying what + databases, tables, HDFS directories, and so on can be accessed by particular users when they connect through + Impala. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details. + </p> + </div> +<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_kerberos.html">Enabling Kerberos Authentication for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ldap.html">Enabling LDAP Authentication for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mixed_security.html">Using Multiple Authentication Methods with Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_delegation.html">Configuring Impala Delegation for Hue and BI Tools</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html> \ No newline at end of file
