http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_components.html ---------------------------------------------------------------------- diff --git a/docs/build3x/html/topics/impala_components.html b/docs/build3x/html/topics/impala_components.html new file mode 100644 index 0000000..eb6e0f6 --- /dev/null +++ b/docs/build3x/html/topics/impala_components.html @@ -0,0 +1,227 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_components"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Components of the Impala Server</title></head><body id="intro_components"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Components of the Impala Server</h1> + + + + <div class="body conbody"> + + <p class="p"> + The Impala server is a distributed, massively parallel processing (MPP) database engine. It consists of + different daemon processes that run on specific hosts within your <span class="keyword"></span> cluster. 
+ </p> + + <p class="p toc inpage"></p> + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_components__intro_impalad"> + + <h2 class="title topictitle2" id="ariaid-title2">The Impala Daemon</h2> + + <div class="body conbody"> + + <p class="p"> + The core Impala component is a daemon process that runs on each DataNode of the cluster, physically represented + by the <code class="ph codeph">impalad</code> process. It reads and writes to data files; accepts queries transmitted + from the <code class="ph codeph">impala-shell</code> command, Hue, JDBC, or ODBC; parallelizes the queries and + distributes work across the cluster; and transmits intermediate query results back to the + central coordinator node. + </p> + + <p class="p"> + You can submit a query to the Impala daemon running on any DataNode, and that instance of the daemon serves as the + <dfn class="term">coordinator node</dfn> for that query. The other nodes transmit partial results back to the + coordinator, which constructs the final result set for a query. When running experiments with functionality + through the <code class="ph codeph">impala-shell</code> command, you might always connect to the same Impala daemon for + convenience. For clusters running production workloads, you might load-balance by + submitting each query to a different Impala daemon in round-robin style, using the JDBC or ODBC interfaces. + </p> + + <p class="p"> + The Impala daemons are in constant communication with the <dfn class="term">statestore</dfn>, to confirm which nodes + are healthy and can accept new work. 
+ </p> + + <p class="p"> + They also receive broadcast messages from the <span class="keyword cmdname">catalogd</span> daemon (introduced in Impala 1.2) + whenever any Impala node in the cluster creates, alters, or drops any type of object, or when an + <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> statement is processed through Impala. This + background communication minimizes the need for <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE + METADATA</code> statements that were needed to coordinate metadata across nodes prior to Impala 1.2. + </p> + + <p class="p"> + In <span class="keyword">Impala 2.9</span> and higher, you can control which hosts act as query coordinators + and which act as query executors, to improve scalability for highly concurrent workloads on large clusters. + See <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for details. + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>, + <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_timeouts.html#impalad_timeout">Setting the Idle Query and Idle Session Timeouts for impalad</a>, + <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>, <a class="xref" href="impala_proxy.html#proxy">Using Impala through a Proxy for High Availability</a> + </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_components__intro_statestore"> + + <h2 class="title topictitle2" id="ariaid-title3">The Impala Statestore</h2> + + <div class="body conbody"> + + <p class="p"> + The Impala component known as the <dfn class="term">statestore</dfn> checks on the health of Impala daemons on all the + DataNodes in a cluster, and continuously relays its findings to each of 
those daemons. It is physically + represented by a daemon process named <code class="ph codeph">statestored</code>; you only need such a process on one + host in the cluster. If an Impala daemon goes offline due to hardware failure, network error, software issue, + or other reason, the statestore informs all the other Impala daemons so that future queries can avoid making + requests to the unreachable node. + </p> + + <p class="p"> + Because the statestore's purpose is to help when things go wrong, it is not critical to the normal + operation of an Impala cluster. If the statestore is not running or becomes unreachable, the Impala daemons + continue running and distributing work among themselves as usual; the cluster just becomes less robust if + other Impala daemons fail while the statestore is offline. When the statestore comes back online, it re-establishes + communication with the Impala daemons and resumes its monitoring function. + </p> + + <p class="p"> + Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon. + The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special + requirements for high availability, because problems with those daemons do not result in data loss. + If those daemons become unavailable due to an outage on a particular + host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and + <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the + Impala service. 
+ </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_scalability.html#statestore_scalability">Scalability Considerations for the Impala Statestore</a>, + <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>, <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, + <a class="xref" href="impala_timeouts.html#statestore_timeout">Increasing the Statestore Timeout</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> + </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro_components__intro_catalogd"> + + <h2 class="title topictitle2" id="ariaid-title4">The Impala Catalog Service</h2> + + <div class="body conbody"> + + <p class="p"> + The Impala component known as the <dfn class="term">catalog service</dfn> relays the metadata changes from Impala SQL + statements to all the Impala daemons in a cluster. It is physically represented by a daemon process named + <code class="ph codeph">catalogd</code>; you only need such a process on one host in the cluster. Because the requests + are passed through the statestore daemon, it makes sense to run the <span class="keyword cmdname">statestored</span> and + <span class="keyword cmdname">catalogd</span> services on the same host. + </p> + + <p class="p"> + The catalog service avoids the need to issue + <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements when the metadata changes are + performed by statements issued through Impala. When you create a table, load data, and so on through Hive, + you do need to issue <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE METADATA</code> on an Impala node + before executing a query there. 
+ </p> + + <p class="p"> + This feature touches a number of aspects of Impala: + </p> + + + + <ul class="ul" id="intro_catalogd__catalogd_xrefs"> + <li class="li"> + <p class="p"> + See <a class="xref" href="impala_install.html#install">Installing Impala</a>, <a class="xref" href="impala_upgrading.html#upgrading">Upgrading Impala</a> and + <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, for usage information for the + <span class="keyword cmdname">catalogd</span> daemon. + </p> + </li> + + <li class="li"> + <p class="p"> + The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are not needed + when the <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-changing or + data-changing operation is performed through Impala. These statements are still needed if such + operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the + statements only need to be issued on one Impala node rather than on all nodes. See + <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> and + <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage information for + those statements. + </p> + </li> + </ul> + + <div class="p"> + Use the <code class="ph codeph">--load_catalog_in_background</code> option to control when + the metadata of a table is loaded. + <ul class="ul"> + <li class="li"> + If set to <code class="ph codeph">false</code>, the metadata of a table is + loaded when it is referenced for the first time. This means that the + first run of a particular query can be slower than subsequent runs. + Starting in Impala 2.2, the default for + <code class="ph codeph">load_catalog_in_background</code> is + <code class="ph codeph">false</code>. 
+ </li> + <li class="li"> + If set to <code class="ph codeph">true</code>, the catalog service attempts to + load metadata for a table even if no query needed that metadata. As a result, + the metadata might already be loaded when the first query that + needs it is run. However, for the following reasons, we + recommend that you not set the option to <code class="ph codeph">true</code>. + <ul class="ul"> + <li class="li"> + Background load can interfere with query-specific metadata + loading. This can happen on startup or after invalidating + metadata, with a duration depending on the amount of metadata, + and can lead to seemingly random long-running queries that are + difficult to diagnose. + </li> + <li class="li"> + Impala may load metadata for tables that are possibly never + used, potentially increasing catalog size and consequently memory + usage for both the catalog service and the Impala daemons. + </li> + </ul> + </li> + </ul> + </div> + + <p class="p"> + Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon. + The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special + requirements for high availability, because problems with those daemons do not result in data loss. + If those daemons become unavailable due to an outage on a particular + host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and + <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the + Impala service. 
+ </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + In Impala 1.2.4 and higher, you can specify a table name with <code class="ph codeph">INVALIDATE METADATA</code> after + the table is created in Hive, allowing you to make individual tables visible to Impala without doing a full + reload of the catalog metadata. Impala 1.2.4 also includes other changes to make the metadata broadcast + mechanism faster and more responsive, especially during Impala startup. See + <a class="xref" href="../shared/../topics/impala_new_features.html#new_features_124">New Features in Impala 1.2.4</a> for details. + </p> + </div> + + <p class="p"> + <strong class="ph b">Related information:</strong> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>, + <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> + </p> + </div> + </article> +</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compression_codec.html ---------------------------------------------------------------------- diff --git a/docs/build3x/html/topics/impala_compression_codec.html b/docs/build3x/html/topics/impala_compression_codec.html new file mode 100644 index 0000000..5933efa --- /dev/null +++ b/docs/build3x/html/topics/impala_compression_codec.html @@ -0,0 +1,92 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compression_codec"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</title></head><body id="compression_codec"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">COMPRESSION_CODEC Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1> + + + + <div class="body conbody"> + + + + + + <p class="p"> + + When Impala writes Parquet data files using the <code class="ph codeph">INSERT</code> statement, the underlying compression + is controlled by the <code class="ph codeph">COMPRESSION_CODEC</code> query option. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Prior to Impala 2.0, this option was named <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code>. 
In Impala 2.0 and + later, the <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code> name is not recognized. Use the more general name + <code class="ph codeph">COMPRESSION_CODEC</code> for new code. + </div> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>SET COMPRESSION_CODEC=<var class="keyword varname">codec_name</var>;</code></pre> + + <p class="p"> + The allowed values for this query option are <code class="ph codeph">SNAPPY</code> (the default), <code class="ph codeph">GZIP</code>, + and <code class="ph codeph">NONE</code>. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + A Parquet file created with <code class="ph codeph">COMPRESSION_CODEC=NONE</code> is still typically smaller than the + original data, due to encoding schemes such as run-length encoding and dictionary encoding that are applied + separately from compression. + </div> + + <p class="p"></p> + + <p class="p"> + The option value is not case-sensitive. + </p> + + <p class="p"> + If the option is set to an unrecognized value, all kinds of queries will fail due to the invalid option + setting, not just queries involving Parquet tables. (The value <code class="ph codeph">BZIP2</code> is also recognized, but + is not compatible with Parquet tables.) 
+ </p> + + <p class="p"> + <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code> + </p> + + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">SNAPPY</code> + </p> + + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + +<pre class="pre codeblock"><code>set compression_codec=gzip; +insert into parquet_table_highly_compressed select * from t1; + +set compression_codec=snappy; +insert into parquet_table_compression_plus_fast_queries select * from t1; + +set compression_codec=none; +insert into parquet_table_no_compression select * from t1; + +set compression_codec=foo; +select * from t1 limit 5; +ERROR: Invalid compression codec: foo +</code></pre> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + For information about how compressing Parquet data files affects query performance, see + <a class="xref" href="impala_parquet.html#parquet_compression">Snappy and GZip Compression for Parquet Data Files</a>. 
+ </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compute_stats.html ---------------------------------------------------------------------- diff --git a/docs/build3x/html/topics/impala_compute_stats.html b/docs/build3x/html/topics/impala_compute_stats.html new file mode 100644 index 0000000..407ba97 --- /dev/null +++ b/docs/build3x/html/topics/impala_compute_stats.html @@ -0,0 +1,637 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compute_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPUTE STATS Statement</title></head><body id="compute_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">COMPUTE STATS Statement</h1> + + + + <div class="body conbody"> + + <p class="p"> + The + COMPUTE STATS statement gathers information about volume and distribution + of data in a table and all associated columns and partitions. The + information is stored in the metastore database, and used by Impala to + help optimize queries. 
For example, if Impala can determine that a table + is large or small, or has many or few distinct values, it can organize and + parallelize the work appropriately for a join query or insert operation. + For details about the kinds of information gathered by this statement, see + <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code><span class="ph">COMPUTE STATS [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [ ( <var class="keyword varname">column_list</var> ) ] [TABLESAMPLE SYSTEM(<var class="keyword varname">percentage</var>) [REPEATABLE(<var class="keyword varname">seed</var>)]]</span> + +<var class="keyword varname">column_list</var> ::= <var class="keyword varname">column_name</var> [ , <var class="keyword varname">column_name</var>, ... ] + +COMPUTE INCREMENTAL STATS [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)] + +<var class="keyword varname">partition_spec</var> ::= <var class="keyword varname">simple_partition_spec</var> | <span class="ph"><var class="keyword varname">complex_partition_spec</var></span> + +<var class="keyword varname">simple_partition_spec</var> ::= <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var> + +<span class="ph"><var class="keyword varname">complex_partition_spec</var> ::= <var class="keyword varname">comparison_expression_on_partition_col</var></span> +</code></pre> + + <p class="p"> + The <code class="ph codeph">PARTITION</code> clause is only allowed in combination with the <code class="ph codeph">INCREMENTAL</code> + clause. It is optional for <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, and required for <code class="ph codeph">DROP + INCREMENTAL STATS</code>. 
Whenever you specify partitions through the <code class="ph codeph">PARTITION + (<var class="keyword varname">partition_spec</var>)</code> clause in a <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or + <code class="ph codeph">DROP INCREMENTAL STATS</code> statement, you must include all the partitioning columns in the + specification, and specify constant values for all the partition key columns. + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + Originally, Impala relied on users to run the Hive <code class="ph codeph">ANALYZE + TABLE</code> statement, but that method of gathering statistics proved + unreliable and difficult to use. The Impala <code class="ph codeph">COMPUTE STATS</code> + statement was built to improve the reliability and user-friendliness of + this operation. <code class="ph codeph">COMPUTE STATS</code> does not require any setup + steps or special configuration. You only run a single Impala + <code class="ph codeph">COMPUTE STATS</code> statement to gather both table and column + statistics, rather than separate Hive <code class="ph codeph">ANALYZE TABLE</code> + statements for each kind of statistics. + </p> + + <p class="p"> + For a non-incremental <code class="ph codeph">COMPUTE STATS</code> + statement, the columns for which statistics are computed can be specified + with an optional comma-separated list of columns. + </p> + + <p class="p"> + If no column list is given, the <code class="ph codeph">COMPUTE STATS</code> statement + computes column-level statistics for all columns of the table. This adds + potentially unneeded work for columns whose stats are not needed by + queries. It can be especially costly for very wide tables and unneeded + large string fields. 
+ </p> + <p class="p"> + <code class="ph codeph">COMPUTE STATS</code> returns an error when a specified column + cannot be analyzed, such as when the column does not exist, the column is + of an unsupported type for COMPUTE STATS (for example, columns of complex types), + or the column is a partitioning column. + + </p> + <p class="p"> + If an empty column list is given, no column is analyzed by <code class="ph codeph">COMPUTE + STATS</code>. + </p> + + <p class="p"> + In <span class="keyword">Impala 2.12</span> and + higher, an optional <code class="ph codeph">TABLESAMPLE</code> clause immediately after + a table reference specifies that the <code class="ph codeph">COMPUTE STATS</code> + operation only processes a specified percentage of the table data. For + tables that are so large that a full <code class="ph codeph">COMPUTE STATS</code> + operation is impractical, you can use <code class="ph codeph">COMPUTE STATS</code> with + a <code class="ph codeph">TABLESAMPLE</code> clause to extrapolate statistics from a + sample of the table data. See <a href="impala_perf_stats.html"><span class="keyword">Table and Column Statistics</span></a> for details about the + experimental stats extrapolation and sampling features. + </p> + + <p class="p"> + The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> variation is a shortcut for partitioned tables that works on a + subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables + with many partitions, where a full <code class="ph codeph">COMPUTE STATS</code> operation takes too long to be practical + each time a partition is added or dropped. See <a class="xref" href="impala_perf_stats.html#perf_stats_incremental">impala_perf_stats.html#perf_stats_incremental</a> + for full usage details. 
+ </p> + + <div class="note important note_important"><span class="note__title importanttitle">Important:</span> + <p class="p"> + For a particular table, use either <code class="ph codeph">COMPUTE STATS</code> or + <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never combine the two or + alternate between them. If you switch from <code class="ph codeph">COMPUTE STATS</code> to + <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> during the lifetime of a table, or + vice versa, drop all statistics by running <code class="ph codeph">DROP STATS</code> before + making the switch. + </p> + <p class="p"> + When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on a table for the first time, + the statistics are computed again from scratch regardless of whether the table already + has statistics. Therefore, expect a one-time resource-intensive operation + for scanning the entire table when running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> + for the first time on a given table. + </p> + <p class="p"> + For a table with a huge number of partitions and many columns, the approximately 400 bytes + of metadata per column per partition can add up to significant memory overhead, as it must + be cached on the <span class="keyword cmdname">catalogd</span> host and on every <span class="keyword cmdname">impalad</span> host + that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB, + you might experience service downtime. + </p> + </div> + + <p class="p"> + <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> only applies to partitioned tables. If you use the + <code class="ph codeph">INCREMENTAL</code> clause for an unpartitioned table, Impala automatically uses the original + <code class="ph codeph">COMPUTE STATS</code> statement. 
Such tables display <code class="ph codeph">false</code> under the + <code class="ph codeph">Incremental stats</code> column of the <code class="ph codeph">SHOW TABLE STATS</code> output. + </p> + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <div class="p"> + Because many of the most performance-critical and resource-intensive + operations rely on table and column statistics to construct accurate and + efficient plans, <code class="ph codeph">COMPUTE STATS</code> is an important step at + the end of your ETL process. Run <code class="ph codeph">COMPUTE STATS</code> on all + tables as your first step during performance tuning for slow queries, or + troubleshooting for out-of-memory conditions: + <ul class="ul"> + <li class="li"> + Accurate statistics help Impala construct an efficient query plan + for join queries, improving performance and reducing memory usage. + </li> + <li class="li"> + Accurate statistics help Impala distribute the work effectively + for insert operations into Parquet tables, improving performance and + reducing memory usage. + </li> + <li class="li"> + Accurate statistics help Impala estimate the memory + required for each query, which is important when you use resource + management features, such as admission control and the YARN resource + management framework. The statistics help Impala to achieve high + concurrency, full utilization of available memory, and avoid + contention with workloads from other Hadoop components. + </li> + <li class="li"> + In <span class="keyword">Impala 2.8</span> and + higher, when you run the <code class="ph codeph">COMPUTE STATS</code> or + <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement against a + Parquet table, Impala automatically applies the query option setting + <code class="ph codeph">MT_DOP=4</code> to increase the amount of intra-node + parallelism during this CPU-intensive operation. 
See <a class="xref" href="impala_mt_dop.html">MT_DOP Query Option</a> for details about what this query option does + and how to use it with CPU-intensive <code class="ph codeph">SELECT</code> + statements. + </li> + </ul> + </div> + </div> + + <p class="p"> + <strong class="ph b">Computing stats for groups of partitions:</strong> + </p> + + <p class="p"> + In <span class="keyword">Impala 2.8</span> and higher, you can run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> + on multiple partitions, instead of the entire table or one partition at a time. You include + comparison operators other than <code class="ph codeph">=</code> in the <code class="ph codeph">PARTITION</code> clause, + and the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement applies to all partitions that + match the comparison expression. + </p> + + <p class="p"> + For example, the <code class="ph codeph">INT_PARTITIONS</code> table contains 4 partitions. + The following <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements affect some but not all + partitions, as indicated by the <code class="ph codeph">Updated <var class="keyword varname">n</var> partition(s)</code> + messages. The partitions that are affected depend on values in the partition key column <code class="ph codeph">X</code> + that match the comparison expression in the <code class="ph codeph">PARTITION</code> clause. + </p> + +<pre class="pre codeblock"><code> +show partitions int_partitions; ++-------+-------+--------+------+--------------+-------------------+---------+... +| x | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |... ++-------+-------+--------+------+--------------+-------------------+---------+... +| 99 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | PARQUET |... +| 120 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | TEXT |... +| 150 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | TEXT |... +| 200 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | TEXT |... 
+| Total | -1 | 0 | 0B | 0B | | |... ++-------+-------+--------+------+--------------+-------------------+---------+... + +compute incremental stats int_partitions partition (x < 100); ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 1 partition(s) and 1 column(s). | ++-----------------------------------------+ + +compute incremental stats int_partitions partition (x in (100, 150, 200)); ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 2 partition(s) and 1 column(s). | ++-----------------------------------------+ + +compute incremental stats int_partitions partition (x between 100 and 175); ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 2 partition(s) and 1 column(s). | ++-----------------------------------------+ + +compute incremental stats int_partitions partition (x in (100, 150, 200) or x < 100); ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 3 partition(s) and 1 column(s). | ++-----------------------------------------+ + +compute incremental stats int_partitions partition (x != 150); ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 3 partition(s) and 1 column(s). | ++-----------------------------------------+ + +</code></pre> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + </p> + + <p class="p"> + Currently, the statistics created by the <code class="ph codeph">COMPUTE STATS</code> statement do not include + information about complex type columns. The column stats metrics for complex columns are always shown + as -1. For queries involving complex type columns, Impala uses + heuristics to estimate the data distribution within such columns. 
+ </p> + + <p class="p"> + <strong class="ph b">HBase considerations:</strong> + </p> + + <p class="p"> + <code class="ph codeph">COMPUTE STATS</code> also works for HBase tables. The statistics gathered for HBase tables are + somewhat different from those for HDFS-backed tables, but that metadata is still used for optimization when HBase + tables are involved in join queries. + </p> + + <p class="p"> + <strong class="ph b">Amazon S3 considerations:</strong> + </p> + + <p class="p"> + <code class="ph codeph">COMPUTE STATS</code> also works for tables where data resides in the Amazon Simple Storage Service (S3). + See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details. + </p> + + <p class="p"> + <strong class="ph b">Performance considerations:</strong> + </p> + + <p class="p"> + The statistics collected by <code class="ph codeph">COMPUTE STATS</code> are used to optimize join queries, + <code class="ph codeph">INSERT</code> operations into Parquet tables, and other resource-intensive kinds of SQL statements. + See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details. + </p> + + <p class="p"> + For large tables, the <code class="ph codeph">COMPUTE STATS</code> statement itself might take a long time and you + might need to tune its performance. The <code class="ph codeph">COMPUTE STATS</code> statement does not work with the + <code class="ph codeph">EXPLAIN</code> statement, or the <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>. + You can use the <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span> to examine timing information + for the statement as a whole.
If a basic <code class="ph codeph">COMPUTE STATS</code> statement takes a long time for a + partitioned table, consider switching to the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax so that only + newly added partitions are analyzed each time. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + This example shows two tables, <code class="ph codeph">T1</code> and <code class="ph codeph">T2</code>, with a small number of distinct + values linked by a parent-child relationship between <code class="ph codeph">T1.ID</code> and <code class="ph codeph">T2.PARENT</code>. + <code class="ph codeph">T1</code> is tiny, while <code class="ph codeph">T2</code> has approximately 100K rows. Initially, the statistics + include only physical measurements such as the number of files, the total size, and size measurements for + fixed-length columns such as those of the <code class="ph codeph">INT</code> type. Unknown values are represented by -1. After + running <code class="ph codeph">COMPUTE STATS</code> for each table, much more information is available through the + <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW COLUMN STATS</code> statements. If you were running a join query involving both of these tables, you + would need statistics for both tables to get the most effective optimization for the query.
+ </p> + + + +<pre class="pre codeblock"><code>[localhost:21000] > show table stats t1; +Query: show table stats t1 ++-------+--------+------+--------+ +| #Rows | #Files | Size | Format | ++-------+--------+------+--------+ +| -1 | 1 | 33B | TEXT | ++-------+--------+------+--------+ +Returned 1 row(s) in 0.02s +[localhost:21000] > show table stats t2; +Query: show table stats t2 ++-------+--------+----------+--------+ +| #Rows | #Files | Size | Format | ++-------+--------+----------+--------+ +| -1 | 28 | 960.00KB | TEXT | ++-------+--------+----------+--------+ +Returned 1 row(s) in 0.01s +[localhost:21000] > show column stats t1; +Query: show column stats t1 ++--------+--------+------------------+--------+----------+----------+ +| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | ++--------+--------+------------------+--------+----------+----------+ +| id | INT | -1 | -1 | 4 | 4 | +| s | STRING | -1 | -1 | -1 | -1 | ++--------+--------+------------------+--------+----------+----------+ +Returned 2 row(s) in 1.71s +[localhost:21000] > show column stats t2; +Query: show column stats t2 ++--------+--------+------------------+--------+----------+----------+ +| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | ++--------+--------+------------------+--------+----------+----------+ +| parent | INT | -1 | -1 | 4 | 4 | +| s | STRING | -1 | -1 | -1 | -1 | ++--------+--------+------------------+--------+----------+----------+ +Returned 2 row(s) in 0.01s +[localhost:21000] > compute stats t1; +Query: compute stats t1 ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 1 partition(s) and 2 column(s). 
| ++-----------------------------------------+ +Returned 1 row(s) in 5.30s +[localhost:21000] > show table stats t1; +Query: show table stats t1 ++-------+--------+------+--------+ +| #Rows | #Files | Size | Format | ++-------+--------+------+--------+ +| 3 | 1 | 33B | TEXT | ++-------+--------+------+--------+ +Returned 1 row(s) in 0.01s +[localhost:21000] > show column stats t1; +Query: show column stats t1 ++--------+--------+------------------+--------+----------+----------+ +| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | ++--------+--------+------------------+--------+----------+----------+ +| id | INT | 3 | -1 | 4 | 4 | +| s | STRING | 3 | -1 | -1 | -1 | ++--------+--------+------------------+--------+----------+----------+ +Returned 2 row(s) in 0.02s +[localhost:21000] > compute stats t2; +Query: compute stats t2 ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 1 partition(s) and 2 column(s). | ++-----------------------------------------+ +Returned 1 row(s) in 5.70s +[localhost:21000] > show table stats t2; +Query: show table stats t2 ++-------+--------+----------+--------+ +| #Rows | #Files | Size | Format | ++-------+--------+----------+--------+ +| 98304 | 1 | 960.00KB | TEXT | ++-------+--------+----------+--------+ +Returned 1 row(s) in 0.03s +[localhost:21000] > show column stats t2; +Query: show column stats t2 ++--------+--------+------------------+--------+----------+----------+ +| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | ++--------+--------+------------------+--------+----------+----------+ +| parent | INT | 3 | -1 | 4 | 4 | +| s | STRING | 6 | -1 | 14 | 9.3 | ++--------+--------+------------------+--------+----------+----------+ +Returned 2 row(s) in 0.01s</code></pre> + + <p class="p"> + The following example shows how to use the <code class="ph codeph">INCREMENTAL</code> clause, available in Impala 2.1.0 and + higher. 
The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax lets you collect statistics for newly added or + changed partitions, without rescanning the entire table. + </p> + +<pre class="pre codeblock"><code>-- Initially the table has no incremental stats, as indicated +-- by 'false' under Incremental stats. +show table stats item_partitioned; ++-------------+-------+--------+----------+--------------+---------+------------------ +| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats ++-------------+-------+--------+----------+--------------+---------+------------------ +| Books | -1 | 1 | 223.74KB | NOT CACHED | PARQUET | false +| Children | -1 | 1 | 230.05KB | NOT CACHED | PARQUET | false +| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false +| Home | -1 | 1 | 232.56KB | NOT CACHED | PARQUET | false +| Jewelry | -1 | 1 | 223.72KB | NOT CACHED | PARQUET | false +| Men | -1 | 1 | 231.25KB | NOT CACHED | PARQUET | false +| Music | -1 | 1 | 237.90KB | NOT CACHED | PARQUET | false +| Shoes | -1 | 1 | 234.90KB | NOT CACHED | PARQUET | false +| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false +| Women | -1 | 1 | 226.27KB | NOT CACHED | PARQUET | false +| Total | -1 | 10 | 2.25MB | 0B | | ++-------------+-------+--------+----------+--------------+---------+------------------ + +-- After the first COMPUTE INCREMENTAL STATS, +-- all partitions have stats. The first +-- COMPUTE INCREMENTAL STATS scans the whole +-- table, discarding any previous stats from +-- a traditional COMPUTE STATS statement. +compute incremental stats item_partitioned; ++-------------------------------------------+ +| summary | ++-------------------------------------------+ +| Updated 10 partition(s) and 21 column(s).
| ++-------------------------------------------+ +show table stats item_partitioned; ++-------------+-------+--------+----------+--------------+---------+------------------ +| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats ++-------------+-------+--------+----------+--------------+---------+------------------ +| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true +| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true +| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true +| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true +| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true +| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true +| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true +| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true +| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true +| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true +| Total | 17957 | 10 | 2.25MB | 0B | | ++-------------+-------+--------+----------+--------------+---------+------------------ + +-- Add a new partition... +alter table item_partitioned add partition (i_category='Camping'); +-- Add or replace files in HDFS outside of Impala, +-- rendering the stats for a partition obsolete. +!import_data_into_sports_partition.sh +refresh item_partitioned; +drop incremental stats item_partitioned partition (i_category='Sports'); +-- Now some partitions have incremental stats +-- and some do not. 
+show table stats item_partitioned; ++-------------+-------+--------+----------+--------------+---------+------------------ +| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats ++-------------+-------+--------+----------+--------------+---------+------------------ +| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true +| Camping | -1 | 1 | 408.02KB | NOT CACHED | PARQUET | false +| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true +| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true +| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true +| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true +| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true +| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true +| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true +| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false +| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true +| Total | 17957 | 11 | 2.65MB | 0B | | ++-------------+-------+--------+----------+--------------+---------+------------------ + +-- After another COMPUTE INCREMENTAL STATS, +-- all partitions have incremental stats, and only the 2 +-- partitions without incremental stats were scanned. +compute incremental stats item_partitioned; ++------------------------------------------+ +| summary | ++------------------------------------------+ +| Updated 2 partition(s) and 21 column(s). 
| ++------------------------------------------+ +show table stats item_partitioned; ++-------------+-------+--------+----------+--------------+---------+------------------ +| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats ++-------------+-------+--------+----------+--------------+---------+------------------ +| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true +| Camping | 5328 | 1 | 408.02KB | NOT CACHED | PARQUET | true +| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true +| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true +| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true +| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true +| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true +| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true +| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true +| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true +| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true +| Total | 17957 | 11 | 2.65MB | 0B | | ++-------------+-------+--------+----------+--------------+---------+------------------ +</code></pre> + + <p class="p"> + <strong class="ph b">File format considerations:</strong> + </p> + + <p class="p"> + The <code class="ph codeph">COMPUTE STATS</code> statement works with tables created with any of the file formats supported + by Impala. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details about working with the + different file formats. The following considerations apply to <code class="ph codeph">COMPUTE STATS</code> depending on the + file format of the table. + </p> + + <p class="p"> + The <code class="ph codeph">COMPUTE STATS</code> statement works with text tables with no restrictions. These tables can be + created through either Impala or Hive. 
+ </p> + + <p class="p"> + The <code class="ph codeph">COMPUTE STATS</code> statement works with Parquet tables. These tables can be created through + either Impala or Hive. + </p> + + <p class="p"> + The <code class="ph codeph">COMPUTE STATS</code> statement works with Avro tables without restriction in <span class="keyword">Impala 2.2</span> + and higher. In earlier releases, <code class="ph codeph">COMPUTE STATS</code> worked only for Avro tables created through Hive, + and required the <code class="ph codeph">CREATE TABLE</code> statement to use SQL-style column names and types rather than an + Avro-style schema specification. + </p> + + <p class="p"> + The <code class="ph codeph">COMPUTE STATS</code> statement works with RCFile tables with no restrictions. These tables can + be created through either Impala or Hive. + </p> + + <p class="p"> + The <code class="ph codeph">COMPUTE STATS</code> statement works with SequenceFile tables with no restrictions. These + tables can be created through either Impala or Hive. + </p> + + <p class="p"> + The <code class="ph codeph">COMPUTE STATS</code> statement works with partitioned tables, whether all the partitions use + the same file format, or some partitions are defined through <code class="ph codeph">ALTER TABLE</code> to use different + file formats. + </p> + + <p class="p"> + <strong class="ph b">Statement type:</strong> DDL + </p> + + <p class="p"> + <strong class="ph b">Cancellation:</strong> Certain multi-stage statements (<code class="ph codeph">CREATE TABLE AS SELECT</code> and + <code class="ph codeph">COMPUTE STATS</code>) can be cancelled during some stages, when running <code class="ph codeph">INSERT</code> + or <code class="ph codeph">SELECT</code> operations internally. 
To cancel this statement, use Ctrl-C from the + <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the + <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of + in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI + (port 25000). + </p> + + <p class="p"> + <strong class="ph b">Restrictions:</strong> + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> Prior to Impala 1.4.0, + <code class="ph codeph">COMPUTE STATS</code> counted the number of + <code class="ph codeph">NULL</code> values in each column and recorded that figure + in the metastore database. Because Impala does not currently use the + <code class="ph codeph">NULL</code> count during query planning, Impala 1.4.0 and + higher speeds up the <code class="ph codeph">COMPUTE STATS</code> statement by + skipping this <code class="ph codeph">NULL</code> counting. </div> + + <p class="p"> + <strong class="ph b">Internal details:</strong> + </p> + <p class="p"> + Behind the scenes, the <code class="ph codeph">COMPUTE STATS</code> statement + executes two statements: one to count the rows of each partition + in the table (or the entire table if unpartitioned) through the + <code class="ph codeph">COUNT(*)</code> function, + and another to count the approximate number of distinct values + in each column through the <code class="ph codeph">NDV()</code> function. + You might see these queries in your monitoring and diagnostic displays. + The same factors that affect the performance, scalability, and + execution of other queries (such as parallel execution, memory usage, + admission control, and timeouts) also apply to the queries run by the + <code class="ph codeph">COMPUTE STATS</code> statement. 
+ </p> + + <p class="p"> + <strong class="ph b">HDFS permissions:</strong> + </p> + <p class="p"> + The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under, + typically the <code class="ph codeph">impala</code> user, must have read + permission for all affected files in the source directory: + all files in the case of an unpartitioned table or + a partitioned table in the case of <code class="ph codeph">COMPUTE STATS</code>; + or all the files in partitions without incremental stats in + the case of <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>. + It must also have read and execute permissions for all + relevant directories holding the data files. + (Essentially, <code class="ph codeph">COMPUTE STATS</code> requires the + same permissions as the underlying <code class="ph codeph">SELECT</code> queries it runs + against the table.) + </p> + + <p class="p"> + <strong class="ph b">Kudu considerations:</strong> + </p> + + <p class="p"> + The <code class="ph codeph">COMPUTE STATS</code> statement applies to Kudu tables. + Impala does not compute the number of rows for each partition for + Kudu tables. Therefore, you do not need to re-run the operation when + you see -1 in the <code class="ph codeph">#Rows</code> column of the output from + <code class="ph codeph">SHOW TABLE STATS</code>. That column always shows -1 for + all Kudu tables.
+ </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>, + <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN STATS Statement</a>, <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html> http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html ---------------------------------------------------------------------- diff --git a/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html b/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html new file mode 100644 index 0000000..03d21e2 --- /dev/null +++ b/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html @@ -0,0 +1,23 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compute_stats_sample_min_sample_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPUTE_STATS_MIN_SAMPLE_SIZE 
Query Option</title></head><body id="compute_stats_sample_min_sample_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + <h1 class="title topictitle1" id="ariaid-title1">COMPUTE_STATS_MIN_SAMPLE_SIZE Query Option</h1> + + + <div class="body conbody"> + <p class="p">The <code class="ph codeph">COMPUTE_STATS_MIN_SAMPLE_SIZE</code> query option specifies + the minimum number of bytes scanned by <code class="ph codeph">COMPUTE STATS + TABLESAMPLE</code>, regardless of the user-supplied sampling percentage. + This query option prevents sampling for very small tables, where accurate + stats can be obtained cheaply without sampling and where a minimum sample + size is needed to get meaningful stats.</p> + <p class="p"> + <strong class="ph b">Type:</strong> integer + </p> + <p class="p"><strong class="ph b">Default:</strong> 1GB</p> + <p class="p"><strong class="ph b">Added in</strong>: <span class="keyword">Impala 2.12</span></p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_concepts.html ---------------------------------------------------------------------- diff --git a/docs/build3x/html/topics/impala_concepts.html b/docs/build3x/html/topics/impala_concepts.html new file mode 100644 index 0000000..b98e4ce --- /dev/null +++ b/docs/build3x/html/topics/impala_concepts.html @@ -0,0 +1,48 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C)
Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_components.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_development.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hadoop.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="concepts"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Concepts and Architecture</title></head><body id="concepts"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Impala Concepts and Architecture</h1> + + + + <div class="body conbody"> + + <p class="p"> + The following sections provide background information to help you become productive using Impala and + its features. Where appropriate, the explanations include context to help understand how aspects of Impala + relate to other technologies you might already be familiar with, such as relational database management + systems and data warehouses, or other Hadoop components such as Hive, HDFS, and HBase. + </p> + + <p class="p toc"></p> + </div> + + + + + + + + + + + + + + + + + + + + + + + + + + + + +<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_components.html">Components of the Impala Server</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_development.html">Developing Impala Applications</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hadoop.html">How Impala Fits Into the Hadoop Ecosystem</a></strong><br></li></ul></nav></article></main></body></html>