Repository: incubator-impala Updated Branches: refs/heads/asf-site 3d1c7a510 -> ae2f8d035
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_parquet.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_parquet.html b/docs/build/html/topics/impala_parquet.html index 894c97a..151df86 100644 --- a/docs/build/html/topics/impala_parquet.html +++ b/docs/build/html/topics/impala_parquet.html @@ -170,24 +170,21 @@ the <code class="ph codeph">INSERT</code> statement to fine-tune the overall performance of the operation and its resource usage: <ul class="ul"> - <li class="li"> - These hints are available in Impala 1.2.2 and higher. - </li> <li class="li"> - You would only use these hints if an <code class="ph codeph">INSERT</code> into a partitioned Parquet table was + You would only use hints if an <code class="ph codeph">INSERT</code> into a partitioned Parquet table was failing due to capacity limits, or if such an <code class="ph codeph">INSERT</code> was succeeding but with less-than-optimal performance. </li> <li class="li"> - To use these hints, put the hint keyword <code class="ph codeph">[SHUFFLE]</code> or <code class="ph codeph">[NOSHUFFLE]</code> + To use a hint to influence the join order, put the hint keyword <code class="ph codeph">/* +SHUFFLE */</code> or <code class="ph codeph">/* +NOSHUFFLE */</code> (including the square brackets) after the <code class="ph codeph">PARTITION</code> clause, immediately before the <code class="ph codeph">SELECT</code> keyword. </li> <li class="li"> - <code class="ph codeph">[SHUFFLE]</code> selects an execution plan that minimizes the number of files being written + <code class="ph codeph">/* +SHUFFLE */</code> selects an execution plan that reduces the number of files being written simultaneously to HDFS, and the number of memory buffers holding data for individual partitions. 
Thus it reduces overall resource usage for the <code class="ph codeph">INSERT</code> operation, allowing some <code class="ph codeph">INSERT</code> operations to succeed that otherwise would fail. It does involve some data @@ -196,27 +193,39 @@ </li> <li class="li"> - <code class="ph codeph">[NOSHUFFLE]</code> selects an execution plan that might be faster overall, but might also + <code class="ph codeph">/* +NOSHUFFLE */</code> selects an execution plan that might be faster overall, but might also produce a larger number of small data files or exceed capacity limits, causing the - <code class="ph codeph">INSERT</code> operation to fail. Use <code class="ph codeph">[SHUFFLE]</code> in cases where an + <code class="ph codeph">INSERT</code> operation to fail. Use <code class="ph codeph">/* +SHUFFLE */</code> in cases where an <code class="ph codeph">INSERT</code> statement fails or runs inefficiently due to all nodes attempting to construct data for all partitions. </li> <li class="li"> - Impala automatically uses the <code class="ph codeph">[SHUFFLE]</code> method if any partition key column in the + Impala automatically uses the <code class="ph codeph">/* +SHUFFLE */</code> method if any partition key column in the source table, mentioned in the <code class="ph codeph">INSERT ... SELECT</code> query, does not have column - statistics. In this case, only the <code class="ph codeph">[NOSHUFFLE]</code> hint would have any effect. + statistics. In this case, only the <code class="ph codeph">/* +NOSHUFFLE */</code> hint would have any effect. </li> <li class="li"> If column statistics are available for all partition key columns in the source table mentioned in the - <code class="ph codeph">INSERT ... SELECT</code> query, Impala chooses whether to use the <code class="ph codeph">[SHUFFLE]</code> - or <code class="ph codeph">[NOSHUFFLE]</code> technique based on the estimated number of distinct values in those + <code class="ph codeph">INSERT ... 
SELECT</code> query, Impala chooses whether to use the <code class="ph codeph">/* +SHUFFLE */</code> + or <code class="ph codeph">/* +NOSHUFFLE */</code> technique based on the estimated number of distinct values in those columns and the number of nodes involved in the <code class="ph codeph">INSERT</code> operation. In this case, you - might need the <code class="ph codeph">[SHUFFLE]</code> or the <code class="ph codeph">[NOSHUFFLE]</code> hint to override the + might need the <code class="ph codeph">/* +SHUFFLE */</code> or the <code class="ph codeph">/* +NOSHUFFLE */</code> hint to override the execution plan selected by Impala. </li> + + <li class="li"> + In <span class="keyword">Impala 2.8</span> or higher, you can make the + <code class="ph codeph">INSERT</code> operation organize (<span class="q">"cluster"</span>) + the data for each partition to avoid buffering data for multiple partitions + and reduce the risk of an out-of-memory condition. Specify the hint as + <code class="ph codeph">/* +CLUSTERED */</code>. This technique is primarily + useful for inserts into Parquet tables, where the large block + size requires substantial memory to buffer data for multiple + output files at once. + </li> + </ul> </div> @@ -405,6 +414,25 @@ to 268435456 (256 MB) to match the row group size produced by Impala. </p> + <p class="p"> + In <span class="keyword">Impala 2.9</span> and higher, Parquet files written by Impala include + embedded metadata specifying the minimum and maximum values for each column, within + each row group and each data page within the row group. Impala-written Parquet files + typically contain a single row group; a row group can contain many data pages. 
+ Impala uses this information (currently, only the metadata for each row group) + when reading each Parquet data file during a query, to quickly determine whether each + row group within the file potentially includes any rows that match the conditions in the + <code class="ph codeph">WHERE</code> clause. For example, if the column <code class="ph codeph">X</code> within + a particular Parquet file has a minimum value of 1 and a maximum value of 100, then + a query including the clause <code class="ph codeph">WHERE x > 200</code> can quickly determine + that it is safe to skip that particular file, instead of scanning all the associated + column values. This optimization technique is especially effective for tables that + use the <code class="ph codeph">SORT BY</code> clause for the columns most frequently checked in + <code class="ph codeph">WHERE</code> clauses, because any <code class="ph codeph">INSERT</code> operation on + such tables produces Parquet data files with relatively narrow ranges of column values + within each file. 
+ </p> + </div> <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="parquet_performance__parquet_partitioning"> @@ -1372,6 +1400,7 @@ INT96 -> TIMESTAMP </p> <pre class="pre codeblock"><code>BINARY + OriginalType UTF8 -> STRING +BINARY + OriginalType ENUM -> STRING BINARY + OriginalType DECIMAL -> DECIMAL </code></pre> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_parquet_file_size.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_parquet_file_size.html b/docs/build/html/topics/impala_parquet_file_size.html index 695c557..a4d0429 100644 --- a/docs/build/html/topics/impala_parquet_file_size.html +++ b/docs/build/html/topics/impala_parquet_file_size.html @@ -65,6 +65,14 @@ INSERT OVERWRITE parquet_table SELECT * FROM text_table; </p> <p class="p"> + Because ADLS does not expose the block sizes of data files the way HDFS does, + any Impala <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statements + use the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option setting to define the size of + Parquet data files. (Using a large block size is more important for Parquet tables than + for tables that use other file formats.) 
+ </p> + + <p class="p"> <strong class="ph b">Isilon considerations:</strong> </p> <div class="p"> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_query_options.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_query_options.html b/docs/build/html/topics/impala_query_options.html index ee27d90..2fbda3f 100644 --- a/docs/build/html/topics/impala_query_options.html +++ b/docs/build/html/topics/impala_query_options.html @@ -1,6 +1,6 @@ <!DOCTYPE html SYSTEM "about:legacy-compat"> -<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_abort_on_default_limit_exceeded.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_abort_on_error.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_allow_unsupported_formats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_appx_count_distinct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_batch_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_debug_action.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_default_order_by_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_codegen.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_row_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_streaming_preaggregations.html"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_disable_unsafe_spills.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_exec_single_node_rows_threshold.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_level.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hbase_cache_blocks.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hbase_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_progress.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_summary.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_errors.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_io_buffers.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_scan_range_length.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_num_runtime_filters.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mem_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mt_dop.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_num_nodes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_num_scanner_threads.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_optimize_partition_key_scans.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_annotate_strings_utf8.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_fallback_schema_resolution.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_file_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prefetch_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_timeout_s.html"><meta name="DC.Relation" scheme="URI"
content="../topics/impala_request_pool.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_replica_preference.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_reservation_request_timeout.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_bloom_filter_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_max_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_min_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_wait_time_ms.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_s3_skip_insert_staging.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_scan_node_codegen_threshold.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_scratch_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schedule_random_replica.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_support_start_over.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_sync_ddl.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_v_cpu_cores.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="query_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Query Options for the SET Statement</title></head><body id="query_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner"
content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_abort_on_default_limit_exceeded.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_abort_on_error.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_allow_unsupported_formats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_appx_count_distinct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_batch_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_debug_action.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_default_join_distribution_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_default_order_by_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_codegen.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_decimal_v2.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_row_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_streaming_preaggregations.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_unsafe_spills.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_exec_single_node_rows_threshold.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_level.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hbase_cache_blocks.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hbase_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_progress.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_summary.html"><meta name="DC.Relation" scheme="URI"
content="../topics/impala_max_errors.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_io_buffers.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_scan_range_length.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_num_runtime_filters.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mem_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mt_dop.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_num_nodes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_num_scanner_threads.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_optimize_partition_key_scans.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_annotate_strings_utf8.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_fallback_schema_resolution.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_file_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prefetch_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_timeout_s.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_request_pool.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_replica_preference.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_reservation_request_timeout.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_bloom_filter_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_max_size.html"><meta name="DC.Relation" scheme="URI"
content="../topics/impala_runtime_filter_min_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_wait_time_ms.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_s3_skip_insert_staging.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_scan_node_codegen_threshold.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_scratch_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schedule_random_replica.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_support_start_over.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_sync_ddl.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_v_cpu_cores.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="query_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Query Options for the SET Statement</title></head><body id="query_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> <h1 class="title topictitle1" id="ariaid-title1">Query Options for the SET Statement</h1> @@ -46,4 +46,4 @@ <a class="xref" href="impala_set.html#set">SET Statement</a> </p> </div> -<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_abort_on_default_limit_exceeded.html">ABORT_ON_DEFAULT_LIMIT_EXCEEDED Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_abort_on_error.html">ABORT_ON_ERROR Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_allow_unsupported_formats.html">ALLOW_UNSUPPORTED_FORMATS
Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_appx_count_distinct.html">APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_batch_size.html">BATCH_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_compression_codec.html">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_debug_action.html">DEBUG_ACTION Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_default_order_by_limit.html">DEFAULT_ORDER_BY_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_codegen.html">DISABLE_CODEGEN Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_streaming_preaggregations.html">DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_unsafe_spills.html">DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_exec_single_node_rows_threshold.html">EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain_level.html">EXPLAIN_LEVEL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hbase_cache_blocks.html">HBASE_CACHE_BLOCKS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a
href="../topics/impala_hbase_caching.html">HBASE_CACHING Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_progress.html">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_summary.html">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_errors.html">MAX_ERRORS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_io_buffers.html">MAX_IO_BUFFERS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_scan_range_length.html">MAX_SCAN_RANGE_LENGTH Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mem_limit.html">MEM_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mt_dop.html">MT_DOP Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_nodes.html">NUM_NODES Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_scanner_threads.html">NUM_SCANNER_THREADS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_compression_codec.html">PARQUET_COMPRESSION_CODEC Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_annotate_strings_utf8.html">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or
higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_fallback_schema_resolution.html">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_file_size.html">PARQUET_FILE_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_prefetch_mode.html">PREFETCH_MODE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_query_timeout_s.html">QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_request_pool.html">REQUEST_POOL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_replica_preference.html">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_reservation_request_timeout.html">RESERVATION_REQUEST_TIMEOUT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_bloom_filter_size.html">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_max_size.html">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_min_size.html">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS Query
Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_s3_skip_insert_staging.html">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_scan_node_codegen_threshold.html">SCAN_NODE_CODEGEN_THRESHOLD Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_scratch_limit.html">SCRATCH_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_support_start_over.html">SUPPORT_START_OVER Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_sync_ddl.html">SYNC_DDL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_v_cpu_cores.html">V_CPU_CORES Query Option</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file +<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_abort_on_default_limit_exceeded.html">ABORT_ON_DEFAULT_LIMIT_EXCEEDED Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_abort_on_error.html">ABORT_ON_ERROR Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_allow_unsupported_formats.html">ALLOW_UNSUPPORTED_FORMATS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_appx_count_distinct.html">APPX_COUNT_DISTINCT Query Option (Impala 
2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_batch_size.html">BATCH_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_compression_codec.html">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_debug_action.html">DEBUG_ACTION Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_default_join_distribution_mode.html">DEFAULT_JOIN_DISTRIBUTION_MODE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_default_order_by_limit.html">DEFAULT_ORDER_BY_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_codegen.html">DISABLE_CODEGEN Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_decimal_v2.html">DECIMAL_V2 Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_streaming_preaggregations.html">DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_unsafe_spills.html">DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_exec_single_node_rows_threshold.html">EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain_level.html">EXPLAIN_LEVEL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a
href="../topics/impala_hbase_cache_blocks.html">HBASE_CACHE_BLOCKS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hbase_caching.html">HBASE_CACHING Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_progress.html">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_summary.html">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_errors.html">MAX_ERRORS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_io_buffers.html">MAX_IO_BUFFERS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_scan_range_length.html">MAX_SCAN_RANGE_LENGTH Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mem_limit.html">MEM_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mt_dop.html">MT_DOP Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_nodes.html">NUM_NODES Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_scanner_threads.html">NUM_SCANNER_THREADS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_compression_codec.html">PARQUET_COMPRESSION_CODEC Query Option</a></strong><br></li><li
class="link ulchildlink"><strong><a href="../topics/impala_parquet_annotate_strings_utf8.html">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_fallback_schema_resolution.html">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_file_size.html">PARQUET_FILE_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_prefetch_mode.html">PREFETCH_MODE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_query_timeout_s.html">QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_request_pool.html">REQUEST_POOL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_replica_preference.html">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_reservation_request_timeout.html">RESERVATION_REQUEST_TIMEOUT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_bloom_filter_size.html">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_max_size.html">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_min_size.html">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE Query Option
(Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_s3_skip_insert_staging.html">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_scan_node_codegen_threshold.html">SCAN_NODE_CODEGEN_THRESHOLD Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_scratch_limit.html">SCRATCH_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_support_start_over.html">SUPPORT_START_OVER Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_sync_ddl.html">SYNC_DDL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_v_cpu_cores.html">V_CPU_CORES Query Option</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_refresh.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_refresh.html b/docs/build/html/topics/impala_refresh.html index 75ce520..03e437d 100644 --- a/docs/build/html/topics/impala_refresh.html +++ b/docs/build/html/topics/impala_refresh.html @@ -21,7 +21,9 @@ <strong class="ph
b">Syntax:</strong> </p> -<pre class="pre codeblock"><code>REFRESH [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">key_col1</var>=<var class="keyword varname">val1</var> [, <var class="keyword varname">key_col2</var>=<var class="keyword varname">val2</var>...])]</code></pre> +<pre class="pre codeblock"><code>REFRESH [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">key_col1</var>=<var class="keyword varname">val1</var> [, <var class="keyword varname">key_col2</var>=<var class="keyword varname">val2</var>...])] +<span class="ph">REFRESH FUNCTIONS <var class="keyword varname">db_name</var></span> +</code></pre> <p class="p"> <strong class="ph b">Usage notes:</strong> @@ -377,6 +379,25 @@ ERROR: AnalysisException: Items in partition spec must exactly match the partiti </p> <p class="p"> + <strong class="ph b">UDF considerations:</strong> + </p> + <div class="p"> + In <span class="keyword">Impala 2.9</span> and higher, you can refresh the user-defined functions (UDFs) + that Impala recognizes, at the database level, by running the <code class="ph codeph">REFRESH FUNCTIONS</code> + statement with the database name as an argument. Java-based UDFs can be added to the metastore + database through Hive <code class="ph codeph">CREATE FUNCTION</code> statements, and made visible to Impala + by subsequently running <code class="ph codeph">REFRESH FUNCTIONS</code>. For example: + +<pre class="pre codeblock"><code>CREATE DATABASE shared_udfs; +USE shared_udfs; +...use CREATE FUNCTION statements in Hive to create some Java-based UDFs + that Impala is not initially aware of... +REFRESH FUNCTIONS shared_udfs; +SELECT udf_created_by_hive(c1) FROM ... 
+</code></pre> + </div> + + <p class="p"> <strong class="ph b">Related information:</strong> </p> <p class="p"> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_runtime_filtering.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_runtime_filtering.html b/docs/build/html/topics/impala_runtime_filtering.html index 0b3bd16..d8f20ae 100644 --- a/docs/build/html/topics/impala_runtime_filtering.html +++ b/docs/build/html/topics/impala_runtime_filtering.html @@ -269,12 +269,12 @@ </li> <li class="li"> <p class="p"> - <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a> + <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a> </p> </li> <li class="li"> <p class="p"> - <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a> + <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a> </p> </li> <li class="li"> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_scalability.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_scalability.html b/docs/build/html/topics/impala_scalability.html index a850d35..41e1d10 100644 --- a/docs/build/html/topics/impala_scalability.html +++ b/docs/build/html/topics/impala_scalability.html @@ -1,6 +1,6 @@ <!DOCTYPE html SYSTEM "about:legacy-compat"> -<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta 
name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scalability"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Scalability Considerations for Impala</title></head><body id="scalability"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version"
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"> <meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scalability"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Scalability Considerations for Impala</title></head><body id="scalability"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> <h1 class="title topictitle1" id="ariaid-title1">Scalability Considerations for Impala</h1> @@ -207,13 +207,125 @@ </div> </article> + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="scalability__scalability_coordinator"> + + <h2 class="title topictitle2" id="ariaid-title4">Controlling which Hosts are Coordinators and Executors</h2> + + <div class="body conbody"> + + <p class="p"> + By default, each host in the cluster that runs the <span class="keyword cmdname">impalad</span> + daemon can act as the coordinator for an Impala query, execute the fragments + of the execution plan for the query, or both. During highly concurrent + workloads for large-scale queries, especially on large clusters, the dual + roles can cause scalability issues: + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + The extra work required for a host to act as the coordinator could interfere + with its capacity to perform other work for the earlier phases of the query. + For example, the coordinator can experience significant network and CPU overhead + during queries containing a large number of query fragments. 
Each coordinator + caches metadata for all table partitions and data files, which can be substantial + and contend with memory needed to process joins, aggregations, and other operations + performed by query executors. + </p> + </li> + <li class="li"> + <p class="p"> + Having a large number of hosts act as coordinators can cause unnecessary network + overhead, or even timeout errors, as each of those hosts communicates with the + <span class="keyword cmdname">statestored</span> daemon for metadata updates. + </p> + </li> + <li class="li"> + <p class="p"> + The <span class="q">"soft limits"</span> imposed by the admission control feature are more likely + to be exceeded when there are a large number of heavily loaded hosts acting as + coordinators. + </p> + </li> + </ul> + + <p class="p"> + If such scalability bottlenecks occur, you can explicitly specify that certain + hosts act as query coordinators, but not executors for query fragments. + These hosts do not participate in I/O-intensive operations such as scans, + and CPU-intensive operations such as aggregations. + </p> + + <p class="p"> + Then, you specify that the + other hosts act as executors but not coordinators. These hosts do not communicate + with the <span class="keyword cmdname">statestored</span> daemon or process the final result sets + from queries. You cannot connect to these hosts through clients such as + <span class="keyword cmdname">impala-shell</span> or business intelligence tools. + </p> + + <p class="p"> + This feature is available in <span class="keyword">Impala 2.9</span> and higher. + </p> + + <p class="p"> + To use this feature, you specify one of the following startup flags for the + <span class="keyword cmdname">impalad</span> daemon on each host: + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + <code class="ph codeph">is_executor=false</code> for each host that + does not act as an executor for Impala queries. + These hosts act exclusively as query coordinators. 
+ This setting typically applies to a relatively small number of + hosts, because the most common topology is to have nearly all + DataNodes doing work for query execution. + </p> + </li> + <li class="li"> + <p class="p"> + <code class="ph codeph">is_coordinator=false</code> for each host that + does not act as a coordinator for Impala queries. + These hosts act exclusively as executors. + The number of hosts with this setting typically increases + as the cluster grows larger and handles more table partitions, + data files, and concurrent queries. As the overhead for query + coordination increases, it becomes more important to centralize + that work on dedicated hosts. + </p> + </li> + </ul> + + <p class="p"> + By default, both of these settings are enabled for each <code class="ph codeph">impalad</code> + instance, allowing all such hosts to act as both executors and coordinators. + </p> + + <p class="p"> + For example, on a 100-node cluster, you might specify <code class="ph codeph">is_executor=false</code> + for 10 hosts, to dedicate those hosts as query coordinators. Then specify + <code class="ph codeph">is_coordinator=false</code> for the remaining 90 hosts. All explicit or + load-balanced connections must go to the 10 hosts acting as coordinators. These hosts + perform the network communication to keep metadata up-to-date and route query results + to the appropriate clients. The remaining 90 hosts perform the intensive I/O, CPU, and + memory operations that make up the bulk of the work for each query. If a bottleneck or + other performance issue arises on a specific host, you can narrow down the cause more + easily because each host is dedicated to specific operations within the overall + Impala workload. 
+ </p> + + </div> + </article> + - <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="scalability__spill_to_disk"> + <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="scalability__spill_to_disk"> - <h2 class="title topictitle2" id="ariaid-title4">SQL Operations that Spill to Disk</h2> + <h2 class="title topictitle2" id="ariaid-title5">SQL Operations that Spill to Disk</h2> <div class="body conbody"> @@ -575,8 +687,8 @@ these tables, hint the plan or disable this behavior via query options to enable </div> </article> -<article class="topic concept nested1" aria-labelledby="ariaid-title5" id="scalability__complex_query"> -<h2 class="title topictitle2" id="ariaid-title5">Limits on Query Size and Complexity</h2> +<article class="topic concept nested1" aria-labelledby="ariaid-title6" id="scalability__complex_query"> +<h2 class="title topictitle2" id="ariaid-title6">Limits on Query Size and Complexity</h2> <div class="body conbody"> <p class="p"> There are hardcoded limits on the maximum size and complexity of queries. @@ -600,8 +712,8 @@ use a single <code class="ph codeph">IN</code> clause: </div> </article> -<article class="topic concept nested1" aria-labelledby="ariaid-title6" id="scalability__scalability_io"> -<h2 class="title topictitle2" id="ariaid-title6">Scalability Considerations for Impala I/O</h2> +<article class="topic concept nested1" aria-labelledby="ariaid-title7" id="scalability__scalability_io"> +<h2 class="title topictitle2" id="ariaid-title7">Scalability Considerations for Impala I/O</h2> <div class="body conbody"> <p class="p"> Impala parallelizes its I/O operations aggressively, @@ -626,8 +738,8 @@ Currently, there is no throttling mechanism for Impala I/O. 
</div> </article> -<article class="topic concept nested1" aria-labelledby="ariaid-title7" id="scalability__big_tables"> -<h2 class="title topictitle2" id="ariaid-title7">Scalability Considerations for Table Layout</h2> +<article class="topic concept nested1" aria-labelledby="ariaid-title8" id="scalability__big_tables"> +<h2 class="title topictitle2" id="ariaid-title8">Scalability Considerations for Table Layout</h2> <div class="body conbody"> <p class="p"> Due to the overhead of retrieving and updating table metadata @@ -644,8 +756,8 @@ try to limit the number of partitions for any partitioned table to a few tens of </div> </article> -<article class="topic concept nested1" aria-labelledby="ariaid-title8" id="scalability__kerberos_overhead_cluster_size"> -<h2 class="title topictitle2" id="ariaid-title8">Kerberos-Related Network Overhead for Large Clusters</h2> +<article class="topic concept nested1" aria-labelledby="ariaid-title9" id="scalability__kerberos_overhead_cluster_size"> +<h2 class="title topictitle2" id="ariaid-title9">Kerberos-Related Network Overhead for Large Clusters</h2> <div class="body conbody"> <p class="p"> When Impala starts up, or after each <code class="ph codeph">kinit</code> refresh, Impala sends a number of @@ -670,8 +782,43 @@ so other secure services might be affected temporarily. 
</div> </article> - <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="scalability__scalability_hotspots"> - <h2 class="title topictitle2" id="ariaid-title9">Avoiding CPU Hotspots for HDFS Cached Data</h2> + <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="scalability__kerberos_overhead_memory_usage"> + <h2 class="title topictitle2" id="ariaid-title10">Kerberos-Related Memory Overhead for Large Clusters</h2> + <div class="body conbody"> + <div class="p"> + On a kerberized cluster with high memory utilization, <span class="keyword cmdname">kinit</span> commands executed after + every <code class="ph codeph">'kerberos_reinit_interval'</code> may cause out-of-memory errors, because executing + the command involves a fork of the Impala process. The error looks similar to the following: +<pre class="pre codeblock"><code> +Failed to obtain Kerberos ticket for principal: <varname>principal_details</varname> +Failed to execute shell cmd: 'kinit -k -t <varname>keytab_details</varname>', +error was: Error(12): Cannot allocate memory + +</code></pre> + </div> + <div class="p"> + The following command changes the <code class="ph codeph">vm.overcommit_memory</code> + setting immediately on a running host. However, this setting is reset + when the host is restarted. +<pre class="pre codeblock"><code> +echo 1 > /proc/sys/vm/overcommit_memory + +</code></pre> + </div><div class="p"> + To change the setting in a persistent way, add the following line to the + <span class="ph filepath">/etc/sysctl.conf</span> file: +<pre class="pre codeblock"><code> +vm.overcommit_memory=1 + +</code></pre> + </div><p class="p"> + Then run <code class="ph codeph">sysctl -p</code>. No reboot is needed. 
+ </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="scalability__scalability_hotspots"> + <h2 class="title topictitle2" id="ariaid-title11">Avoiding CPU Hotspots for HDFS Cached Data</h2> <div class="body conbody"> <p class="p"> You can use the HDFS caching feature, described in <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>, http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_shell_options.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_shell_options.html b/docs/build/html/topics/impala_shell_options.html index be21f0b..dc287c1 100644 --- a/docs/build/html/topics/impala_shell_options.html +++ b/docs/build/html/topics/impala_shell_options.html @@ -166,6 +166,23 @@ <tr class="row"> <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 "> <p class="p"> + N/A + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 "> + <p class="p"> + history_max=1000 + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 "> + <p class="p"> + Sets the maximum number of queries to store in the history file. 
+ </p> + </td> + </tr> + <tr class="row"> + <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 "> + <p class="p"> -i <var class="keyword varname">hostname</var> or --impalad=<var class="keyword varname">hostname</var>[:<var class="keyword varname">portnum</var>] </p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_show.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_show.html b/docs/build/html/topics/impala_show.html index cfe748b..bc84a3b 100644 --- a/docs/build/html/topics/impala_show.html +++ b/docs/build/html/topics/impala_show.html @@ -985,7 +985,7 @@ show table stats kudu_table; Kudu tables. Therefore, you do not need to re-run <code class="ph codeph">COMPUTE STATS</code> when you see -1 in the <code class="ph codeph"># Rows</code> column of the output from <code class="ph codeph">SHOW TABLE STATS</code>. That column always shows -1 for - all Kudu tables. + all Kudu tables. 
</p> <p class="p"> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_string_functions.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_string_functions.html b/docs/build/html/topics/impala_string_functions.html index aab1f35..b3e6dbd 100644 --- a/docs/build/html/topics/impala_string_functions.html +++ b/docs/build/html/topics/impala_string_functions.html @@ -298,7 +298,7 @@ SELECT chr(97); <dt class="dt dlterm" id="string_functions__instr"> - <code class="ph codeph">instr(string str, string substr)</code> + <code class="ph codeph">instr(string str, string substr <span class="ph">[, bigint position [, bigint occurrence ] ]</span>)</code> </dt> <dd class="dd"> @@ -308,6 +308,180 @@ SELECT chr(97); <p class="p"> <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + If the substring is not present in the string, the function returns 0: + </p> + +<pre class="pre codeblock"><code> +select instr('foo bar bletch', 'z'); ++------------------------------+ +| instr('foo bar bletch', 'z') | ++------------------------------+ +| 0 | ++------------------------------+ +</code></pre> + + <p class="p"> + The optional third and fourth arguments let you find instances of the substring + other than the first instance starting from the left: + </p> + <ul class="ul"> + <li class="li"> + <p class="p"> + The third argument lets you specify a starting point within the string + other than 1: + </p> + +<pre class="pre codeblock"><code> +-- Restricting the search to positions 7..end, +-- the first occurrence of 'b' is at position 9. 
+select instr('foo bar bletch', 'b', 7); ++---------------------------------+ +| instr('foo bar bletch', 'b', 7) | ++---------------------------------+ +| 9 | ++---------------------------------+ + +-- If there are no more occurrences after the +-- specified position, the result is 0. +select instr('foo bar bletch', 'b', 10); ++----------------------------------+ +| instr('foo bar bletch', 'b', 10) | ++----------------------------------+ +| 0 | ++----------------------------------+ +</code></pre> + + <p class="p"> + If the third argument is negative, the search works right-to-left + starting that many characters from the right. The return value still + represents the position starting from the left side of the string. + </p> + +<pre class="pre codeblock"><code> +-- Scanning right to left, the first occurrence of 'o' +-- is at position 8. (8th character from the left.) +select instr('hello world','o',-1); ++-------------------------------+ +| instr('hello world', 'o', -1) | ++-------------------------------+ +| 8 | ++-------------------------------+ + +-- Scanning right to left, starting from the 6th character +-- from the right, the first occurrence of 'o' is at +-- position 5 (5th character from the left). +select instr('hello world','o',-6); ++-------------------------------+ +| instr('hello world', 'o', -6) | ++-------------------------------+ +| 5 | ++-------------------------------+ + +-- If there are no more occurrences after the +-- specified position, the result is 0. +select instr('hello world','o',-10); ++--------------------------------+ +| instr('hello world', 'o', -10) | ++--------------------------------+ +| 0 | ++--------------------------------+ +</code></pre> + + </li> + + <li class="li"> + <p class="p"> + The fourth argument lets you specify an occurrence other than the first: + </p> + +<pre class="pre codeblock"><code> +-- 2nd occurrence of 'b' is at position 9. 
+select instr('foo bar bletch', 'b', 1, 2); ++------------------------------------+ +| instr('foo bar bletch', 'b', 1, 2) | ++------------------------------------+ +| 9 | ++------------------------------------+ + +-- Negative position argument means scan right-to-left. +-- This example finds second instance of 'b' from the right. +select instr('foo bar bletch', 'b', -1, 2); ++-------------------------------------+ +| instr('foo bar bletch', 'b', -1, 2) | ++-------------------------------------+ +| 5 | ++-------------------------------------+ +</code></pre> + + <p class="p"> + If the fourth argument is greater than the number of matching occurrences, + the function returns 0: + </p> + +<pre class="pre codeblock"><code> +-- There is no 3rd occurrence within the string. +select instr('foo bar bletch', 'b', 1, 3); ++------------------------------------+ +| instr('foo bar bletch', 'b', 1, 3) | ++------------------------------------+ +| 0 | ++------------------------------------+ + +-- There is not even 1 occurrence when scanning +-- the string starting at position 10. +select instr('foo bar bletch', 'b', 10, 1); ++-------------------------------------+ +| instr('foo bar bletch', 'b', 10, 1) | ++-------------------------------------+ +| 0 | ++-------------------------------------+ +</code></pre> + + <p class="p"> + The fourth argument cannot be negative or zero. 
A non-positive value for + this argument causes an error: + </p> + +<pre class="pre codeblock"><code> +select instr('foo bar bletch', 'b', 1, 0); +ERROR: UDF ERROR: Invalid occurrence parameter to instr function: 0 + +select instr('aaaaaaaaa','aa', 1, -1); +ERROR: UDF ERROR: Invalid occurrence parameter to instr function: -1 +</code></pre> + + </li> + + <li class="li"> + <p class="p"> + If either of the optional arguments is <code class="ph codeph">NULL</code>, + the function also returns <code class="ph codeph">NULL</code>: + </p> + +<pre class="pre codeblock"><code> +select instr('foo bar bletch', 'b', null); ++------------------------------------+ +| instr('foo bar bletch', 'b', null) | ++------------------------------------+ +| NULL | ++------------------------------------+ + +select instr('foo bar bletch', 'b', 1, null); ++---------------------------------------+ +| instr('foo bar bletch', 'b', 1, null) | ++---------------------------------------+ +| NULL | ++---------------------------------------+ +</code></pre> + </li> + + </ul> + </dd> @@ -739,6 +913,71 @@ Returned 1 row(s) in 0.12s</code></pre> + <dt class="dt dlterm" id="string_functions__replace"> + <code class="ph codeph">replace(string initial, string target, string replacement)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the initial argument with all occurrences of the target string + replaced by the replacement string. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + Because this function does not use any regular expression patterns, it is typically faster + than <code class="ph codeph">regexp_replace()</code> for simple string substitutions. + </p> + <p class="p"> + If any argument is <code class="ph codeph">NULL</code>, the return value is <code class="ph codeph">NULL</code>. 
+ </p> + <p class="p"> + Matching is case-sensitive. + </p> + <p class="p"> + If the replacement string contains another instance of the target + string, the expansion is only performed once, instead of + applying again to the newly constructed string. + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> +<pre class="pre codeblock"><code>-- Replace one string with another. +select replace('hello world','world','earth'); ++------------------------------------------+ +| replace('hello world', 'world', 'earth') | ++------------------------------------------+ +| hello earth | ++------------------------------------------+ + +-- All occurrences of the target string are replaced. +select replace('hello world','o','0'); ++----------------------------------+ +| replace('hello world', 'o', '0') | ++----------------------------------+ +| hell0 w0rld | ++----------------------------------+ + +-- If no match is found, the original string is returned unchanged. 
+select replace('hello world','xyz','abc'); ++--------------------------------------+ +| replace('hello world', 'xyz', 'abc') | ++--------------------------------------+ +| hello world | ++--------------------------------------+ +</code></pre> + </dd> + + + + + <dt class="dt dlterm" id="string_functions__reverse"> <code class="ph codeph">reverse(string a)</code> </dt> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_struct.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_struct.html b/docs/build/html/topics/impala_struct.html index c796fe9..c7b02d0 100644 --- a/docs/build/html/topics/impala_struct.html +++ b/docs/build/html/topics/impala_struct.html @@ -130,7 +130,7 @@ type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword </p> </li> <li class="li"> - <p class="p" id="struct__d6e2889"> + <p class="p" id="struct__d6e3003"> The maximum length of the column definition for any complex type, including declarations for any nested types, is 4000 characters. </p> @@ -147,7 +147,7 @@ type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword <strong class="ph b">Kudu considerations:</strong> </p> <p class="p"> - Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>, + Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>, <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables. 
</p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_timeouts.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_timeouts.html b/docs/build/html/topics/impala_timeouts.html index 2005c7d..b98ca0a 100644 --- a/docs/build/html/topics/impala_timeouts.html +++ b/docs/build/html/topics/impala_timeouts.html @@ -84,22 +84,34 @@ Trying to re-register with state-store</code></pre> <ul class="ul"> <li class="li"> - The <code class="ph codeph">--idle_query_timeout</code> option specifies the time in seconds after - which an idle query is cancelled. This could be a query whose results were all fetched - but was never closed, or one whose results were partially fetched and then the client - program stopped requesting further results. This condition is most likely to occur in - a client program using the JDBC or ODBC interfaces, rather than in the interactive - <span class="keyword cmdname">impala-shell</span> interpreter. Once the query is cancelled, the client - program cannot retrieve any further results. + <p class="p"> + The <code class="ph codeph">--idle_query_timeout</code> option specifies the time in seconds after + which an idle query is cancelled. This could be a query whose results were all fetched + but was never closed, or one whose results were partially fetched and then the client + program stopped requesting further results. This condition is most likely to occur in + a client program using the JDBC or ODBC interfaces, rather than in the interactive + <span class="keyword cmdname">impala-shell</span> interpreter. Once the query is cancelled, the client + program cannot retrieve any further results. + </p> + + <p class="p"> + You can reduce the idle query timeout by using the <code class="ph codeph">QUERY_TIMEOUT_S</code> + query option. 
Any non-zero value specified for the <code class="ph codeph">--idle_query_timeout</code> startup + option serves as an upper limit for the <code class="ph codeph">QUERY_TIMEOUT_S</code> query option. + A zero value for <code class="ph codeph">--idle_query_timeout</code> disables query timeouts. + See <a class="xref" href="impala_query_timeout_s.html#query_timeout_s">QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</a> for details. + </p> </li> <li class="li"> - The <code class="ph codeph">--idle_session_timeout</code> option specifies the time in seconds after - which an idle session is expired. A session is idle when no activity is occurring for - any of the queries in that session, and the session has not started any new queries. - Once a session is expired, you cannot issue any new query requests to it. The session - remains open, but the only operation you can perform is to close it. The default value - of 0 means that sessions never expire. + <p class="p"> + The <code class="ph codeph">--idle_session_timeout</code> option specifies the time in seconds after + which an idle session is expired. A session is idle when no activity is occurring for + any of the queries in that session, and the session has not started any new queries. + Once a session is expired, you cannot issue any new query requests to it. The session + remains open, but the only operation you can perform is to close it. The default value + of 0 means that sessions never expire. + </p> </li> </ul> @@ -108,12 +120,14 @@ Trying to re-register with state-store</code></pre> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>. </p> - <p class="p"> - You can reduce the idle query timeout by using the <code class="ph codeph">QUERY_TIMEOUT_S</code> - query option. 
Any value specified for the <code class="ph codeph">--idle_query_timeout</code> startup - option serves as an upper limit for the <code class="ph codeph">QUERY_TIMEOUT_S</code> query option. - See <a class="xref" href="impala_query_timeout_s.html#query_timeout_s">QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</a> for details. - </p> + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + Impala checks periodically for idle sessions and queries + to cancel. The actual idle time before cancellation might be up to 50% greater than + the specified configuration setting. For example, if the timeout setting was 60, the + session or query might be cancelled after being idle between 60 and 90 seconds. + </p> + </div> </div> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_timestamp.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_timestamp.html b/docs/build/html/topics/impala_timestamp.html index 02f86fc..4a4f9fe 100644 --- a/docs/build/html/topics/impala_timestamp.html +++ b/docs/build/html/topics/impala_timestamp.html @@ -381,28 +381,6 @@ ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column </code></pre> <p class="p"> - <strong class="ph b">Examples:</strong> - </p> - -<pre class="pre codeblock"><code>select cast('1966-07-30' as timestamp); -select cast('1985-09-25 17:45:30.005' as timestamp); -select cast('08:30:00' as timestamp); -select hour('1970-01-01 15:30:00'); -- Succeeds, returns 15. -select hour('1970-01-01 15:30'); -- Returns NULL because seconds field required. -select hour('1970-01-01 27:30:00'); -- Returns NULL because hour value out of range. -select dayofweek('2004-06-13'); -- Returns 1, representing Sunday. -select dayname('2004-06-13'); -- Returns 'Sunday'. -select date_add('2004-06-13', 365); -- Returns 2005-06-13 with zeros for hh:mm:ss fields. 
-select day('2004-06-13'); -- Returns 13. -select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates? -select now(); -- Returns current date and time in local timezone. - -create table dates_and_times (t timestamp); -insert into dates_and_times values - ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now()); -</code></pre> - - <p class="p"> <strong class="ph b">NULL considerations:</strong> Casting any unrecognized <code class="ph codeph">STRING</code> value to this type produces a <code class="ph codeph">NULL</code> value. </p> @@ -480,12 +458,113 @@ insert into dates_and_times values <p class="p"> <strong class="ph b">Kudu considerations:</strong> </p> + <div class="p"> + In <span class="keyword">Impala 2.9</span> and higher, you can include <code class="ph codeph">TIMESTAMP</code> + columns in Kudu tables, instead of representing the date and time as a <code class="ph codeph">BIGINT</code> + value. The behavior of <code class="ph codeph">TIMESTAMP</code> for Kudu tables has some special considerations: + + <ul class="ul"> + <li class="li"> + <p class="p"> + Any nanoseconds in the original 96-bit value produced by Impala are not stored, because + Kudu represents date/time columns using 64-bit values. The nanosecond portion of the value + is rounded, not truncated. Therefore, a <code class="ph codeph">TIMESTAMP</code> value + that you store in a Kudu table might not be bit-for-bit identical to the value returned by a query. + </p> + </li> + <li class="li"> + <p class="p"> + The conversion between the Impala 96-bit representation and the Kudu 64-bit representation + introduces some performance overhead when reading or writing <code class="ph codeph">TIMESTAMP</code> + columns. You can minimize the overhead during writes by performing inserts through the + Kudu API. 
Because the overhead during reads applies to each query, you might continue to + use a <code class="ph codeph">BIGINT</code> column to represent date/time values in performance-critical + applications. + </p> + </li> + <li class="li"> + <p class="p"> + The Impala <code class="ph codeph">TIMESTAMP</code> type has a narrower range for years than the underlying + Kudu data type. Impala can represent years 1400-9999. If year values outside this range + are written to a Kudu table by a non-Impala client, Impala returns <code class="ph codeph">NULL</code> + by default when reading those <code class="ph codeph">TIMESTAMP</code> values during a query. Or, if the + <code class="ph codeph">ABORT_ON_ERROR</code> query option is enabled, the query fails when it encounters + a value with an out-of-range year. + </p> + </li> + </ul> + </div> + <p class="p"> - Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>, - <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables. + <strong class="ph b">Examples:</strong> </p> <p class="p"> + The following examples demonstrate using <code class="ph codeph">TIMESTAMP</code> values + with built-in functions: + </p> + +<pre class="pre codeblock"><code>select cast('1966-07-30' as timestamp); +select cast('1985-09-25 17:45:30.005' as timestamp); +select cast('08:30:00' as timestamp); +select hour('1970-01-01 15:30:00'); -- Succeeds, returns 15. +select hour('1970-01-01 15:30'); -- Returns NULL because seconds field required. +select hour('1970-01-01 27:30:00'); -- Returns NULL because hour value out of range. +select dayofweek('2004-06-13'); -- Returns 1, representing Sunday. +select dayname('2004-06-13'); -- Returns 'Sunday'. +select date_add('2004-06-13', 365); -- Returns 2005-06-13 with zeros for hh:mm:ss fields. 
+select day('2004-06-13'); -- Returns 13. +select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates? +select now(); -- Returns current date and time in local timezone. +</code></pre> + + <p class="p"> + The following examples demonstrate using <code class="ph codeph">TIMESTAMP</code> values + with HDFS-backed tables: + </p> + +<pre class="pre codeblock"><code>create table dates_and_times (t timestamp); +insert into dates_and_times values + ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now()); +</code></pre> + + <p class="p"> + The following examples demonstrate using <code class="ph codeph">TIMESTAMP</code> values + with Kudu tables: + </p> + +<pre class="pre codeblock"><code>create table timestamp_t (x int primary key, s string, t timestamp, b bigint) + partition by hash (x) partitions 16 + stored as kudu; + +-- The default value of now() has microsecond precision, so the final 3 digits +-- representing nanoseconds are all zero. +insert into timestamp_t values (1, cast(now() as string), now(), unix_timestamp(now())); + +-- Values with 1-499 nanoseconds are rounded down in the Kudu TIMESTAMP column. +insert into timestamp_t values (2, cast(now() + interval 100 nanoseconds as string), now() + interval 100 nanoseconds, unix_timestamp(now() + interval 100 nanoseconds)); +insert into timestamp_t values (3, cast(now() + interval 499 nanoseconds as string), now() + interval 499 nanoseconds, unix_timestamp(now() + interval 499 nanoseconds)); + +-- Values with 500-999 nanoseconds are rounded up in the Kudu TIMESTAMP column. 
+insert into timestamp_t values (4, cast(now() + interval 500 nanoseconds as string), now() + interval 500 nanoseconds, unix_timestamp(now() + interval 500 nanoseconds)); +insert into timestamp_t values (5, cast(now() + interval 501 nanoseconds as string), now() + interval 501 nanoseconds, unix_timestamp(now() + interval 501 nanoseconds)); + +-- The string representation shows how the underlying Impala TIMESTAMP can have nanosecond precision. +-- The TIMESTAMP column shows how timestamps in a Kudu table are rounded to microsecond precision. +-- The BIGINT column represents seconds past the epoch and so is not affected much by nanoseconds. +select s, t, b from timestamp_t order by t; ++-------------------------------+-------------------------------+------------+ +| s | t | b | ++-------------------------------+-------------------------------+------------+ +| 2017-05-31 15:30:05.107157000 | 2017-05-31 15:30:05.107157000 | 1496244605 | +| 2017-05-31 15:30:28.868151100 | 2017-05-31 15:30:28.868151000 | 1496244628 | +| 2017-05-31 15:34:33.674692499 | 2017-05-31 15:34:33.674692000 | 1496244873 | +| 2017-05-31 15:35:04.769166500 | 2017-05-31 15:35:04.769167000 | 1496244904 | +| 2017-05-31 15:35:33.033082501 | 2017-05-31 15:35:33.033083000 | 1496244933 | ++-------------------------------+-------------------------------+------------+ +</code></pre> + + <p class="p"> <strong class="ph b">Related information:</strong> </p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_troubleshooting.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_troubleshooting.html b/docs/build/html/topics/impala_troubleshooting.html index 7728ee4..0462796 100644 --- a/docs/build/html/topics/impala_troubleshooting.html +++ b/docs/build/html/topics/impala_troubleshooting.html @@ -86,7 +86,7 @@ $ sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0 vm.drop_caches = 3 vm.drop_caches = 0 -$ sudo
dd if=/dev/sda bs=1M of=/dev/null count=1k +$ sudo dd if=/dev/sda bs=1M of=/dev/null count=1k 1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB) copied, 5.60373 s, 192 MB/s http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_udf.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_udf.html b/docs/build/html/topics/impala_udf.html index 3002c21..fb5687d 100644 --- a/docs/build/html/topics/impala_udf.html +++ b/docs/build/html/topics/impala_udf.html @@ -228,6 +228,24 @@ select most_profitable_location(store_id, sales, expenses, tax_rate, depreciatio </li> </ol> + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <div class="p"> + In <span class="keyword">Impala 2.9</span> and higher, you can refresh the user-defined functions (UDFs) + that Impala recognizes, at the database level, by running the <code class="ph codeph">REFRESH FUNCTIONS</code> + statement with the database name as an argument. Java-based UDFs can be added to the metastore + database through Hive <code class="ph codeph">CREATE FUNCTION</code> statements, and made visible to Impala + by subsequently running <code class="ph codeph">REFRESH FUNCTIONS</code>. For example: + +<pre class="pre codeblock"><code>CREATE DATABASE shared_udfs; +USE shared_udfs; +...use CREATE FUNCTION statements in Hive to create some Java-based UDFs + that Impala is not initially aware of... +REFRESH FUNCTIONS shared_udfs; +SELECT udf_created_by_hive(c1) FROM ... 
+</code></pre> + </div> + </div> + <div class="example"><h4 class="title sectiontitle">Java UDF Example: Reusing lower() Function</h4> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/html/topics/impala_varchar.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_varchar.html b/docs/build/html/topics/impala_varchar.html index 99909e4..601456a 100644 --- a/docs/build/html/topics/impala_varchar.html +++ b/docs/build/html/topics/impala_varchar.html @@ -130,7 +130,7 @@ <strong class="ph b">Kudu considerations:</strong> </p> <p class="p"> - Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>, + Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>, <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables. 
</p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/impala-2.8.pdf ---------------------------------------------------------------------- diff --git a/docs/build/impala-2.8.pdf b/docs/build/impala-2.8.pdf new file mode 100644 index 0000000..b060692 Binary files /dev/null and b/docs/build/impala-2.8.pdf differ http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/impala-2.9.pdf ---------------------------------------------------------------------- diff --git a/docs/build/impala-2.9.pdf b/docs/build/impala-2.9.pdf new file mode 100644 index 0000000..54d3921 Binary files /dev/null and b/docs/build/impala-2.9.pdf differ http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/docs/build/impala.pdf ---------------------------------------------------------------------- diff --git a/docs/build/impala.pdf b/docs/build/impala.pdf deleted file mode 100644 index b060692..0000000 Binary files a/docs/build/impala.pdf and /dev/null differ http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ae2f8d03/impala-docs.html ---------------------------------------------------------------------- diff --git a/impala-docs.html b/impala-docs.html index 361500a..75cc4c5 100644 --- a/impala-docs.html +++ b/impala-docs.html @@ -145,16 +145,16 @@ Impala 2.9 <ul> <li><a href="docs/changelog-2.9.html">Change Log</a></li> - <li><p>The HTML and PDF documentation are not available yet.</p></li> - <ul> + <li><a href="docs/build/html/index.html">HTML Documentation for Impala 2.9</a></li> + <li><a href="docs/build/impala-2.9.pdf">PDF Documentation for Impala 2.9</a></li> + </ul> </p> </div> </div> <div class="row"> <div class="span12"><h3>Older Releases</h3> - <p><a href="docs/build/html/index.html">HTML Documentation for Impala 2.8</a></p> - <p><a href="docs/build/impala.pdf">PDF Documentation for Impala 2.8</a></p> + <p><a href="docs/build/impala-2.8.pdf">PDF Documentation for Impala 2.8</a></p> <p> <a 
href="http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/Impala/impala.html" >Documentation for previous versions of Impala in CDH</a>
