http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_mem_limit.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_mem_limit.xml b/docs/topics/impala_mem_limit.xml index b784989..b478a8f 100644 --- a/docs/topics/impala_mem_limit.xml +++ b/docs/topics/impala_mem_limit.xml @@ -61,7 +61,7 @@ under the License. </p> <p> - When resource management is enabled in CDH 5, the mechanism for this option changes. If set, it overrides the + When resource management is enabled, the mechanism for this option changes. If set, it overrides the automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the query does not proceed until that much memory is available. The actual memory used by the query could be lower, since some queries use much less memory than others. With resource management, the
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_new_features.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_new_features.xml b/docs/topics/impala_new_features.xml index 67a8eb8..9a3128d 100644 --- a/docs/topics/impala_new_features.xml +++ b/docs/topics/impala_new_features.xml @@ -2371,7 +2371,7 @@ under the License. <p audience="PDF"> For background information about HDFS caching, see - <xref keyref="cdh_ig_hdfs_caching"/>. For performance information about using this feature with Impala, see + <xref keyref="setup_hdfs_caching"/>. For performance information about using this feature with Impala, see <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>. For the <codeph>SET CACHED</codeph> and <codeph>SET UNCACHED</codeph> clauses that let you control cached table data through DDL statements, see <xref href="impala_create_table.xml#create_table"/> and http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_noncm_installation.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_noncm_installation.xml b/docs/topics/impala_noncm_installation.xml deleted file mode 100644 index d443b6e..0000000 --- a/docs/topics/impala_noncm_installation.xml +++ /dev/null @@ -1,190 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> -<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> -<concept id="noncm_installation"> - - <title>Installing Impala</title> - <prolog> - <metadata> - <data name="Category" value="Impala"/> - <data name="Category" value="Installing"/> - <data name="Category" value="Administrators"/> - </metadata> - </prolog> - - <conbody> - - <p> - Before installing Impala manually, make sure all applicable nodes have the appropriate hardware - configuration, levels of operating system and CDH, and any other software prerequisites. See - <xref href="impala_prereqs.xml#prereqs"/> for details. - </p> - - <p> - You can install Impala across many hosts or on one host: - </p> - - <ul> - <li> - Installing Impala across multiple machines creates a distributed configuration. For best performance, - install Impala on <b>all</b> DataNodes. - </li> - - <li> - Installing Impala on a single machine produces a pseudo-distributed cluster. - </li> - </ul> - - <p> - <b>To install Impala on a host:</b> - </p> - - <ol> - <li> - Install CDH as described in the Installation section of the -<!-- Original URL: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/CDH5-Installation-Guide.html --> - <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/installation.html" scope="external" format="html">CDH - 5 Installation Guide</xref>. - </li> - - <li> - <p> - Install the Hive metastore somewhere in your cluster, as described in the Hive Installation topic in the -<!-- Original URL: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh_ig_hive_installation.html --> - <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_installation.html" scope="external" format="html">CDH - 5 Installation Guide</xref>. As part of this process, you configure the Hive metastore to use an external - database as a metastore. Impala uses this same database for its own table metadata. You can choose either - a MySQL or PostgreSQL database as the metastore. The process for configuring each type of database is - described in the CDH Installation Guide). - </p> - <p> - Whenever practical, set up a Hive metastore service rather than connecting directly to the metastore - database. Make sure the - <filepath>/etc/impala/conf/hive-site.xml</filepath> file contains the following setting, substituting the - appropriate hostname for <varname>metastore_server_host</varname>: - </p> -<codeblock><property> -<name>hive.metastore.uris</name> -<value>thrift://<varname>metastore_server_host</varname>:9083</value> -</property> -<property> -<name>hive.metastore.client.socket.timeout</name> -<value>3600</value> -<description>MetaStore Client socket timeout in seconds</description> -</property></codeblock> - </li> - - <li> - (Optional) If you installed the full Hive component on any host, you can verify that the metastore is - configured properly by starting the Hive console and querying for the list of available tables. Once you - confirm that the console starts, exit the console to continue the installation: -<codeblock>$ hive -Hive history file=/tmp/root/hive_job_log_root_201207272011_678722950.txt -hive> show tables; -table1 -table2 -hive> quit; -$</codeblock> - </li> - - <li> - Confirm that your package management command is aware of the Impala repository settings, as described in - <xref href="impala_prereqs.xml#prereqs"/>. You - might need to download a repo or list file into a system directory underneath <filepath>/etc</filepath>. - </li> - - <li> - Use <b>one</b> of the following sets of commands to install the Impala package: - <p> - <b>For RHEL, Oracle Linux, or CentOS systems:</b> - </p> -<codeblock rev="1.2">$ sudo yum install impala # Binaries for daemons -$ sudo yum install impala-server # Service start/stop script -$ sudo yum install impala-state-store # Service start/stop script -$ sudo yum install impala-catalog # Service start/stop script -</codeblock> - <p> - <b>For SUSE systems:</b> - </p> -<codeblock rev="1.2">$ sudo zypper install impala # Binaries for daemons -$ sudo zypper install impala-server # Service start/stop script -$ sudo zypper install impala-state-store # Service start/stop script -$ sudo zypper install impala-catalog # Service start/stop script -</codeblock> - <p> - <b>For Debian or Ubuntu systems:</b> - </p> -<codeblock rev="1.2">$ sudo apt-get install impala # Binaries for daemons -$ sudo apt-get install impala-server # Service start/stop script -$ sudo apt-get install impala-state-store # Service start/stop script -$ sudo apt-get install impala-catalog # Service start/stop script -</codeblock> - <note> - <ph rev="upstream">Cloudera</ph> recommends that you not install Impala on any HDFS NameNode. Installing Impala on NameNodes - provides no additional data locality, and executing queries with such a configuration might cause memory - contention and negatively impact the HDFS NameNode. - </note> - </li> - - <li> - Copy the client <codeph>hive-site.xml</codeph>, <codeph>core-site.xml</codeph>, - <codeph>hdfs-site.xml</codeph>, and <codeph>hbase-site.xml</codeph> configuration files to the Impala - configuration directory, which defaults to <codeph>/etc/impala/conf</codeph>. Create this directory if it - does not already exist. - </li> - - <li> - Use <b>one</b> of the following commands to install <codeph>impala-shell</codeph> on the machines from - which you want to issue queries. You can install <codeph>impala-shell</codeph> on any supported machine - that can connect to DataNodes that are running <codeph>impalad</codeph>. - <p> - <b>For RHEL/CentOS systems:</b> - </p> -<codeblock>$ sudo yum install impala-shell</codeblock> - <p> - <b>For SUSE systems:</b> - </p> -<codeblock>$ sudo zypper install impala-shell</codeblock> - <p> - <b>For Debian/Ubuntu systems:</b> - </p> -<codeblock>$ sudo apt-get install impala-shell</codeblock> - </li> - - <li> - Complete any required or recommended configuration, as described in - <xref href="impala_config_performance.xml#config_performance"/>. Some of these configuration changes are - mandatory. - </li> - - </ol> - - <p> - Once installation and configuration are complete, see <xref href="impala_processes.xml#processes"/> for how - to activate the software on the appropriate nodes in your cluster. - </p> - - <p> - If this is your first time setting up and using Impala in this cluster, run through some of the exercises in - <xref href="impala_tutorial.xml#tutorial"/> to verify that you can do basic operations such as creating - tables and querying them. - </p> - </conbody> -</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_parquet.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_parquet.xml b/docs/topics/impala_parquet.xml index 81fdb69..1809342 100644 --- a/docs/topics/impala_parquet.xml +++ b/docs/topics/impala_parquet.xml @@ -85,11 +85,6 @@ under the License. <p outputclass="toc inpage"/> - <p audience="integrated"> - For general information about using Parquet with other CDH components, - see <xref href="cdh_ig_parquet.xml#parquet_format"/>. - </p> - </conbody> @@ -714,7 +709,7 @@ Returned 1 row(s) in 13.35s <conbody> <p> - You can read and write Parquet data files from other CDH components. + You can read and write Parquet data files from other <keyword keyref="distro"/> components. <ph audience="integrated">See <xref href="cdh_ig_parquet.xml#parquet_format"/> for details.</ph> </p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_partitioning.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_partitioning.xml b/docs/topics/impala_partitioning.xml index b59c243..1729530 100644 --- a/docs/topics/impala_partitioning.xml +++ b/docs/topics/impala_partitioning.xml @@ -391,7 +391,7 @@ insert into weather <b>partition (year=2014, month=04, day)</b> select 'sunny',2 <p> The original mechanism uses to prune partitions is <term>static partition pruning</term>, in which the conditions in the - <codeph>WHERE</codeph> clause are analyzed to determine in advance which partitions can be safely skipped. In Impala 2.5 / CDH 5.7 + <codeph>WHERE</codeph> clause are analyzed to determine in advance which partitions can be safely skipped. In <keyword keyref="impala25_full"/> and higher, Impala can perform <term>dynamic partition pruning</term>, where information about the partitions is collected during the query, and Impala prunes unnecessary partitions in ways that were impractical to predict in advance. </p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_perf_cookbook.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_perf_cookbook.xml b/docs/topics/impala_perf_cookbook.xml index d2582d2..d0e40af 100644 --- a/docs/topics/impala_perf_cookbook.xml +++ b/docs/topics/impala_perf_cookbook.xml @@ -40,7 +40,7 @@ under the License. <p> Here are performance guidelines and best practices that you can use during planning, experimentation, and - performance tuning for an Impala-enabled CDH cluster. All of this information is also available in more + performance tuning for an Impala-enabled <keyword keyref="distro"/> cluster. All of this information is also available in more detail elsewhere in the Impala documentation; it is gathered together here to serve as a cookbook and emphasize which performance techniques typically provide the highest return on investment </p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_perf_hdfs_caching.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_perf_hdfs_caching.xml b/docs/topics/impala_perf_hdfs_caching.xml index 4337842..e1f25a6 100644 --- a/docs/topics/impala_perf_hdfs_caching.xml +++ b/docs/topics/impala_perf_hdfs_caching.xml @@ -80,8 +80,8 @@ under the License. <!-- Could conref this background link; haven't decided yet the best place or if it's needed twice. --> <p> - For background information about how to set up and manage HDFS caching for a CDH cluster, see - <xref keyref="cdh_ig_hdfs_caching"/>. + For background information about how to set up and manage HDFS caching for a <keyword keyref="distro"/> cluster, see + <xref keyref="setup_hdfs_caching"/>. </p> </conbody> @@ -97,7 +97,7 @@ under the License. <conbody> <p> - On <ph rev="upstream">CDH 5.1</ph> and higher, Impala can use the HDFS caching feature to make more effective use of RAM, so that + In <keyword keyref="impala14_full"/> and higher, Impala can use the HDFS caching feature to make more effective use of RAM, so that repeated queries can take advantage of data <q>pinned</q> in memory regardless of how much data is processed overall. The HDFS caching feature lets you designate a subset of frequently accessed data to be pinned permanently in memory, remaining in the cache across multiple queries and never being evicted. This @@ -126,7 +126,7 @@ under the License. <conbody> <p> - To use HDFS caching with Impala, first set up that feature for your CDH cluster: + To use HDFS caching with Impala, first set up that feature for your <keyword keyref="distro"/> cluster: </p> <ul> @@ -148,7 +148,7 @@ under the License. <codeblock>hdfs cacheadmin -addPool four_gig_pool -owner impala -limit 4000000000 </codeblock> For details about the <cmdname>hdfs cacheadmin</cmdname> command, see - <xref keyref="cdh_ig_hdfs_caching"/>. + <xref keyref="setup_hdfs_caching"/>. </p> </li> </ul> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_perf_stats.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_perf_stats.xml b/docs/topics/impala_perf_stats.xml index 54b7c72..21564ae 100644 --- a/docs/topics/impala_perf_stats.xml +++ b/docs/topics/impala_perf_stats.xml @@ -550,7 +550,7 @@ show column stats year_month_day; <p rev="2.0.1"> <!-- Additional info as a result of IMPALA-1420 --> -<!-- Keep checking if https://issues.apache.org/jira/browse/HIVE-8648 ever gets fixed and when that fix makes it into a CDH release. --> +<!-- Keep checking if https://issues.apache.org/jira/browse/HIVE-8648 ever gets fixed and when that fix makes it into an Impala release. --> For your very largest tables, you might find that <codeph>COMPUTE STATS</codeph> or even <codeph>COMPUTE INCREMENTAL STATS</codeph> take so long to scan the data that it is impractical to use them regularly. In such a case, after adding a partition or inserting new data, you can update just the number of rows property through an <codeph>ALTER TABLE</codeph> statement. http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_proxy.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_proxy.xml b/docs/topics/impala_proxy.xml index d8823a8..e45ae72 100644 --- a/docs/topics/impala_proxy.xml +++ b/docs/topics/impala_proxy.xml @@ -111,7 +111,7 @@ under the License. </li> <li> <p rev="DOCS-690"> - Consider enabling <q>sticky sessions</q>. <ph rev="upstream">Cloudera</ph> recommends enabling this setting + Consider enabling <q>sticky sessions</q>. Where practical, enable this setting so that stateless client applications such as <cmdname>impalad</cmdname> and Hue are not disconnected from long-running queries. Evaluate whether this setting is appropriate for your combination of workload and client applications. http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_resource_management.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_resource_management.xml b/docs/topics/impala_resource_management.xml index b4174e3..73eaf1e 100644 --- a/docs/topics/impala_resource_management.xml +++ b/docs/topics/impala_resource_management.xml @@ -79,21 +79,21 @@ under the License. <conbody> <p> - To enable resource management for Impala, first you <xref href="#rm_cdh_prereqs">set up the YARN - service for your CDH cluster</xref>. Then you <xref href="#rm_options">add startup options and customize + To enable resource management for Impala, first you <xref href="#rm_prereqs">set up the YARN + service for your cluster</xref>. Then you <xref href="#rm_options">add startup options and customize resource management settings</xref> for the Impala services. </p> </conbody> - <concept id="rm_cdh_prereqs"> + <concept id="rm_prereqs"> - <title>Required CDH Setup for Resource Management with Impala</title> + <title>Required Setup for Resource Management with Impala</title> <conbody> <p> - YARN is the general-purpose service that manages resources for many Hadoop components within a CDH - cluster. + YARN is the general-purpose service that manages resources for many Hadoop components within a + <keyword keyref="distro"/> cluster. </p> </conbody> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_timeouts.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_timeouts.xml b/docs/topics/impala_timeouts.xml index 63053a0..5b843d4 100644 --- a/docs/topics/impala_timeouts.xml +++ b/docs/topics/impala_timeouts.xml @@ -40,7 +40,7 @@ under the License. <conbody> <p> - Depending on how busy your CDH cluster is, you might increase or decrease various timeout + Depending on how busy your <keyword keyref="distro"/> cluster is, you might increase or decrease various timeout values. Increase timeouts if Impala is cancelling operations prematurely, when the system is responding slower than usual but the operations are still successful if given extra time. Decrease timeouts if operations are idle or hanging for long periods, and the idle http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_troubleshooting.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_troubleshooting.xml b/docs/topics/impala_troubleshooting.xml index 0dd499f..99367e6 100644 --- a/docs/topics/impala_troubleshooting.xml +++ b/docs/topics/impala_troubleshooting.xml @@ -60,7 +60,7 @@ under the License. <ul> <li> If a query fails against both Impala and Hive, it is likely that there is a problem with your query or - other elements of your CDH environment: + other elements of your <keyword keyref="distro"/> environment: <ul> <li> Review the <xref href="impala_langref.xml#langref">Language Reference</xref> to ensure your query is http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff5d1ceb/docs/topics/impala_tutorial.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_tutorial.xml b/docs/topics/impala_tutorial.xml index e99abea..01e53e2 100644 --- a/docs/topics/impala_tutorial.xml +++ b/docs/topics/impala_tutorial.xml @@ -119,7 +119,7 @@ under the License. on their names. </p> -<codeblock rev="upstream">$ impala-shell -i localhost --quiet +<codeblock>$ impala-shell -i localhost --quiet Starting Impala Shell without Kerberos authentication Welcome to the Impala shell. Press TAB twice to see a list of available commands. ... @@ -1671,7 +1671,7 @@ the <codeph>impala</codeph> user can read the files, which will be sufficient fo queries and perform some copy and transform operations into other tables. </p> -<codeblock rev="upstream">$ impala-shell -i localhost +<codeblock>$ impala-shell -i localhost Starting Impala Shell without Kerberos authentication Connected to localhost:21000 @@ -2092,7 +2092,7 @@ to start with, we restart the <cmdname>impala-shell</cmdname> command with the <codeph>-B</codeph> option, which turns off the box-drawing behavior. </p> -<codeblock rev="upstream">[localhost:21000] > quit; +<codeblock>[localhost:21000] > quit; Goodbye jrussell $ impala-shell -i localhost -B -d airline_data; Starting Impala Shell without Kerberos authentication
