[44/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

jbapple Wed, 12 Apr 2017 11:25:38 -0700

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_breakpad.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_breakpad.html 
b/docs/build/html/topics/impala_breakpad.html
new file mode 100644
index 0000000..7e05497
--- /dev/null
+++ b/docs/build/html/topics/impala_breakpad.html
@@ -0,0 +1,223 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_troubleshooting.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="breakpad"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Breakpad Minidumps for Impala (Impala 2.6 or 
higher only)</title></head><body id="breakpad"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Breakpad Minidumps for 
Impala (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The <a class="xref" 
href="https://chromium.googlesource.com/breakpad/breakpad/" 
target="_blank">breakpad</a>
+      project is an open-source framework for crash reporting.
+      In <span class="keyword">Impala 2.6</span> and higher, Impala can use 
<code class="ph codeph">breakpad</code> to record stack information and
+      register values when any of the Impala-related daemons crash due to an 
error such as <code class="ph codeph">SIGSEGV</code>
+      or unhandled exceptions.
+      The dump files are much smaller than traditional core dump files. The 
dump mechanism itself uses very little
+      memory, which improves reliability if the crash occurs while the system 
is low on memory.
+    </p>
+
+    <div class="note important note_important"><span class="note__title 
importanttitle">Important:</span> 
+      Because of the internal mechanisms involving Impala memory allocation 
and Linux
+      signalling for out-of-memory (OOM) errors, if an Impala-related daemon 
experiences a
+      crash due to an OOM condition, it does <em class="ph i">not</em> 
generate a minidump for that error.
+    <p class="p">
+
+    </p>
+    </div>
+
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_troubleshooting.html">Troubleshooting 
Impala</a></div></div></nav><article class="topic concept nested1" 
aria-labelledby="ariaid-title2" id="breakpad__breakpad_minidump_enable">
+    <h2 class="title topictitle2" id="ariaid-title2">Enabling or Disabling 
Minidump Generation</h2>
+    <div class="body conbody">
+      <p class="p">
+        By default, a minidump file is generated when an Impala-related daemon 
crashes.
+        To turn off generation of the minidump files, change the
+        <span class="ph uicontrol">minidump_path</span> configuration setting 
of one or more Impala-related daemons
+        to the empty string, and restart the corresponding services or daemons.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.7</span> and higher,
+        you can send a <code class="ph codeph">SIGUSR1</code> signal to any 
Impala-related daemon to write a
+        Breakpad minidump. For advanced troubleshooting, you can now produce a 
minidump
+        without triggering a crash.
+      </p>
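+      <p class="p">
+        As a minimal sketch of the on-demand case, the shell function below sends
+        <code class="ph codeph">SIGUSR1</code> to a daemon after checking that the process exists.
+        The function name <code class="ph codeph">request_minidump</code> is hypothetical and is not
+        part of Impala; it only wraps the standard <span class="keyword cmdname">kill</span> command.
+      </p>

```shell
# Hypothetical helper (not shipped with Impala): ask a running Impala-related
# daemon to write a minidump without crashing it. Assumes Impala 2.7 or higher.
# $1: process ID of the daemon
request_minidump() {
  pid="$1"
  # kill -0 sends no signal; it only checks that the process exists
  # and that the current user is allowed to signal it.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "no such process: $pid" >&2
    return 1
  fi
  kill -USR1 "$pid"
}

# Example, using a PID taken from "ps ax | grep impalad":
# request_minidump 31374
```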
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" 
id="breakpad__breakpad_minidump_location">
+    <h2 class="title topictitle2" id="ariaid-title3">Specifying the Location 
for Minidump Files</h2>
+    <div class="body conbody">
+      <div class="p">
+        By default, all minidump files are written to the following location
+        on the host where a crash occurs:
+        
+         <ul class="ul">
+          <li class="li">
+            <p class="p">
+              Clusters not managed by cluster management software:
+              <span class="ph filepath"><var class="keyword 
varname">impala_log_dir</var>/<var class="keyword 
varname">daemon_name</var>/minidumps/<var class="keyword 
varname">daemon_name</var></span>
+            </p>
+          </li>
+        </ul>
+        The minidump files for <span class="keyword cmdname">impalad</span>, 
<span class="keyword cmdname">catalogd</span>,
+        and <span class="keyword cmdname">statestored</span> are each written 
to a separate directory.
+      </div>
+      <p class="p">
+        To specify a different location, set the
+        
+        <span class="ph uicontrol">minidump_path</span>
+        configuration setting of one or more Impala-related daemons, and 
restart the corresponding services or daemons.
+      </p>
+      <p class="p">
+        If you specify a relative path for this setting, the value is 
interpreted relative to
+        the default <span class="ph uicontrol">minidump_path</span> directory.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" 
id="breakpad__breakpad_minidump_number">
+    <h2 class="title topictitle2" id="ariaid-title4">Controlling the Number of 
Minidump Files</h2>
+    <div class="body conbody">
+      <p class="p">
+        As with any files used for logging or troubleshooting, consider limiting 
the number of
+        minidump files, or removing unneeded ones, depending on the amount of 
free storage
+        space on the hosts in the cluster.
+      </p>
+      <p class="p">
+        Because the minidump files are only used for problem resolution, you 
can remove any such files that
+        are not needed to debug current issues.
+      </p>
+      <p class="p">
+        To control how many minidump files Impala keeps around at any one time,
+        set the <span class="ph uicontrol">max_minidumps</span> configuration 
setting
+        of one or more Impala-related daemons, and restart the corresponding 
services or daemons.
+        The default for this setting is 9. A zero or negative value is 
interpreted as
+        <span class="q">"unlimited"</span>.
+      </p>
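+      <p class="p">
+        For manual cleanup, one possible approach is to list minidump files older than a chosen
+        age before deciding what to remove. The sketch below is an assumption-laden illustration:
+        the function name <code class="ph codeph">list_old_minidumps</code> and the 30-day threshold
+        are hypothetical, and the function only prints file names without deleting anything.
+      </p>

```shell
# Hypothetical helper (not shipped with Impala): print minidump files whose
# modification time is more than the given number of days in the past.
# $1: minidump directory, $2: age threshold in days
list_old_minidumps() {
  find "$1" -type f -name '*.dmp' -mtime +"$2" -print
}

# Example, using the minidump directory from the demonstration in this topic:
# list_old_minidumps /var/log/impala-minidumps/impalad 30
```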
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" 
id="breakpad__breakpad_minidump_logging">
+    <h2 class="title topictitle2" id="ariaid-title5">Detecting Crash 
Events</h2>
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala log files show when a crash event occurs that generates
+        a minidump file. Because each restart begins a new log file, the <span 
class="q">"crashed"</span> message
+        is always at or near the bottom of the log file. There might be 
another later message
+        if core dumps are also enabled.
+      </p>
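+      <p class="p">
+        A minimal sketch for finding these events, assuming the log message has the form
+        <code class="ph codeph">Wrote minidump to <var class="keyword varname">path</var></code>
+        on a single line, as in the demonstration in this topic. The function name
+        <code class="ph codeph">minidump_paths</code> is hypothetical.
+      </p>

```shell
# Hypothetical helper (not shipped with Impala): print the minidump file paths
# recorded in one or more Impala log files.
# Assumes each event is logged as "Wrote minidump to <path>" on one line.
minidump_paths() {
  # -h suppresses file name prefixes when several log files are given.
  grep -h 'Wrote minidump to' "$@" | sed 's/.*Wrote minidump to *//'
}

# Example, run against the INFO logs of an impalad log directory:
# minidump_paths /var/log/impalad/*.INFO.*
```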
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" 
id="breakpad__breakpad_demo">
+    <h2 class="title topictitle2" id="ariaid-title6">Demonstration of Breakpad 
Feature</h2>
+    <div class="body conbody">
+      <p class="p">
+        The following example uses the command <span class="keyword 
cmdname">kill -11</span> to
+        simulate a <code class="ph codeph">SIGSEGV</code> crash for an <span 
class="keyword cmdname">impalad</span>
+        process on a single DataNode, then examines the relevant log files and 
minidump file.
+      </p>
+
+      <p class="p">
+        First, as root on a worker node, kill the <span class="keyword 
cmdname">impalad</span> process with a
+        <code class="ph codeph">SIGSEGV</code> error. The original process ID 
was 23114.
+      </p>
+
+<pre class="pre codeblock"><code>
+# ps ax | grep impalad
+23114 ?        Sl     0:18 
/opt/local/parcels/&lt;parcel_version&gt;/lib/impala/sbin/impalad 
--flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
+31259 pts/0    S+     0:00 grep impalad
+#
+# kill -11 23114
+#
+# ps ax | grep impalad
+31374 ?        Rl     0:04 
/opt/local/parcels/&lt;parcel_version&gt;/lib/impala/sbin/impalad 
--flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
+31475 pts/0    S+     0:00 grep impalad
+
+</code></pre>
+
+      <p class="p">
+        We locate the log directory underneath <span class="ph 
filepath">/var/log</span>.
+        There is a <code class="ph codeph">.INFO</code>, <code class="ph 
codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code>
+        log file for the 23114 process ID. The minidump message is written to 
the
+        <code class="ph codeph">.INFO</code> file and the <code class="ph 
codeph">.ERROR</code> file, but not the
+        <code class="ph codeph">.WARNING</code> file. In this case, a large 
core file was also produced.
+      </p>
+<pre class="pre codeblock"><code>
+# cd /var/log/impalad
+# ls -la | grep 23114
+-rw-------   1 impala impala 3539079168 Jun 23 15:20 core.23114
+-rw-r--r--   1 impala impala      99057 Jun 23 15:20 hs_err_pid23114.log
+-rw-r--r--   1 impala impala        351 Jun 23 15:20 
impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
+-rw-r--r--   1 impala impala      29101 Jun 23 15:20 
impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
+-rw-r--r--   1 impala impala        228 Jun 23 14:03 
impalad.worker_node_123.impala.log.WARNING.20160623-140343.23114
+
+</code></pre>
+      <p class="p">
+        The <code class="ph codeph">.INFO</code> log includes the location of 
the minidump file, followed by
+        a report of a core dump. With the breakpad minidump feature enabled, 
you might
+        choose to disable core dumps or keep fewer of them around.
+      </p>
+<pre class="pre codeblock"><code>
+# cat impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
+...
+Wrote minidump to 
/var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+#
+# A fatal error has been detected by the Java Runtime Environment:
+#
+#  SIGSEGV (0xb) at pc=0x00000030c0e0b68a, pid=23114, tid=139869541455968
+#
+# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 
1.7.0_67-b01)
+# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 
compressed oops)
+# Problematic frame:
+# C  [libpthread.so.0+0xb68a]  pthread_cond_wait+0xca
+#
+# Core dump written. Default location: /var/log/impalad/core or core.23114
+#
+# An error report file with more information is saved as:
+# /var/log/impalad/hs_err_pid23114.log
+#
+# If you would like to submit a bug report, please visit:
+#   http://bugreport.sun.com/bugreport/crash.jsp
+# The crash happened outside the Java Virtual Machine in native code.
+# See problematic frame for where to report the bug.
+...
+
+# cat impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
+
+Log file created at: 2016/06/23 14:03:43
+Running on machine:.worker_node_123
+Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
+E0623 14:03:43.911002 23114 logging.cc:118] stderr will be logged to this file.
+Wrote minidump to 
/var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+
+</code></pre>
+
+      <p class="p">
+        The resulting minidump file is much smaller than the corresponding 
core file,
+        making it much easier to supply diagnostic information to <span 
class="keyword">the appropriate support channel</span>.
+      </p>
+
+<pre class="pre codeblock"><code>
+# pwd
+/var/log/impalad
+# cd ../impala-minidumps/impalad
+# ls
+0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+# du -kh *
+2.4M  0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+
+</code></pre>
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_char.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_char.html 
b/docs/build/html/topics/impala_char.html
new file mode 100644
index 0000000..e0b4cb9
--- /dev/null
+++ b/docs/build/html/topics/impala_char.html
@@ -0,0 +1,305 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_datatypes.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="char"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>CHAR Data Type (Impala 2.0 or higher 
only)</title></head><body id="char"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">CHAR Data Type (<span 
class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      A fixed-length character type, padded with trailing spaces if necessary 
to achieve the specified length. If
+      values are longer than the specified length, Impala truncates any 
trailing characters.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE 
TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword 
varname">column_name</var> CHAR(<var class="keyword 
varname">length</var>)</code></pre>
+
+    <p class="p">
+      The maximum length you can specify is 255.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Semantics of trailing spaces:</strong>
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        When you store a <code class="ph codeph">CHAR</code> value shorter 
than the specified length in a table, queries return
+        the value padded with trailing spaces if necessary; the resulting 
value has the same length as specified in
+        the column definition.
+      </li>
+
+      <li class="li">
+        If you store a <code class="ph codeph">CHAR</code> value containing 
trailing spaces in a table, those trailing spaces are
+        not stored in the data file. When the value is retrieved by a query, 
the result could have a different
+        number of trailing spaces. That is, the value includes however many 
spaces are needed to pad it to the
+        specified length of the column.
+      </li>
+
+      <li class="li">
+        If you compare two <code class="ph codeph">CHAR</code> values that 
differ only in the number of trailing spaces, those
+        values are considered identical.
+      </li>
+    </ul>
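+    <p class="p">
+      The padding, truncation, and comparison rules above can be modeled outside Impala. The following
+      shell sketch is an emulation for ASCII strings only, not Impala code; the function names
+      <code class="ph codeph">char_value</code> and <code class="ph codeph">char_equal</code> are hypothetical.
+    </p>

```shell
# Emulates CHAR(n) semantics for ASCII strings (illustration only, not Impala code).
# $1: value, $2: declared length n
char_value() {
  # %-N.Ns left-justifies the value, pads it with spaces to N characters,
  # and truncates anything beyond N characters.
  printf "%-${2}.${2}s" "$1"
}

# Two CHAR values compare as equal if they differ only in trailing spaces.
char_equal() {
  a=$(printf '%s' "$1" | sed 's/ *$//')
  b=$(printf '%s' "$2" | sed 's/ *$//')
  [ "$a" = "$b" ]
}

# Prints "[1  ]" and "[123]" on separate lines.
printf '[%s]\n' "$(char_value 1 3)" "$(char_value 123456 3)"
```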
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong> This type can be used for 
partition key columns. Because of the efficiency advantage
+        of numeric values over character-based values, if the partition key is 
a string representation of a number,
+        prefer to use an integer type with sufficient range (<code class="ph 
codeph">INT</code>, <code class="ph codeph">BIGINT</code>, and so
+        on) where practical.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type 
cannot be used with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Parquet considerations:</strong>
+      </p>
+
+    <ul class="ul">
+      <li class="li">
+        This type can be read from and written to Parquet files.
+      </li>
+
+      <li class="li">
+        There is no requirement for a particular level of Parquet.
+      </li>
+
+      <li class="li">
+        Parquet files generated by Impala and containing this type can be 
freely interchanged with other components
+        such as Hive and MapReduce.
+      </li>
+
+      <li class="li">
+        Any trailing spaces, whether implicitly or explicitly specified, are 
not written to the Parquet data files.
+      </li>
+
+      <li class="li">
+        Parquet data files might contain values that are longer than allowed 
by the
+        <code class="ph codeph">CHAR(<var class="keyword 
varname">n</var>)</code> length limit. Impala ignores any extra trailing 
characters when
+        it processes those values during a query.
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong>
+      </p>
+
+    <p class="p">
+      Text data files might contain values that are longer than allowed for a 
particular
+      <code class="ph codeph">CHAR(<var class="keyword 
varname">n</var>)</code> column. Any extra trailing characters are ignored when 
Impala
+      processes those values during a query. Text data files can also contain 
values that are shorter than the
+      defined length limit, and Impala pads them with trailing spaces up to 
the specified length. Any text data
+      files produced by Impala <code class="ph codeph">INSERT</code> 
statements do not include any trailing blanks for
+      <code class="ph codeph">CHAR</code> columns.
+    </p>
+
+    <p class="p"><strong class="ph b">Avro considerations:</strong></p>
+    <p class="p">
+        The Avro specification allows string values up to 2**64 bytes in 
length.
+        Impala queries for Avro tables use 32-bit integers to hold string 
lengths.
+        In <span class="keyword">Impala 2.5</span> and higher, Impala 
truncates <code class="ph codeph">CHAR</code>
+        and <code class="ph codeph">VARCHAR</code> values in Avro tables to 
(2**31)-1 bytes.
+        If a query encounters a <code class="ph codeph">STRING</code> value 
longer than (2**31)-1
+        bytes in an Avro table, the query fails. In earlier releases,
+        encountering such long values in an Avro table could cause a crash.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Compatibility:</strong>
+      </p>
+
+    <p class="p">
+      This type is available using <span class="keyword">Impala 2.0</span> or 
higher.
+    </p>
+
+    <p class="p">
+      Some other database systems make the length specification optional. For 
Impala, the length is required.
+    </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong> Represented in memory 
as a byte array with the same size as the length
+        specification. Values that are shorter than the specified length are 
padded on the right with trailing
+        spaces.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 
2.0.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> 
Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run 
the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">UDF considerations:</strong> This type cannot be 
used for the argument or return type of a user-defined
+        function (UDF) or user-defined aggregate function (UDA).
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      These examples show how trailing spaces are not considered significant 
when comparing or processing
+      <code class="ph codeph">CHAR</code> values. <code class="ph 
codeph">CAST()</code> truncates any longer string to fit within the defined
+      length. If a <code class="ph codeph">CHAR</code> value is shorter than 
the specified length, it is padded on the right with
+      spaces until it matches the specified length. Therefore, <code class="ph 
codeph">LENGTH()</code> represents the length
+      including any trailing spaces, and <code class="ph 
codeph">CONCAT()</code> also treats the column value as if it has
+      trailing spaces.
+    </p>
+
+<pre class="pre codeblock"><code>select cast('x' as char(4)) = cast('x   ' as 
char(4)) as "unpadded equal to padded";
++--------------------------+
+| unpadded equal to padded |
++--------------------------+
+| true                     |
++--------------------------+
+
+create table char_length(c char(3));
+insert into char_length values (cast('1' as char(3))), (cast('12' as 
char(3))), (cast('123' as char(3))), (cast('123456' as char(3)));
+select concat("[",c,"]") as c, length(c) from char_length;
++-------+-----------+
+| c     | length(c) |
++-------+-----------+
+| [1  ] | 3         |
+| [12 ] | 3         |
+| [123] | 3         |
+| [123] | 3         |
++-------+-----------+
+</code></pre>
+
+    <p class="p">
+      This example shows a case where data values are known to have a specific 
length, so <code class="ph codeph">CHAR</code>
+      is a logical data type to use.
+
+    </p>
+
+<pre class="pre codeblock"><code>create table addresses
+  (id bigint,
+   street_name string,
+   state_abbreviation char(2),
+   country_abbreviation char(2));
+</code></pre>
+
+    <p class="p">
+      The following example shows how values written by Impala do not 
physically include the trailing spaces. It
+      creates a table using text format, with <code class="ph 
codeph">CHAR</code> values much shorter than the declared length,
+      and then prints the resulting data file to show that the delimited 
values are not separated by spaces. The
+      same behavior applies to binary-format Parquet data files.
+    </p>
+
+<pre class="pre codeblock"><code>create table char_in_text (a char(20), b 
char(30), c char(40))
+  row format delimited fields terminated by ',';
+
+insert into char_in_text values (cast('foo' as char(20)), cast('bar' as 
char(30)), cast('baz' as char(40))), (cast('hello' as char(20)), cast('goodbye' 
as char(30)), cast('aloha' as char(40)));
+
+-- Running this Linux command inside impala-shell using the ! shortcut.
+!hdfs dfs -cat 
'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*';
+foo,bar,baz
+hello,goodbye,aloha
+</code></pre>
+
+    <p class="p">
+      The following example further illustrates the treatment of spaces. It 
replaces the contents of the previous
+      table with some values including leading spaces, trailing spaces, or 
both. Any leading spaces are preserved
+      within the data file, but trailing spaces are discarded. Then when the 
values are retrieved by a query, the
+      leading spaces are retrieved verbatim while any necessary trailing 
spaces are supplied by Impala.
+    </p>
+
+<pre class="pre codeblock"><code>insert overwrite char_in_text values 
(cast('trailing   ' as char(20)), cast('   leading and trailing   ' as 
char(30)), cast('   leading' as char(40)));
+!hdfs dfs -cat 
'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*';
+trailing,   leading and trailing,   leading
+
+select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c 
from char_in_text;
++------------------------+----------------------------------+--------------------------------------------+
+| a                      | b                                | c                
                          |
++------------------------+----------------------------------+--------------------------------------------+
+| [trailing            ] | [   leading and trailing       ] | [   leading      
                        ] |
++------------------------+----------------------------------+--------------------------------------------+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, 
<code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, 
<code class="ph codeph">VARCHAR</code>,
+        <code class="ph codeph">ARRAY</code>, <code class="ph 
codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used 
with Kudu tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+      Because the blank-padding behavior requires allocating the maximum 
length for each value in memory, for
+      scalability reasons avoid declaring <code class="ph codeph">CHAR</code> 
columns that are much longer than typical values in
+      that column.
+    </p>
+
+    <p class="p">
+        All data in <code class="ph codeph">CHAR</code> and <code class="ph 
codeph">VARCHAR</code> columns must be in a character encoding that
+        is compatible with UTF-8. If you have binary data from another 
database system (that is, a BLOB type), use
+        a <code class="ph codeph">STRING</code> column to hold it.
+      </p>
+
+    <p class="p">
+      When an expression compares a <code class="ph codeph">CHAR</code> with a 
<code class="ph codeph">STRING</code> or
+      <code class="ph codeph">VARCHAR</code>, the <code class="ph 
codeph">CHAR</code> value is implicitly converted to <code class="ph 
codeph">STRING</code>
+      first, with trailing spaces preserved.
+    </p>
+
+<pre class="pre codeblock"><code>select cast("foo  " as char(5)) = 'foo' as 
"char equal to string";
++----------------------+
+| char equal to string |
++----------------------+
+| false                |
++----------------------+
+</code></pre>
+
+    <p class="p">
+      This behavior differs from other popular database systems. To get the 
expected result of
+      <code class="ph codeph">TRUE</code>, cast the expressions on both sides 
to <code class="ph codeph">CHAR</code> values of the appropriate
+      length:
+    </p>
+
+<pre class="pre codeblock"><code>select cast("foo  " as char(5)) = cast('foo' 
as char(3)) as "char equal to string";
++----------------------+
+| char equal to string |
++----------------------+
+| true                 |
++----------------------+
+</code></pre>
+
+    <p class="p">
+      This behavior is subject to change in future releases.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_string.html#string">STRING Data Type</a>, 
<a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 
2.0 or higher only)</a>,
+      <a class="xref" href="impala_literals.html#string_literals">String 
Literals</a>,
+      <a class="xref" 
href="impala_string_functions.html#string_functions">Impala String Functions</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_datatypes.html">Data 
Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_cluster_sizing.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_cluster_sizing.html 
b/docs/build/html/topics/impala_cluster_sizing.html
new file mode 100644
index 0000000..d1f2a51
--- /dev/null
+++ b/docs/build/html/topics/impala_cluster_sizing.html
@@ -0,0 +1,318 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_planning.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="cluster_sizing"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Cluster Sizing Guidelines for 
Impala</title></head><body id="cluster_sizing"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Cluster Sizing Guidelines 
for Impala</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      This document provides a very rough guideline to estimate the size of a 
cluster needed for a specific
+      customer application. You can use this information when planning how 
much and what type of hardware to
+      acquire for a new cluster, or when adding Impala workloads to an 
existing cluster.
+    </p>
+
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+      Before making purchase or deployment decisions, consult organizations 
with relevant experience
+      to verify the conclusions about hardware requirements based on your data 
volume and workload.
+    </div>
+
+
+
+    <p class="p">
+      Always use hosts with identical specifications and capacities for all 
the nodes in the cluster. Currently,
+      Impala divides the work evenly between cluster nodes, regardless of 
their exact hardware configuration.
+      Because work can be distributed in different ways for different queries, 
if some hosts are overloaded
+      compared to others in terms of CPU, memory, I/O, or network, you might 
experience inconsistent performance
+      and overall slowness.
+    </p>
+
+    <p class="p">
+      For analytic workloads with star/snowflake schemas, and using consistent 
hardware for all nodes (64 GB RAM,
+      twelve 2 TB hard drives, 2x E5-2630L with 12 cores total, 10 GB network), the 
following table estimates the number of
+      DataNodes needed in the cluster based on data size and the number of 
concurrent queries, for workloads
+      similar to TPC-DS benchmark queries:
+    </p>
+
+    <table class="table"><caption><span class="table--title-label">Table 1. 
</span><span class="title">Cluster size estimation based on the number of 
concurrent queries and data size with a 20 second average query response 
time</span></caption><colgroup><col><col><col><col><col><col></colgroup><thead 
class="thead">
+          <tr class="row">
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__1">
+              Data Size
+            </th>
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__2">
+              1 query
+            </th>
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__3">
+              10 queries
+            </th>
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__4">
+              100 queries
+            </th>
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__5">
+              1000 queries
+            </th>
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__6">
+              2000 queries
+            </th>
+          </tr>
+        </thead><tbody class="tbody">
+          <tr class="row">
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">250 GB</strong>
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__2 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__3 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__4 ">
+              5
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__5 ">
+              35
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__6 ">
+              70
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">500 GB</strong>
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__2 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__3 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__4 ">
+              10
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__5 ">
+              70
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__6 ">
+              135
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">1 TB</strong>
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__2 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__3 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__4 ">
+              15
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__5 ">
+              135
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__6 ">
+              270
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">15 TB</strong>
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__2 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__3 ">
+              20
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__4 ">
+              200
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__5 ">
+              N/A
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__6 ">
+              N/A
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">30 TB</strong>
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__2 ">
+              4
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__3 ">
+              40
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__4 ">
+              400
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__5 ">
+              N/A
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__6 ">
+              N/A
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">60 TB</strong>
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__2 ">
+              8
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__3 ">
+              80
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__4 ">
+              800
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__5 ">
+              N/A
+            </td>
+            <td class="entry nocellnorowborder" 
headers="cluster_sizing__entry__6 ">
+              N/A
+            </td>
+          </tr>
+        </tbody></table>
+
+    <section class="section" id="cluster_sizing__sizing_factors"><h2 
class="title sectiontitle">Factors Affecting Scalability</h2>
+
+      
+
+      <p class="p">
+        A typical analytic workload (TPC-DS style queries) using recommended 
hardware is usually CPU-bound. Each
+        node can process roughly 1.6 GB/sec. Both CPU-bound and disk-bound 
workloads can scale almost linearly with
+        cluster size. However, for some workloads, the scalability might be 
bounded by the network, or even by
+        memory.
+      </p>
+
+      <p class="p">
+        If the workload is already network bound (on a 10 GB network), 
increasing the cluster size won't reduce
+        the network load; in fact, a larger cluster could increase network 
traffic because some queries involve
+        <span class="q">"broadcast"</span> operations to all DataNodes. 
Therefore, boosting the cluster size does not improve query
+        throughput in a network-constrained environment.
+      </p>
+
+      <p class="p">
+        Let's look at a memory-bound workload. A workload is memory-bound if 
Impala cannot run any additional
+        concurrent queries because all memory allocated has already been 
consumed, but neither CPU, disk, nor
+        network is saturated yet. This can happen because currently Impala 
uses only a single core per node to
+        process join and aggregation queries. For a node with 128 GB of RAM, 
if a join node takes 50 GB, the system
+        cannot run more than 2 such queries at the same time.
+      </p>
+
+      <p class="p">
+        Therefore, at most 2 cores are used. Throughput can still scale almost 
linearly even for a memory-bound
+        workload. It's just that the CPU will not be saturated. Per-node 
throughput will be lower than 1.6
+        GB/sec. Consider increasing the memory per node.
+      </p>
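+
+      <p class="p">
+        The arithmetic behind this memory-bound example can be sketched in a few lines of Python.
+        This is only an illustration, not part of Impala: the function names are hypothetical, the
+        12-core default is borrowed from the reference hardware described earlier in this topic,
+        and the sketch assumes that all of a node's RAM is available to queries and that each
+        query keeps exactly one core busy.
+      </p>

```python
def max_concurrent_queries(node_ram_gb, per_query_gb):
    """Whole number of queries whose memory fits on one node (hypothetical helper)."""
    return node_ram_gb // per_query_gb


def busy_cores(concurrent_queries, total_cores=12, cores_per_query=1):
    """Cores kept busy, assuming each query uses a fixed number of cores."""
    return min(concurrent_queries * cores_per_query, total_cores)


# Values from the example in the text: 128 GB of RAM, 50 GB per join.
concurrent = max_concurrent_queries(128, 50)
cores = busy_cores(concurrent)
print(concurrent, cores)
```

+      <p class="p">
+        Under the same assumptions, a node with 256 GB of RAM would admit 5 such queries, which is
+        why adding memory per node can help in this situation.
+      </p>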
+
+      <p class="p">
+        As long as the workload is not network- or memory-bound, we can use 
the 1.6 GB/second per node as the
+        throughput estimate.
+      </p>
+    </section>
+
+    <section class="section" id="cluster_sizing__sizing_details"><h2 
class="title sectiontitle">A More Precise Approach</h2>
+
+      
+
+      <p class="p">
+        A more precise sizing estimate would require not only queries per 
minute (QPM), but also an average data
+        size scanned per query (D). With the proper partitioning strategy, D 
is usually a fraction of the total
+        data size. The following equation can be used as a rough guide to 
estimate the number of nodes (N) needed:
+      </p>
+
+<pre class="pre codeblock"><code>Eq 1: N &gt; QPM * D / 100 GB
+</code></pre>
+
+      <p class="p">
+        Here is an example. Suppose, on average, a query scans 50 GB of data 
and the average response time is
+        required to be 15 seconds or less when there are 100 concurrent 
queries. The QPM is 100/15*60 = 400. We can
+        estimate the number of nodes using our equation above.
+      </p>
+
+<pre class="pre codeblock"><code>N &gt; QPM * D / 100GB
+N &gt; 400 * 50GB / 100GB
+N &gt; 200
+</code></pre>
+
+      <p class="p">
+        Because this figure is a rough estimate, the corresponding number of 
nodes could be between 100 and 500.
+      </p>
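+
+      <p class="p">
+        As a rough sketch, the worked example can be reproduced in Python. The helper names are
+        hypothetical and the 100 GB divisor is taken directly from Eq 1; the result is a lower
+        bound on the node count and carries the same uncertainty as the estimate above.
+      </p>

```python
# Divisor from Eq 1 (GB handled per node per minute, as given in the text).
GB_PER_NODE_PER_MINUTE = 100.0


def queries_per_minute(concurrent_queries, response_time_sec):
    """Convert a concurrency target and a response time into QPM."""
    return concurrent_queries * 60.0 / response_time_sec


def node_lower_bound(qpm, gb_scanned_per_query):
    """Right-hand side of Eq 1; N must exceed this value."""
    return qpm * gb_scanned_per_query / GB_PER_NODE_PER_MINUTE


# Values from the example: 100 concurrent queries, 15 seconds, 50 GB per query.
qpm = queries_per_minute(100, 15)
bound = node_lower_bound(qpm, 50)
print(qpm, bound)
```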
+
+      <p class="p">
+        Depending on the complexity of the query, the processing rate of a query 
might change. If the query has more
+        joins, aggregation functions, or CPU-intensive functions such as 
string processing or complex UDFs, the
+        processing rate will be lower than 1.6 GB/second per node. On the other 
hand, if the query only scans and
+        filters on numeric values, the processing rate can be higher.
+      </p>
+    </section>
+
+    <section class="section" id="cluster_sizing__sizing_mem_estimate"><h2 
class="title sectiontitle">Estimating Memory Requirements</h2>
+
+      
+      
+
+      <p class="p">
+        Impala can handle joins between multiple large tables. Make sure that 
statistics are collected for all the
+        joined tables, using the <code class="ph codeph"><a class="xref" 
href="impala_compute_stats.html#compute_stats">COMPUTE
+        STATS</a></code> statement. However, joining big tables does consume 
more memory. Follow the steps
+        below to calculate the minimum memory requirement.
+      </p>
+
+      <p class="p">
+        Suppose you are running the following join:
+      </p>
+
+<pre class="pre codeblock"><code>select a.*, b.col_1, b.col_2, ..., b.col_n
+from a, b
+where a.key = b.key
+and b.col_1 in (1,2,4...)
+and b.col_4 in (....);
+</code></pre>
+
+      <p class="p">
+        And suppose table <code class="ph codeph">B</code> is smaller than 
table <code class="ph codeph">A</code> (but still a large table).
+      </p>
+
+      <p class="p">
+        The memory requirement for the query is driven by the right-hand table (<code 
class="ph codeph">B</code>): its size after decompression,
+        filtering (<code class="ph codeph">b.col_n in ...</code>), and 
projection (only using certain columns) must be less
+        than the total memory of the entire cluster.
+      </p>
+
+<pre class="pre codeblock"><code>Cluster Total Memory Requirement  = Size of 
the smaller table *
+  selectivity factor from the predicate *
+  projection factor * compression ratio
+</code></pre>
+
+      <p class="p">
+        In this case, assume that table <code class="ph codeph">B</code> is 
100 TB in Parquet format with 200 columns. The
+        predicate on <code class="ph codeph">B</code> (<code class="ph 
codeph">b.col_1 in ... and b.col_4 in ...</code>) selects only 10% of
+        the rows from <code class="ph codeph">B</code>, and the projection 
keeps only 5 of the 200 columns.
+        Snappy compression typically shrinks data about 3 times, so we 
estimate a 3x compression factor.
+      </p>
+
+<pre class="pre codeblock"><code>Cluster Total Memory Requirement  = Size of 
the smaller table *
+  selectivity factor from the predicate *
+  projection factor * compression ratio
+  = 100TB * 10% * 5/200 * 3
+  = 0.75TB
+  = 750GB
+</code></pre>
+
+      <p class="p">
+        So, if you have a 10-node cluster where each node has 128 GB of RAM and you 
give 80% to Impala, you have about 1
+        TB of usable memory for Impala, which is more than 750 GB. Therefore, 
your cluster can handle join queries
+        of this magnitude.
+      </p>
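+
+      <p class="p">
+        The calculation above can be sketched in Python as follows. The function and parameter
+        names are hypothetical, sizes use decimal units (1 TB = 1000 GB) as in the worked example,
+        and the sketch is a rough planning aid rather than a description of how Impala itself
+        accounts for memory.
+      </p>

```python
def join_memory_gb(table_gb, selectivity_pct, cols_used, cols_total, compression_ratio):
    """Estimated cluster-wide memory for the right-hand table of a join.

    Size of the smaller table * selectivity * projection * compression ratio.
    """
    return table_gb * selectivity_pct / 100 * cols_used / cols_total * compression_ratio


def usable_memory_gb(nodes, ram_per_node_gb, impala_pct):
    """Memory available to Impala across the cluster."""
    return nodes * ram_per_node_gb * impala_pct / 100


# Values from the example: 100 TB table, 10% of rows, 5 of 200 columns, 3x compression.
required = join_memory_gb(100_000, 10, 5, 200, 3)
# 10 nodes with 128 GB each, 80% given to Impala.
available = usable_memory_gb(10, 128, 80)
# Positive headroom means the join fits in cluster memory under these assumptions.
headroom = available - required
print(required, available, headroom)
```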
+    </section>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_planning.html">Planning for Impala 
Deployment</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_comments.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_comments.html 
b/docs/build/html/topics/impala_comments.html
new file mode 100644
index 0000000..e3d711a
--- /dev/null
+++ b/docs/build/html/topics/impala_comments.html
@@ -0,0 +1,46 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_langref.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="comments"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Comments</title></head><body id="comments"><main 
role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Comments</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Impala supports the familiar styles of SQL comments:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        All text from a <code class="ph codeph">--</code> sequence to the end 
of the line is considered a comment and ignored.
+        This type of comment can occur on a single line by itself, or after 
all or part of a statement.
+      </li>
+
+      <li class="li">
+        All text from a <code class="ph codeph">/*</code> sequence to the next 
<code class="ph codeph">*/</code> sequence is considered a
+        comment and ignored. This type of comment can stretch over multiple 
lines, and can occur
+        on one or more lines by itself, in the middle of a statement, or 
before or after a statement.
+      </li>
+    </ul>
+
+    <p class="p">
+      For example:
+    </p>
+
+<pre class="pre codeblock"><code>-- This line is a comment about a table.
+create table ...;
+
+/*
+This is a multi-line comment about a query.
+*/
+select ...;
+
+select * from t /* This is an embedded comment about a query. */ where ...;
+
+select * from t -- This is a trailing comment within a multi-line command.
+where ...;
+</code></pre>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_langref.html">Impala SQL Language 
Reference</a></div></div></nav></article></main></body></html>
\ No newline at end of file
