IMPALA-7244: [DOCS] Remove unsupported format writer support

- Added a "removed" note for the ALLOW_UNSUPPORTED_FORMATS and
SEQ_COMPRESSION_MODE query options.
- Will remove the above options from the docs at the next
compatibility breaking release.
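
For reference, the two options named above were ordinary query options set via `SET`. A hypothetical impala-shell sketch of the pre-3.1 usage pattern that this change now documents as removed (the `SEQ_COMPRESSION_MODE` value shown is illustrative only):

```sql
-- Before Impala 3.1 these SET statements were accepted; as of this change the
-- docs mark both options as removed and having no effect. Do not use them.
SET ALLOW_UNSUPPORTED_FORMATS=true;   -- was a boolean option, default false (per the conrefs below)
SET SEQ_COMPRESSION_MODE=SNAPPY;      -- illustrative value; related to SequenceFile compression
```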

Change-Id: I363accf5f284d2a1535cea0652b2b579379b9588
Reviewed-on: http://gerrit.cloudera.org:8080/11842
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Bikramjeet Vig <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/cb2574b8
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/cb2574b8
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/cb2574b8

Branch: refs/heads/branch-3.1.0
Commit: cb2574b8a131a1b15136ceb4a5f8d3896afa2731
Parents: c124d26
Author: Alex Rodoni <[email protected]>
Authored: Wed Oct 31 18:44:18 2018 -0700
Committer: Zoltan Borok-Nagy <[email protected]>
Committed: Tue Nov 13 12:50:23 2018 +0100

----------------------------------------------------------------------
 .../topics/impala_allow_unsupported_formats.xml |  7 +-
 docs/topics/impala_avro.xml                     | 19 +---
 docs/topics/impala_file_formats.xml             | 32 +++----
 docs/topics/impala_seq_compression_mode.xml     | 11 +--
 docs/topics/impala_seqfile.xml                  | 26 ++----
 docs/topics/impala_txtfile.xml                  | 96 ++++----------------
 6 files changed, 43 insertions(+), 148 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/cb2574b8/docs/topics/impala_allow_unsupported_formats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_allow_unsupported_formats.xml b/docs/topics/impala_allow_unsupported_formats.xml
index 55ec545..d140c1c 100644
--- a/docs/topics/impala_allow_unsupported_formats.xml
+++ b/docs/topics/impala_allow_unsupported_formats.xml
@@ -31,11 +31,8 @@ under the License.
   </prolog>
 
   <conbody>
-
-    <p>
-      An obsolete query option from early work on support for file formats. Do not use. Might be removed in the
-      future.
-    </p>
+    <note>This query option was removed in <keyword keyref="impala31"/> and no
+      longer has any effect. Do not use.</note>
 
     <p conref="../shared/impala_common.xml#common/type_boolean"/>
     <p conref="../shared/impala_common.xml#common/default_false_0"/>

http://git-wip-us.apache.org/repos/asf/impala/blob/cb2574b8/docs/topics/impala_avro.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_avro.xml b/docs/topics/impala_avro.xml
index 8e07d47..fd85f1a 100644
--- a/docs/topics/impala_avro.xml
+++ b/docs/topics/impala_avro.xml
@@ -34,12 +34,10 @@ under the License.
 
   <conbody>
 
-    <p rev="1.4.0">
-      <indexterm audience="hidden">Avro support in Impala</indexterm>
-      Impala supports using tables whose data files use the Avro file format. Impala can query Avro
-      tables, and in Impala 1.4.0 and higher can create them, but currently cannot insert data into them. For
-      insert operations, use Hive, then switch back to Impala to run queries.
-    </p>
+    <p rev="1.4.0"> Impala supports using tables whose data files use the Avro
+      file format. Impala can query Avro tables. In Impala 1.4.0 and higher,
+      Impala can create Avro tables, but cannot insert data into them. For
+      insert operations, use Hive, then switch back to Impala to run queries. </p>
 
     <table>
       <title>Avro Format Support in Impala</title>
@@ -192,15 +190,6 @@ hive> CREATE TABLE hive_avro_table
         name, is ignored.
       </p>
 
-<!-- Have not got a working example of this syntax yet from Lenni.
-<p>
-The schema can be specified either through the <codeph>TBLPROPERTIES</codeph> clause or the
-<codeph>WITH SERDEPROPERTIES</codeph> clause.
-For best compatibility with future versions of Hive, use the <codeph>WITH SERDEPROPERTIES</codeph> clause
-for this information.
-</p>
--->
-
       <note>
         For nullable Avro columns, make sure to put the <codeph>"null"</codeph> entry before the actual type name.
         In Impala, all columns are nullable; Impala currently does not have a <codeph>NOT NULL</codeph> clause. Any

http://git-wip-us.apache.org/repos/asf/impala/blob/cb2574b8/docs/topics/impala_file_formats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_file_formats.xml b/docs/topics/impala_file_formats.xml
index 7516eae..7f59d2c 100644
--- a/docs/topics/impala_file_formats.xml
+++ b/docs/topics/impala_file_formats.xml
@@ -38,14 +38,12 @@ under the License.
 
   <conbody>
 
-    <p>
-      <indexterm audience="hidden">file formats</indexterm>
-      <indexterm audience="hidden">compression</indexterm>
-      Impala supports several familiar file formats used in Apache Hadoop. Impala can load and query data files
-      produced by other Hadoop components such as Pig or MapReduce, and data files produced by Impala can be used
-      by other components also. The following sections discuss the procedures, limitations, and performance
-      considerations for using each file format with Impala.
-    </p>
+    <p>Impala supports several familiar file formats used in Apache Hadoop.
+      Impala can load and query data files produced by other Hadoop components
+      such as Pig or MapReduce, and data files produced by Impala can be used by
+      other components also. The following sections discuss the procedures,
+      limitations, and performance considerations for using each file format
+      with Impala. </p>
 
     <p>
       The file format used for an Impala table has significant performance consequences. Some file formats include
@@ -143,14 +141,11 @@ under the License.
               format is uncompressed text, with values separated by ASCII <codeph>0x01</codeph> characters
               (typically represented as Ctrl-A).
             </entry>
-            <entry>
-              Yes: <codeph>CREATE TABLE</codeph>, <codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>, and query.
-              If LZO compression is used, you must create the table and load data in Hive. If other kinds of
-              compression are used, you must load data through <codeph>LOAD DATA</codeph>, Hive, or manually in
-              HDFS.
-
-<!--            <ph rev="2.0.0">Impala 2.0 and higher can write LZO-compressed text data; for earlier Impala releases,  you must create the table and load data in Hive.</ph> -->
-            </entry>
+            <entry> Yes: <codeph>CREATE TABLE</codeph>, <codeph>INSERT</codeph>,
+                <codeph>LOAD DATA</codeph>, and query. If LZO compression is
+              used, you must create the table and load data in Hive. If other
+              kinds of compression are used, you must load data through
+                <codeph>LOAD DATA</codeph>, Hive, or manually in HDFS.</entry>
           </row>
           <row id="avro_support">
             <entry>
@@ -162,9 +157,8 @@ under the License.
             <entry>
               Snappy, gzip, deflate, bzip2
             </entry>
-            <entry rev="1.4.0">
-              Yes, in Impala 1.4.0 and higher. Before that, create the table using Hive.
-            </entry>
+            <entry rev="1.4.0"> Yes, in Impala 1.4.0 and higher. In lower
+              versions, create the table using Hive. </entry>
             <entry>
               No. Import data by using <codeph>LOAD DATA</codeph> on data files already in the right format, or use
               <codeph>INSERT</codeph> in Hive followed by <codeph>REFRESH <varname>table_name</varname></codeph> in Impala.

http://git-wip-us.apache.org/repos/asf/impala/blob/cb2574b8/docs/topics/impala_seq_compression_mode.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_seq_compression_mode.xml b/docs/topics/impala_seq_compression_mode.xml
index 09b6fd5..d1d6e93 100644
--- a/docs/topics/impala_seq_compression_mode.xml
+++ b/docs/topics/impala_seq_compression_mode.xml
@@ -33,15 +33,8 @@ under the License.
   <conbody>
 
     <p rev="2.5.0">
-      <indexterm audience="hidden">RM_INITIAL_MEM query option</indexterm>
-    </p>
-
-    <p>
-      <b>Type:</b>
-    </p>
-
-    <p>
-      <b>Default:</b>
+      <note>This query option was removed in <keyword keyref="impala31"/> and no
+        longer has any effect. Do not use.</note>
     </p>
   </conbody>
 </concept>

http://git-wip-us.apache.org/repos/asf/impala/blob/cb2574b8/docs/topics/impala_seqfile.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_seqfile.xml b/docs/topics/impala_seqfile.xml
index 7143530..db5a231 100644
--- a/docs/topics/impala_seqfile.xml
+++ b/docs/topics/impala_seqfile.xml
@@ -34,10 +34,7 @@ under the License.
 
   <conbody>
 
-    <p>
-      <indexterm audience="hidden">SequenceFile support in Impala</indexterm>
-      Impala supports using SequenceFile data files.
-    </p>
+    <p> Impala supports using SequenceFile data files. </p>
 
     <table>
       <title>SequenceFile Format Support in Impala</title>
@@ -160,12 +157,11 @@ Returned 3 row(s) in 0.23s</codeblock>
 
     <conbody>
 
-      <p>
-        <indexterm audience="hidden">compression</indexterm>
-        You may want to enable compression on existing tables. Enabling compression provides performance gains in
-        most cases and is supported for SequenceFile tables. For example, to enable Snappy compression, you would
-        specify the following additional settings when loading data through the Hive shell:
-      </p>
+      <p> You may want to enable compression on existing tables. Enabling
+        compression provides performance gains in most cases and is supported
+        for SequenceFile tables. For example, to enable Snappy compression, you
+        would specify the following additional settings when loading data
+        through the Hive shell: </p>
 
 <codeblock>hive&gt; SET hive.exec.compress.output=true;
 hive&gt; SET mapred.max.split.size=256000000;
@@ -225,16 +221,6 @@ hive&gt; INSERT OVERWRITE TABLE tbl_seq PARTITION(year) SELECT * FROM tbl;</code
     </conbody>
   </concept>
 
-  <concept audience="hidden" id="seqfile_data_types">
-
-    <title>Data Type Considerations for SequenceFile Tables</title>
-
-    <conbody>
-
-      <p></p>
-    </conbody>
-  </concept>
-
   <concept id="seqfile_performance">
 
     <title>Query Performance for Impala SequenceFile Tables</title>

http://git-wip-us.apache.org/repos/asf/impala/blob/cb2574b8/docs/topics/impala_txtfile.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_txtfile.xml b/docs/topics/impala_txtfile.xml
index 3c09b80..2b8fb58 100644
--- a/docs/topics/impala_txtfile.xml
+++ b/docs/topics/impala_txtfile.xml
@@ -34,12 +34,10 @@ under the License.
 
   <conbody>
 
-    <p>
-      <indexterm audience="hidden">Text support in Impala</indexterm>
-      Impala supports using text files as the storage format for input and output. Text files are a
-      convenient format to use for interchange with other applications or scripts that produce or read delimited
-      text files, such as CSV or TSV with commas or tabs for delimiters.
-    </p>
+    <p> Impala supports using text files as the storage format for input and
+      output. Text files are a convenient format to use for interchange with
+      other applications or scripts that produce or read delimited text files,
+      such as CSV or TSV with commas or tabs for delimiters. </p>
 
     <p>
       Text files are also very flexible in their column definitions. For example, a text file could have more
@@ -223,20 +221,6 @@ create table pipe_separated(id int, s string, n int, t timestamp, b boolean)
         </p>
       </note>
 
-<!--
-      <p>
-        In the <cmdname>impala-shell</cmdname> interpreter, issue a command similar to:
-      </p>
-
-<codeblock>create table textfile_table (<varname>column_specs</varname>) stored as textfile;
-/* If the STORED AS clause is omitted, the default is a TEXTFILE with hex 01 characters as the delimiter. */
-create table default_table (<varname>column_specs</varname>);
-/* Some optional clauses in the CREATE TABLE statement apply only to Text tables. */
-create table csv_table (<varname>column_specs</varname>) row format delimited fields terminated by ',';
-create table tsv_table (<varname>column_specs</varname>) row format delimited fields terminated by '\t';
-create table dos_table (<varname>column_specs</varname>) lines terminated by '\r';</codeblock>
--->
-
       <p>
         Issue a <codeph>DESCRIBE FORMATTED <varname>table_name</varname></codeph> statement to see the details of
         how each table is represented internally in Impala.
@@ -271,7 +255,6 @@ create table dos_table (<varname>column_specs</varname>) lines terminated by '\r
         </li>
 
         <li>
-<!-- Copied and slightly adapted text from later on in this same file. Turn into a conref. -->
           <p>
             Impala uses suffixes to recognize when text data files are compressed text. For Impala to recognize the
             compressed text files, they must have the appropriate file extension corresponding to the compression
@@ -438,14 +421,11 @@ INSERT INTO csv SELECT * FROM other_file_format_table;</codeblock>
 
     <conbody>
 
-      <p>
-        <indexterm audience="hidden">LZO support in Impala</indexterm>
-
-        <indexterm audience="hidden">compression</indexterm>
-        Impala supports using text data files that employ LZO compression. Where practical, apply compression to
-        text data files. Impala queries are usually I/O-bound; reducing the amount of data read from
-        disk typically speeds up a query, despite the extra CPU work to uncompress the data in memory.
-      </p>
+      <p> Impala supports using text data files that employ LZO compression.
+        Where practical, apply compression to text data files. Impala queries
+        are usually I/O-bound; reducing the amount of data read from disk
+        typically speeds up a query, despite the extra CPU work to uncompress
+        the data in memory. </p>
 
       <p>
         Impala can work with LZO-compressed text files, which are preferable to files compressed by other codecs, because
@@ -581,15 +561,6 @@ drwxr-xr-x. 2 root root 4096 Oct 28 15:46 conf.pseudo</codeblock>
     INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'</codeblock>
 
-<!--
-      <p>
-        In Hive, when writing LZO compressed text tables, you must include the following specification:
-      </p>
-
-<codeblock>hive&gt; SET hive.exec.compress.output=true;
-hive&gt; SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;</codeblock>
--->
-
         <p>
           Also, certain Hive settings need to be in effect. For example:
         </p>
@@ -639,13 +610,6 @@ hive&gt; INSERT INTO TABLE lzo_t SELECT col1, col2 FROM uncompressed_text_table;
           DataNodes, which is very inefficient.
         </p>
 
-        <!-- To do:
-          Here is the place to put some end-to-end examples once I have it
-          all working. Or at least the final step with Impala queries.
-          Have never actually gotten this part working yet due to mismatches
-          between the levels of Impala and LZO packages.
-        -->
-
         <p>
           Once the LZO-compressed tables are created, and data is loaded and indexed, you can query them through
           Impala. As always, the first time you start <cmdname>impala-shell</cmdname> after creating a table in
@@ -673,20 +637,13 @@ hive&gt; INSERT INTO TABLE lzo_t SELECT col1, col2 FROM uncompressed_text_table;
 
     <conbody>
 
-      <p>
-        <indexterm audience="hidden">gzip support in Impala</indexterm>
-
-        <indexterm audience="hidden">bzip2 support in Impala</indexterm>
-
-        <indexterm audience="hidden">Snappy support in Impala</indexterm>
-
-        <indexterm audience="hidden">compression</indexterm>
-        In Impala 2.0 and later, Impala supports using text data files that employ gzip, bzip2, or Snappy
-        compression. These compression types are primarily for convenience within an existing ETL pipeline rather
-        than maximum performance. Although it requires less I/O to read compressed text than the equivalent
-        uncompressed text, files compressed by these codecs are not <q>splittable</q> and therefore cannot take
-        full advantage of the Impala parallel query capability.
-      </p>
+      <p> In Impala 2.0 and later, Impala supports using text data files that
+        employ gzip, bzip2, or Snappy compression. These compression types are
+        primarily for convenience within an existing ETL pipeline rather than
+        maximum performance. Although it requires less I/O to read compressed
+        text than the equivalent uncompressed text, files compressed by these
+        codecs are not <q>splittable</q> and therefore cannot take full
+        advantage of the Impala parallel query capability. </p>
 
       <p>
         As each bzip2- or Snappy-compressed text file is processed, the node doing the work reads the entire file
@@ -697,15 +654,6 @@ hive&gt; INSERT INTO TABLE lzo_t SELECT col1, col2 FROM uncompressed_text_table;
         gzip-compressed text files. The gzipped data is decompressed as it is read, rather than all at once.</ph>
       </p>
 
-<!--
-    <p>
-    Impala can work with LZO-compressed text files but not GZip-compressed text.
-    LZO-compressed files are <q>splittable</q>, meaning that different portions of a file
-    can be uncompressed and processed independently by different nodes. GZip-compressed
-    files are not splittable, making them unsuitable for Impala-style distributed queries.
-    </p>
--->
-
       <p>
         To create a table to hold gzip, bzip2, or Snappy-compressed text, create a text table with no special
         compression options. Specify the delimiter and escape character if required, using the <codeph>ROW
@@ -764,16 +712,4 @@ $ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_co
 
   </concept>
 
-  <concept audience="hidden" id="txtfile_data_types">
-
-    <title>Data Type Considerations for Text Tables</title>
-
-    <conbody>
-
-      <p></p>
-
-    </conbody>
-
-  </concept>
-
 </concept>
