tajo git commit: TAJO-1462: Replace CSV examples into TEXT examples in docs.

jihoonson Sat, 04 Apr 2015 02:59:30 -0700

Repository: tajo
Updated Branches:
  refs/heads/master 70d5fdf86 -> b0abff8e8



TAJO-1462: Replace CSV examples into TEXT examples in docs.

Closes #475

Signed-off-by: Jihoon Son <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/tajo/repo
Commit: http://git-wip-us.apache.org/repos/asf/tajo/commit/b0abff8e
Tree: http://git-wip-us.apache.org/repos/asf/tajo/tree/b0abff8e
Diff: http://git-wip-us.apache.org/repos/asf/tajo/diff/b0abff8e

Branch: refs/heads/master
Commit: b0abff8e896d7985eaf1aa48d9c2ab3a45618f01
Parents: 70d5fdf
Author: Dongjoon Hyun <[email protected]>
Authored: Sat Apr 4 18:58:38 2015 +0900
Committer: Jihoon Son <[email protected]>
Committed: Sat Apr 4 18:58:38 2015 +0900

----------------------------------------------------------------------
 CHANGES                                         |   3 +
 .../main/sphinx/backup_and_restore/catalog.rst  |   2 +-
 tajo-docs/src/main/sphinx/getting_started.rst   |   2 +-
 tajo-docs/src/main/sphinx/sql_language/ddl.rst  |   2 +-
 .../src/main/sphinx/table_management/csv.rst    | 115 -------------------
 .../sphinx/table_management/file_formats.rst    |   2 +-
 .../sphinx/table_management/table_overview.rst  |   6 +-
 .../src/main/sphinx/table_management/text.rst   | 115 +++++++++++++++++++
 8 files changed, 125 insertions(+), 122 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/CHANGES
----------------------------------------------------------------------
diff --git a/CHANGES b/CHANGES
index 5ee7715..3def16c 100644
--- a/CHANGES
+++ b/CHANGES
@@ -124,6 +124,9 @@ Release 0.11.0 - unreleased
   
   TASKS
 
+    TAJO-1462: Replace CSV examples into TEXT examples in docs. 
+    (Contributed by Dongjoon Hyun, Committed by jihoon)
+
     TAJO-1424: Investigate the problem of too many "Try to connect" messeges 
     during Travic CI build. (Contributed by navis, Committed by jihoon)
 

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/backup_and_restore/catalog.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/backup_and_restore/catalog.rst 
b/tajo-docs/src/main/sphinx/backup_and_restore/catalog.rst
index 200aa85..1c2b709 100644
--- a/tajo-docs/src/main/sphinx/backup_and_restore/catalog.rst
+++ b/tajo-docs/src/main/sphinx/backup_and_restore/catalog.rst
@@ -28,7 +28,7 @@ For example, if you want to backup a table customer, you 
should type a command a
   -- Name: customer; Type: TABLE; Storage: CSV
   -- Path: file:/home/hyunsik/tpch/customer
   --
-  CREATE EXTERNAL TABLE customer (c_custkey INT8, c_name TEXT, c_address TEXT, 
c_nationkey INT8, c_phone TEXT, c_acctbal FLOAT8, c_mktsegment TEXT, c_comment 
TEXT) USING CSV LOCATION 'file:/home/hyunsik/tpch/customer';
+  CREATE EXTERNAL TABLE customer (c_custkey INT8, c_name TEXT, c_address TEXT, 
c_nationkey INT8, c_phone TEXT, c_acctbal FLOAT8, c_mktsegment TEXT, c_comment 
TEXT) USING TEXT LOCATION 'file:/home/hyunsik/tpch/customer';
   
 
 If you want to restore the catalog from the SQL dump file, please type the 
below command: ::

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/getting_started.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/getting_started.rst 
b/tajo-docs/src/main/sphinx/getting_started.rst
index eaf6973..e30c3fe 100644
--- a/tajo-docs/src/main/sphinx/getting_started.rst
+++ b/tajo-docs/src/main/sphinx/getting_started.rst
@@ -135,7 +135,7 @@ Here, we assume the schema as (int, text, float, text). ::
         name text, 
         score float, 
         type text) 
-        using csv with ('text.delimiter'='|') location 'file:/home/x/table1';
+        using text with ('text.delimiter'='|') location 'file:/home/x/table1';
 
 To load an external table, you need to use “create external table” 
statement. 
 In the location clause, you should use the absolute directory path with an 
appropriate scheme. 

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/sql_language/ddl.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/sql_language/ddl.rst 
b/tajo-docs/src/main/sphinx/sql_language/ddl.rst
index 60b7190..662ccff 100644
--- a/tajo-docs/src/main/sphinx/sql_language/ddl.rst
+++ b/tajo-docs/src/main/sphinx/sql_language/ddl.rst
@@ -56,7 +56,7 @@ If you want to add an external table that contains compressed 
data, you should g
   ...
   L_COMMENT text) 
 
-  USING csv WITH 
('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.DeflateCodec')
+  USING TEXT WITH 
('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.DeflateCodec')
   LOCATION 'hdfs://localhost:9010/tajo/warehouse/lineitem_100_snappy';
 
 `compression.codec` parameter can have one of the following compression codecs:

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/table_management/csv.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/table_management/csv.rst 
b/tajo-docs/src/main/sphinx/table_management/csv.rst
deleted file mode 100644
index 53c6e1d..0000000
--- a/tajo-docs/src/main/sphinx/table_management/csv.rst
+++ /dev/null
@@ -1,115 +0,0 @@
-*************************************
-CSV (TextFile)
-*************************************
-
-A character-separated values (CSV) file represents a tabular data set 
consisting of rows and columns.
-Each row is a plan-text line. A line is usually broken by a character line 
feed ``\n`` or carriage-return ``\r``.
-The line feed ``\n`` is the default delimiter in Tajo. Each record consists of 
multiple fields, separated by
-some other character or string, most commonly a literal vertical bar ``|``, 
comma ``,`` or tab ``\t``.
-The vertical bar is used as the default field delimiter in Tajo.
-
-=========================================
-How to Create a CSV Table ?
-=========================================
-
-If you are not familiar with the ``CREATE TABLE`` statement, please refer to 
the Data Definition Language :doc:`/sql_language/ddl`.
-
-In order to specify a certain file format for your table, you need to use the 
``USING`` clause in your ``CREATE TABLE``
-statement. The below is an example statement for creating a table using CSV 
files.
-
-.. code-block:: sql
-
- CREATE TABLE
-  table1 (
-    id int,
-    name text,
-    score float,
-    type text
-  ) USING CSV;
-
-=========================================
-Physical Properties
-=========================================
-
-Some table storage formats provide parameters for enabling or disabling 
features and adjusting physical parameters.
-The ``WITH`` clause in the CREATE TABLE statement allows users to set those 
parameters.
-
-Now, the CSV storage format provides the following physical properties.
-
-* ``text.delimiter``: delimiter character. ``|`` or ``\u0001`` is usually 
used, and the default field delimiter is ``|``.
-* ``text.null``: NULL character. The default NULL character is an empty string 
``''``. Hive's default NULL character is ``'\\N'``.
-* ``compression.codec``: Compression codec. You can enable compression feature 
and set specified compression algorithm. The compression algorithm used to 
compress files. The compression codec name should be the fully qualified class 
name inherited from `org.apache.hadoop.io.compress.CompressionCodec 
<https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html>`_.
 By default, compression is disabled.
-* ``csvfile.serde`` (deprecated): custom (De)serializer class. 
``org.apache.tajo.storage.TextSerializerDeserializer`` is the default 
(De)serializer class.
-* ``timezone``: the time zone that the table uses for writting. When table 
rows are read or written, ```timestamp``` and ```time``` column values are 
adjusted by this timezone if it is set. Time zone can be an abbreviation form 
like 'PST' or 'DST'. Also, it accepts an offset-based form like 'UTC+9' or a 
location-based form like 'Asia/Seoul'.
-* ``text.error-tolerance.max-num``: the maximum number of permissible parsing 
errors. This value should be an integer value. By default, 
``text.error-tolerance.max-num`` is ``0``. According to the value, parsing 
errors will be handled in different ways.
-  * If ``text.error-tolerance.max-num < 0``, all parsing errors are ignored.
-  * If ``text.error-tolerance.max-num == 0``, any parsing error is not 
allowed. If any error occurs, the query will be failed. (default)
-  * If ``text.error-tolerance.max-num > 0``, the given number of parsing 
errors in each task will be pemissible.
-
-The following example is to set a custom field delimiter, NULL character, and 
compression codec:
-
-.. code-block:: sql
-
- CREATE TABLE table1 (
-  id int,
-  name text,
-  score float,
-  type text
- ) USING CSV WITH('text.delimiter'='\u0001',
-                  'text.null'='\\N',
-                  
'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
-
-.. warning::
-
-  Be careful when using ``\n`` as the field delimiter because CSV uses ``\n`` 
as the line delimiter.
-  At the moment, Tajo does not provide a way to specify the line delimiter.
-
-=========================================
-Custom (De)serializer
-=========================================
-
-The CSV storage format not only provides reading and writing interfaces for 
CSV data but also allows users to process custom
-plan-text file formats with user-defined (De)serializer classes.
-For example, with custom (de)serializers, Tajo can process JSON file formats 
or any specialized plan-text file formats.
-
-In order to specify a custom (De)serializer, set a physical property 
``csvfile.serde``.
-The property value should be a fully qualified class name.
-
-For example:
-
-.. code-block:: sql
-
- CREATE TABLE table1 (
-  id int,
-  name text,
-  score float,
-  type text
- ) USING CSV WITH 
('csvfile.serde'='org.my.storage.CustomSerializerDeserializer')
-
-
-=========================================
-Null Value Handling Issues
-=========================================
-In default, NULL character in CSV files is an empty string ``''``.
-In other words, an empty field is basically recognized as a NULL value in Tajo.
-If a field domain is ``TEXT``, an empty field is recognized as a string value 
``''`` instead of NULL value.
-Besides, You can also use your own NULL character by specifying a physical 
property ``text.null``.
-
-=========================================
-Compatibility Issues with Apache Hive™
-=========================================
-
-CSV files generated in Tajo can be processed directly by Apache Hive™ 
without further processing.
-In this section, we explain some compatibility issue for users who use both 
Hive and Tajo.
-
-If you set a custom field delimiter, the CSV tables cannot be directly used in 
Hive.
-In order to specify the custom field delimiter in Hive, you need to use ``ROW 
FORMAT DELIMITED FIELDS TERMINATED BY``
-clause in a Hive's ``CREATE TABLE`` statement as follows:
-
-.. code-block:: sql
-
- CREATE TABLE table1 (id int, name string, score float, type string)
- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
- STORED AS TEXT
-
-To the best of our knowledge, there is not way to specify a custom NULL 
character in Hive.

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/table_management/file_formats.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/table_management/file_formats.rst 
b/tajo-docs/src/main/sphinx/table_management/file_formats.rst
index c15dd3f..0579497 100644
--- a/tajo-docs/src/main/sphinx/table_management/file_formats.rst
+++ b/tajo-docs/src/main/sphinx/table_management/file_formats.rst
@@ -7,7 +7,7 @@ Currently, Tajo provides four file formats as follows:
 .. toctree::
     :maxdepth: 1
 
-    csv
+    text
     rcfile
     parquet
     sequencefile
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/table_management/table_overview.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/table_management/table_overview.rst 
b/tajo-docs/src/main/sphinx/table_management/table_overview.rst
index 3d933c2..3985e19 100644
--- a/tajo-docs/src/main/sphinx/table_management/table_overview.rst
+++ b/tajo-docs/src/main/sphinx/table_management/table_overview.rst
@@ -29,9 +29,9 @@ The following example is to set a custom field delimiter, 
NULL character, and co
   name text,
   score float,
   type text
- ) USING CSV WITH('text.delimiter'='\u0001',
-                  'text.null'='\\N',
-                  
'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
+ ) USING TEXT WITH('text.delimiter'='\u0001',
+                   'text.null'='\\N',
+                   
'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
 
 Each physical table layout has its own specialized properties. They will be 
addressed in :doc:`/table_management/file_formats`.
 

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/table_management/text.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/table_management/text.rst 
b/tajo-docs/src/main/sphinx/table_management/text.rst
new file mode 100644
index 0000000..3727b03
--- /dev/null
+++ b/tajo-docs/src/main/sphinx/table_management/text.rst
@@ -0,0 +1,115 @@
+*************************************
+TEXT
+*************************************
+
+A character-separated values plain-text file represents a tabular data set 
consisting of rows and columns.
+Each row is a plan-text line. A line is usually broken by a character line 
feed ``\n`` or carriage-return ``\r``.
+The line feed ``\n`` is the default delimiter in Tajo. Each record consists of 
multiple fields, separated by
+some other character or string, most commonly a literal vertical bar ``|``, 
comma ``,`` or tab ``\t``.
+The vertical bar is used as the default field delimiter in Tajo.
+
+=========================================
+How to Create a TEXT Table ?
+=========================================
+
+If you are not familiar with the ``CREATE TABLE`` statement, please refer to 
the Data Definition Language :doc:`/sql_language/ddl`.
+
+In order to specify a certain file format for your table, you need to use the 
``USING`` clause in your ``CREATE TABLE``
+statement. The below is an example statement for creating a table using *TEXT* 
format.
+
+.. code-block:: sql
+
+ CREATE TABLE
+  table1 (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING TEXT;
+
+=========================================
+Physical Properties
+=========================================
+
+Some table storage formats provide parameters for enabling or disabling 
features and adjusting physical parameters.
+The ``WITH`` clause in the CREATE TABLE statement allows users to set those 
parameters.
+
+*TEXT* format provides the following physical properties.
+
+* ``text.delimiter``: delimiter character. ``|`` or ``\u0001`` is usually 
used, and the default field delimiter is ``|``.
+* ``text.null``: ``NULL`` character. The default ``NULL`` character is an 
empty string ``''``. Hive's default ``NULL`` character is ``'\\N'``.
+* ``compression.codec``: Compression codec. You can enable compression feature 
and set specified compression algorithm. The compression algorithm used to 
compress files. The compression codec name should be the fully qualified class 
name inherited from `org.apache.hadoop.io.compress.CompressionCodec 
<https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html>`_.
 By default, compression is disabled.
+* ``text.serde``: custom (De)serializer class. 
``org.apache.tajo.storage.text.CSVLineSerDe`` is the default (De)serializer 
class.
+* ``timezone``: the time zone that the table uses for writting. When table 
rows are read or written, ```timestamp``` and ```time``` column values are 
adjusted by this timezone if it is set. Time zone can be an abbreviation form 
like 'PST' or 'DST'. Also, it accepts an offset-based form like 'UTC+9' or a 
location-based form like 'Asia/Seoul'.
+* ``text.error-tolerance.max-num``: the maximum number of permissible parsing 
errors. This value should be an integer value. By default, 
``text.error-tolerance.max-num`` is ``0``. According to the value, parsing 
errors will be handled in different ways.
+  * If ``text.error-tolerance.max-num < 0``, all parsing errors are ignored.
+  * If ``text.error-tolerance.max-num == 0``, any parsing error is not 
allowed. If any error occurs, the query will be failed. (default)
+  * If ``text.error-tolerance.max-num > 0``, the given number of parsing 
errors in each task will be pemissible.
+
+The following example is to set a custom field delimiter, ``NULL`` character, 
and compression codec:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (
+  id int,
+  name text,
+  score float,
+  type text
+ ) USING TEXT WITH('text.delimiter'='\u0001',
+                   'text.null'='\\N',
+                   
'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
+
+.. warning::
+
+  Be careful when using ``\n`` as the field delimiter because *TEXT* format 
tables use ``\n`` as the line delimiter.
+  At the moment, Tajo does not provide a way to specify the line delimiter.
+
+=========================================
+Custom (De)serializer
+=========================================
+
+The *TEXT* format not only provides reading and writing interfaces for text 
data but also allows users to process custom
+plan-text file formats with user-defined (De)serializer classes.
+For example, with custom (de)serializers, Tajo can process JSON file formats 
or any specialized plan-text file formats.
+
+In order to specify a custom (De)serializer, set a physical property 
``text.serde``.
+The property value should be a fully qualified class name.
+
+For example:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (
+  id int,
+  name text,
+  score float,
+  type text
+ ) USING TEXT WITH ('text.serde'='org.my.storage.CustomSerializerDeserializer')
+
+
+=========================================
+Null Value Handling Issues
+=========================================
+In default, ``NULL`` character in *TEXT* format is an empty string ``''``.
+In other words, an empty field is basically recognized as a ``NULL`` value in 
Tajo.
+If a field domain is ``TEXT``, an empty field is recognized as a string value 
``''`` instead of ``NULL`` value.
+Besides, You can also use your own ``NULL`` character by specifying a physical 
property ``text.null``.
+
+=========================================
+Compatibility Issues with Apache Hive™
+=========================================
+
+*TEXT* tables generated in Tajo can be processed directly by Apache Hive™ 
without further processing.
+In this section, we explain some compatibility issue for users who use both 
Hive and Tajo.
+
+If you set a custom field delimiter, the *TEXT* tables cannot be directly used 
in Hive.
+In order to specify the custom field delimiter in Hive, you need to use ``ROW 
FORMAT DELIMITED FIELDS TERMINATED BY``
+clause in a Hive's ``CREATE TABLE`` statement as follows:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (id int, name string, score float, type string)
+ ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
+ STORED AS TEXT
+
+To the best of our knowledge, there is not way to specify a custom ``NULL`` 
character in Hive.
