Repository: tajo
Updated Branches:
  refs/heads/branch-0.11.0 69e059e3b -> 7a603d592

TAJO-1682: Write ORC document.

Signed-off-by: Jihoon Son <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/tajo/repo
Commit: http://git-wip-us.apache.org/repos/asf/tajo/commit/7a603d59
Tree: http://git-wip-us.apache.org/repos/asf/tajo/tree/7a603d59
Diff: http://git-wip-us.apache.org/repos/asf/tajo/diff/7a603d59

Branch: refs/heads/branch-0.11.0
Commit: 7a603d592b281e8dffc6eccbd354343230c39dad
Parents: 69e059e
Author: Jongyoung Park <[email protected]>
Authored: Thu Sep 17 15:35:45 2015 +0900
Committer: Jihoon Son <[email protected]>
Committed: Thu Sep 17 15:36:23 2015 +0900

----------------------------------------------------------------------
 CHANGES                                         |  3 ++
 .../sphinx/table_management/file_formats.rst    |  1 +
 .../src/main/sphinx/table_management/orc.rst    | 47 ++++++++++++++++++++
 3 files changed, 51 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/tajo/blob/7a603d59/CHANGES
----------------------------------------------------------------------
diff --git a/CHANGES b/CHANGES
index d6f6c1e..d8790c6 100644
--- a/CHANGES
+++ b/CHANGES
@@ -547,6 +547,9 @@ Release 0.11.0 - unreleased
 
   TASKS
 
+    TAJO-1682: Write ORC document. (Contributed by Jongyoung Park,
+    Committed by jihoon)
+
     TAJO-1744: Porting bash shell scripts to Windows command shell scripts.
     (Contributed by YeonSu Han, Committed by jihoon)
 

http://git-wip-us.apache.org/repos/asf/tajo/blob/7a603d59/tajo-docs/src/main/sphinx/table_management/file_formats.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/table_management/file_formats.rst b/tajo-docs/src/main/sphinx/table_management/file_formats.rst
index 0579497..7768920 100644
--- a/tajo-docs/src/main/sphinx/table_management/file_formats.rst
+++ b/tajo-docs/src/main/sphinx/table_management/file_formats.rst
@@ -10,4 +10,5 @@ Currently, Tajo provides four file formats as follows:
     text
     rcfile
     parquet
+    orc
     sequencefile
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tajo/blob/7a603d59/tajo-docs/src/main/sphinx/table_management/orc.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/table_management/orc.rst b/tajo-docs/src/main/sphinx/table_management/orc.rst
new file mode 100644
index 0000000..2733afc
--- /dev/null
+++ b/tajo-docs/src/main/sphinx/table_management/orc.rst
@@ -0,0 +1,47 @@
+***
+ORC
+***
+
+**ORC(Optimized Row Columnar)** is a columnar storage format from Hive. ORC improves performance for reading,
+writing, and processing data.
+For more details, please refer to `ORC Files <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>`_ at Hive wiki.
+
+===========================
+How to Create an ORC Table?
+===========================
+
+If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.
+
+In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
+statement. Below is an example statement for creating a table using orc files.
+
+.. code-block:: sql
+
+  CREATE TABLE table1 (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING orc;
+
+===================
+Physical Properties
+===================
+
+Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
+The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.
+
+Now, ORC file provides the following physical properties.
+
+* ``orc.max.merge.distance``: When ORC file is read, if stripes are too closer and the distance is lower than this value, they are merged and read at once. Default is 1MB.
+* ``orc.stripe.size``: It decides size of each stripe. Default is 64MB.
+* ``orc.compression.kind``: It means the compression algorithm used to compress and write data. It should be one of ``none``, ``snappy``, ``zlib``. Default is ``none``.
+* ``orc.buffer.size``: It decides size of writing buffer. Default is 256KB.
+* ``orc.rowindex.stride``: Define the default ORC index stride in number of rows. (Stride is the number of rows an index entry represents.) Default is 10000.
+
+======================================
+Compatibility Issues with Apache Hive™
+======================================
+
+At the moment, Tajo only supports flat relational tables.
+We are currently working on adding support for nested schemas and non-scalar types (`TAJO-710 <https://issues.apache.org/jira/browse/TAJO-710>`_).
\ No newline at end of file
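
The new ``orc.rst`` lists the physical properties but shows no statement that sets them. The following is a minimal sketch of how they might be passed through the ``WITH`` clause. It assumes ORC properties take the same quoted ``'key'='value'`` form that Tajo uses for other formats' table properties. The table name ``table2`` and the chosen values are illustrative only and do not come from the commit.

.. code-block:: sql

  -- Hypothetical table: same USING clause as the documented example,
  -- plus two of the listed properties overridden in WITH.
  CREATE TABLE table2 (
    id int,
    name text,
    score float
  ) USING orc WITH (
    'orc.compression.kind'='snappy',  -- documented choices: none, snappy, zlib
    'orc.rowindex.stride'='20000'     -- rows per index entry (documented default: 10000)
  );

Size-valued properties such as ``orc.stripe.size`` are left out on purpose, because the document states their defaults (64MB, 256KB) without saying which unit the property value itself expects.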
