[jira] [Commented] (TAJO-1740) Update Partition Table document

ASF GitHub Bot (JIRA) Sun, 17 Jan 2016 21:24:12 -0800

    [ 
https://issues.apache.org/jira/browse/TAJO-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104192#comment-15104192
 ]


ASF GitHub Bot commented on TAJO-1740:
--------------------------------------

Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/896#discussion_r49963051
  
    --- Diff: tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst ---
    @@ -44,9 +72,166 @@ during query planning phase.
     * LIKE predicates with a leading wild-card character
     * IN list predicates
     
    +.. code-block:: sql
    +
    +  SELECT * FROM student WHERE country = 'KOREA' AND city = 'SEOUL';
    +  SELECT * FROM student WHERE country = 'USA' AND (city = 'NEWYORK' OR city = 'BOSTON');
    +  SELECT * FROM student WHERE country = 'USA' AND city <> 'NEWYORK';
    +
    +
    +==================================================
    +Add data to Partition Table
    +==================================================
    +
    +Tajo provides a very useful feature of dynamic partitioning. You don't need to use any syntax with both ``INSERT INTO ... SELECT`` and ``Create Table As Select(CTAS)`` statements for dynamic partitioning. Tajo will automatically filter the data, create directories, move filtered data to appropriate directory and create partition over it.
    +
    +For example, assume there are both ``student_source`` and ``student`` tables composed of the following schema.
    +
    +.. code-block:: sql
    +
    +  CREATE TABLE student_source (
    +    id        INT,
    +    name      TEXT,
    +    gender    char(1),
    +    grade     TEXT,
    +    country   TEXT,
    +    city      TEXT,
    +    phone     TEXT
    +  );
    +
    +  CREATE TABLE student (
    +    id        INT,
    +    name      TEXT,
    +    gender    char(1),
    +    grade     TEXT,
    +    phone     TEXT
    +  ) PARTITION BY COLUMN (country TEXT, city TEXT);
    +
    +
    +How to INSERT dynamically to partition table
    +--------------------------------------------------------
    +
    +If you want to load an entire country or an entire city in one fell swoop:
    +
    +.. code-block:: sql
    +
    +  INSERT OVERWRITE INTO student
    +  SELECT id, name, gender, grade, phone, country, city
    +  FROM   student_source;
    +
    +
    +How to CTAS dynamically to partition table
    +--------------------------------------------------------
    +
    +when a partition table is created:
    +
    +.. code-block:: sql
    +
    +  DROP TABLE if exists student;
    +
    +  CREATE TABLE student (
    +    id        INT,
    +    name      TEXT,
    +    gender    char(1),
    +    grade     TEXT,
    +    phone     TEXT
    +  ) PARTITION BY COLUMN (country TEXT, city TEXT)
    +  AS SELECT id, name, gender, grade, phone, country, city
    +  FROM   student_source;
    +
    +
    +.. note::
    +
    +  When loading data into a partition, it’s necessary to include the partition columns as the last columns in the query. The column names in the source query don’t need to match the partition column names.
    +
    +
     ==================================================
     Compatibility Issues with Apache Hive™
     ==================================================
     
     If partitioned tables of Hive are created as external tables in Tajo, Tajo can process the Hive partitioned tables directly.
    -There haven't known compatibility issues yet.
    \ No newline at end of file
    +
    +
    +How to create partition table
    +--------------------------------------------------------
    +
    +If you create a partition table as follows in Tajo:
    +
    +.. code-block:: sql
    +
    +  default> CREATE TABLE student (
    +    id        INT,
    +    name      TEXT,
    +    gender    char(1),
    +    grade     TEXT,
    +    phone     TEXT
    +  ) PARTITION BY COLUMN (country TEXT, city TEXT);
    +
    +
    +And then you can get table information in Hive:
    +
    +.. code-block:: sql
    +
    +  hive> desc student;
    +  OK
    +  id                       int
    +  name                     string
    +  gender                   char(1)
    +  grade                    string
    +  phone                    string
    +  country                  string
    +  city                     string
    +
    +  # Partition Information
    +  # col_name               data_type               comment
    +
    +  country                  string
    +  city                     string
    +
    +
    +Or as you create the table in Hive:
    +
    +.. code-block:: sql
    +
    +  hive > CREATE TABLE student (
    +    id int,
    +    name string,
    +    gender char(1),
    +    grade string,
    +    phone string
    +  ) PARTITIONED BY (country string, city string)
    +  ROW FORMAT DELIMITED
    +    FIELDS TERMINATED BY '|' ;
    +
    +You will see table information in Tajo:
    +
    +.. code-block:: sql
    +
    +  default> \d student;
    +  table name: default.student
    +  table uri: hdfs://your_hdfs_namespace/user/hive/warehouse/student
    +  store type: TEXT
    +  number of rows: 0
    +  volume: 0 B
    +  Options:
    +    'text.null'='\\N'
    +    'transient_lastDdlTime'='1438756422'
    +    'text.delimiter'='|'
    +
    +  schema:
    +  id       INT4
    +  name     TEXT
    +  gender   CHAR(1)
    +  grade    TEXT
    +  phone    TEXT
    +
    +  Partitions:
    +  type:COLUMN
    +  columns::default.student.country (TEXT), default.student.city (TEXT)
    +
    +
    +How to add data to partition table
    +--------------------------------------------------------
    +
    +In Tajo, you can add data dynamically to partition table of Hive with both ``INSERT INTO ... SELECT`` and ``Create Table As Select (CTAS)`` statements. Tajo will automatically filter the data to HiveMetastore, create directories and move filtered data to appropriate directory on the distributed file system
    --- End diff --
    
    A '.' is missing at the end of the sentence.


> Update Partition Table document
> -------------------------------
>
>                 Key: TAJO-1740
>                 URL: https://issues.apache.org/jira/browse/TAJO-1740
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: Documentation
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>             Fix For: 0.12.0
>
>         Attachments: TAJO-1740.patch
>
>
> Currently, Tajo doesn't provide enough information about partition tables. 
> Thus, we need to add more information to the following documentation.
> http://tajo.apache.org/docs/current/partitioning/column_partitioning.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
