Added: tajo/site/docs/0.11.0/_sources/functions/python.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/functions/python.txt?rev=1710773&view=auto
==============================================================================
--- tajo/site/docs/0.11.0/_sources/functions/python.txt (added)
+++ tajo/site/docs/0.11.0/_sources/functions/python.txt Tue Oct 27 11:04:33 2015
@@ -0,0 +1,159 @@
+******************************
+Python Functions
+******************************
+
+=======================
+User-defined Functions
+=======================
+
+-----------------------
+Function registration
+-----------------------
+
+To register Python UDFs, you must install the script files on all cluster nodes.
+After that, you can register your functions by specifying the paths to those script files in ``tajo-site.xml``. Here is an example of the configuration.
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.function.python.code-dir</name>
+    <value>/path/to/script1.py,/path/to/script2.py</value>
+  </property>
+
+Please note that you can specify multiple paths with ``','`` as a delimiter. Each file can contain multiple functions. Here is a typical example of a script file.
+
+.. code-block:: python
+
+  # /path/to/udf1.py
+
+  @output_type('int4')
+  def return_one():
+    return 1
+
+  @output_type("text")
+  def helloworld():
+    return 'Hello, World'
+
+  # No decorator - blob
+  def concat_py(str):
+    return str+str
+
+  @output_type('int4')
+  def sum_py(a,b):
+    return a+b
+
+If the configuration is set properly, every function in the script files is registered when the Tajo cluster starts up.
+
+-----------------------
+Decorators and types
+-----------------------
+
+By default, every function has a return type of ``BLOB``.
+You can use Python decorators to define output types for the script functions. Tajo can figure out return types from the annotations of the Python script.
+
+* ``output_type``: Defines the return data type for a script UDF in a format that Tajo can understand. The defined type must be one of the types supported by Tajo. For supported types, please refer to :doc:`/sql_language/data_model`.
+
+-----------------------
+Query example
+-----------------------
+
+Once the Python UDFs are successfully registered, you can use them like other built-in functions.
+
+.. code-block:: sql
+
+  default> select concat_py(n_name)::text from nation where sum_py(n_regionkey,1) > 2;
+
+==============================================
+User-defined Aggregation Functions
+==============================================
+
+-----------------------
+Function registration
+-----------------------
+
+To define your Python aggregation functions, you should write a Python class for each function.
+The following are typical examples of Python UDAFs.
+
+.. code-block:: python
+
+  # /path/to/udaf1.py
+
+  class AvgPy:
+    sum = 0
+    cnt = 0
+
+    def __init__(self):
+      self.reset()
+
+    def reset(self):
+      self.sum = 0
+      self.cnt = 0
+
+    # eval at the first stage
+    def eval(self, item):
+      self.sum += item
+      self.cnt += 1
+
+    # get intermediate result
+    def get_partial_result(self):
+      return [self.sum, self.cnt]
+
+    # merge intermediate results
+    def merge(self, list):
+      self.sum += list[0]
+      self.cnt += list[1]
+
+    # get final result
+    @output_type('float8')
+    def get_final_result(self):
+      return self.sum / float(self.cnt)
+
+
+  class CountPy:
+    cnt = 0
+
+    def __init__(self):
+      self.reset()
+
+    def reset(self):
+      self.cnt = 0
+
+    # eval at the first stage
+    def eval(self):
+      self.cnt += 1
+
+    # get intermediate result
+    def get_partial_result(self):
+      return self.cnt
+
+    # merge intermediate results
+    def merge(self, cnt):
+      self.cnt += cnt
+
+    # get final result
+    @output_type('int4')
+    def get_final_result(self):
+      return self.cnt
+
+
+These classes must provide ``reset()``, ``eval()``, ``merge()``, ``get_partial_result()``, and ``get_final_result()`` functions.
+
+* ``reset()`` resets the aggregation state.
+* ``eval()`` evaluates input tuples in the first stage.
+* ``merge()`` merges intermediate results of the first stage.
+* ``get_partial_result()`` returns intermediate results of the first stage. Its output type must be the same as the input type of ``merge()``.
+* ``get_final_result()`` returns the final aggregation result.
+
+-----------------------
+Query example
+-----------------------
+
+Once the Python UDAFs are successfully registered, you can use them like other built-in aggregation functions.
+
+.. code-block:: sql
+
+  default> select avgpy(n_nationkey), countpy() from nation;
+
+.. warning::
+
+  Currently, Python UDAFs cannot be used as window functions.
\ No newline at end of file
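The aggregation contract above can be sketched in plain Python. The driver below is illustrative only — it mimics how a two-stage distributed aggregation would call these methods, but it is not Tajo's actual execution engine, and the ``output_type`` stub merely stands in for the decorator provided by Tajo's Python runtime:

```python
# Stub for Tajo's output_type decorator (assumption: in Tajo it only
# records the return type; it does not change the function's behavior).
def output_type(type_name):
    def wrap(f):
        return f
    return wrap

class AvgPy:
    """Mirrors the documented AvgPy example."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.sum = 0
        self.cnt = 0

    def eval(self, item):          # first stage: consume one input value
        self.sum += item
        self.cnt += 1

    def get_partial_result(self):  # shipped between stages
        return [self.sum, self.cnt]

    def merge(self, partial):      # final stage: combine partial results
        self.sum += partial[0]
        self.cnt += partial[1]

    @output_type('float8')
    def get_final_result(self):
        return self.sum / float(self.cnt)

# Two hypothetical "nodes" each aggregate their own rows (first stage) ...
partitions = [[1, 2, 3], [4, 5]]
partials = []
for rows in partitions:
    agg = AvgPy()
    for r in rows:
        agg.eval(r)
    partials.append(agg.get_partial_result())

# ... then a final instance merges the intermediate results (second stage).
final = AvgPy()
for p in partials:
    final.merge(p)
print(final.get_final_result())  # -> 3.0, the average of 1..5
```

Note how ``get_partial_result()`` and ``merge()`` must agree on the shape of the intermediate value (here, a ``[sum, cnt]`` pair) — this is exactly the constraint stated in the list above.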
Added: tajo/site/docs/0.11.0/_sources/functions/string_func_and_operators.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/functions/string_func_and_operators.txt?rev=1710773&view=auto
==============================================================================
--- tajo/site/docs/0.11.0/_sources/functions/string_func_and_operators.txt (added)
+++ tajo/site/docs/0.11.0/_sources/functions/string_func_and_operators.txt Tue Oct 27 11:04:33 2015
@@ -0,0 +1,431 @@
+*******************************
+String Functions and Operators
+*******************************
+
+.. function:: str1 || str2
+
+  Returns the concatenated string of the two strings ``str1`` and ``str2``.
+
+  :param str1: first string
+  :param str2: second string
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select 'Ta' || 'jo';
+    > 'Tajo'
+
+.. function:: ascii (string text)
+
+  Returns the ASCII code of the first character of the text.
+  For UTF-8, this function returns the Unicode code point of the character.
+  For other multibyte encodings, the argument must be an ASCII character.
+
+  :param string: input string
+  :rtype: int4
+  :example:
+
+  .. code-block:: sql
+
+    select ascii('x');
+    > 120
+
+.. function:: bit_length (string text)
+
+  Returns the number of bits in the string.
+
+  :param string: input string
+  :rtype: int4
+  :example:
+
+  .. code-block:: sql
+
+    select bit_length('jose');
+    > 32
+
+.. function:: char_length (string text)
+
+  Returns the number of characters in the string.
+
+  :param string: string to be counted
+  :rtype: int4
+  :alias: character_length, length
+  :example:
+
+  .. code-block:: sql
+
+    select char_length('Tajo');
+    > 4
+
+.. function:: octet_length (string text)
+
+  Returns the number of bytes in the string.
+
+  :param string: input string
+  :rtype: int4
+  :example:
+
+  .. code-block:: sql
+
+    select octet_length('jose');
+    > 4
+
+.. function:: chr (code int4)
+
+  Returns a character with the given code.
+
+  :param code: input character code
+  :rtype: char
+  :example:
+
+  .. code-block:: sql
+
+    select chr(65);
+    > A
+
+.. function:: decode (binary text, format text)
+
+  Decodes binary data from its textual representation in the string.
+
+  :param binary: encoded value
+  :param format: decode format. base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes.
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select decode('MTIzXDAwMFwwMDE=', 'base64');
+    > 123\\000\\001
+
+.. function:: digest (input text, method text)
+
+  Calculates the hash of the string using the given method.
+
+  :param input: input string
+  :param method: hash method
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select digest('tajo', 'sha1');
+    > 02b0e20540b89f0b735092bbac8093eb2e3804cf
+
+.. function:: encode (binary text, format text)
+
+  Encodes binary data into a textual representation.
+
+  :param binary: decoded value
+  :param format: encode format. base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes.
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select encode('123\\000\\001', 'base64');
+    > MTIzXDAwMFwwMDE=
+
+.. function:: initcap (string text)
+
+  Converts the first letter of each word to upper case and the rest to lower case.
+
+  :param string: input string
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select initcap('hi THOMAS');
+    > Hi Thomas
+
+.. function:: md5 (string text)
+
+  Calculates the MD5 hash of the string.
+
+  :param string: input string
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select md5('abc');
+    > 900150983cd24fb0d6963f7d28e17f72
+
+.. function:: left (string text, number int4)
+
+  Returns the first n characters in the string.
+
+  :param string: input string
+  :param number: number of characters retrieved
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select left('ABC', 2);
+    > AB
+
+.. function:: right(string text, number int4)
+
+  Returns the last n characters in the string.
+
+  :param string: input string
+  :param number: number of characters retrieved
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select right('ABC', 2);
+    > BC
+
+.. function:: locate(source text, target text, [start_index int4])
+
+  Returns the location of the specified substring.
+
+  :param source: source string
+  :param target: target substring
+  :param start_index: the index where the search is started
+  :rtype: int4
+  :alias: strpos
+  :example:
+
+  .. code-block:: sql
+
+    select locate('high', 'ig', 1);
+    > 2
+
+.. function:: strposb(source text, target text)
+
+  Returns the binary location of the specified substring.
+
+  :param source: source string
+  :param target: target substring
+  :rtype: int4
+  :example:
+
+  .. code-block:: sql
+
+    select strposb('tajo', 'aj');
+    > 2
+
+.. function:: substr(source text, start int4, length int4)
+
+  Extracts a substring.
+
+  :param source: source string
+  :param start: start index
+  :param length: length of substring
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select substr('alphabet', 3, 2);
+    > ph
+
+.. function:: trim(string text, [characters text])
+
+  Removes the characters (a space by default) from both ends of the string.
+
+  :param string: input string
+  :param characters: characters which will be removed
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select trim('xTajoxx', 'x');
+    > Tajo
+
+.. function:: trim([leading | trailing | both] [characters text] FROM string text)
+
+  Removes the characters (a space by default) from the start/end/both ends of the string.
+
+  :param string: input string
+  :param characters: characters which will be removed
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select trim(both 'x' from 'xTajoxx');
+    > Tajo
+
+
+.. function:: btrim(string text, [characters text])
+
+  Removes the characters (a space by default) from both ends of the string.
+
+  :param string: input string
+  :param characters: characters which will be removed
+  :rtype: text
+  :alias: trim
+  :example:
+
+  .. code-block:: sql
+
+    select btrim('xTajoxx', 'x');
+    > Tajo
+
+
+.. function:: ltrim(string text, [characters text])
+
+  Removes the characters (a space by default) from the start of the string.
+
+  :param string: input string
+  :param characters: characters which will be removed
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select ltrim('xxTajo', 'x');
+    > Tajo
+
+
+.. function:: rtrim(string text, [characters text])
+
+  Removes the characters (a space by default) from the end of the string.
+
+  :param string: input string
+  :param characters: characters which will be removed
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select rtrim('Tajoxx', 'x');
+    > Tajo
+
+
+.. function:: split_part(string text, delimiter text, field int)
+
+  Splits a string on the delimiter and returns the given field (counting from one).
+
+  :param string: input string
+  :param delimiter: delimiter
+  :param field: index to field
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select split_part('ab_bc_cd','_',2);
+    > bc
+
+
+
+.. function:: regexp_replace(string text, pattern text, replacement text)
+
+  Replaces substrings matched to a given regular expression pattern.
+
+  :param string: input string
+  :param pattern: pattern
+  :param replacement: string substituted for the matching substring
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select regexp_replace('abcdef', '(^ab|ef$)', '--');
+    > --cd--
+
+
+.. function:: upper(string text)
+
+  Converts the input text to upper case.
+
+  :param string: input string
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select upper('tajo');
+    > TAJO
+
+
+.. function:: lower(string text)
+
+  Converts the input text to lower case.
+
+  :param string: input string
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select lower('TAJO');
+    > tajo
+
+.. function:: lpad(source text, number int4, pad text)
+
+  Fills up the string to the given length by prepending the characters in ``pad`` (a space by default). If the string is already longer than the given length then it is truncated (on the right).
+
+  :param source: source string
+  :param number: padding length
+  :param pad: padding string
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select lpad('hi', 5, 'xy');
+    > xyxhi
+
+.. function:: rpad(source text, number int4, pad text)
+
+  Fills up the string to the given length by appending the characters in ``pad`` (a space by default). If the string is already longer than the given length then it is truncated.
+
+  :param source: source string
+  :param number: padding length
+  :param pad: padding string
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select rpad('hi', 5, 'xy');
+    > hixyx
+
+.. function:: quote_ident(string text)
+
+  Returns the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled.
+
+  :param string: input string
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select quote_ident('Foo bar');
+    > "Foo bar"
+
+.. function:: repeat(string text, number int4)
+
+  Repeats the string the specified number of times.
+
+  :param string: input string
+  :param number: repetition number
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select repeat('Pg', 4);
+    > PgPgPgPg
+
+.. function:: reverse(string text)
+
+  Reverses the string.
+
+  :param string: input string
+  :rtype: text
+  :example:
+
+  .. code-block:: sql
+
+    select reverse('TAJO');
+    > OJAT
\ No newline at end of file
Added: tajo/site/docs/0.11.0/_sources/functions/window_func.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/functions/window_func.txt?rev=1710773&view=auto
==============================================================================
--- tajo/site/docs/0.11.0/_sources/functions/window_func.txt (added)
+++ tajo/site/docs/0.11.0/_sources/functions/window_func.txt Tue Oct 27 11:04:33 2015
@@ -0,0 +1,47 @@
+************************************
+Window Functions
+************************************
+
+.. function:: first_value (value any)
+
+  Returns the first value of the input rows.
+
+  :param value: input value
+  :rtype: same as parameter data type
+
+.. function:: last_value (value any)
+
+  Returns the last value of the input rows.
+
+  :param value: input value
+  :rtype: same as parameter data type
+
+.. function:: lag (value any [, offset integer [, default any ]])
+
+  Returns the value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, returns default instead. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null.
+
+  :param value: input value
+  :param offset: offset
+  :param default: default value
+  :rtype: same as parameter data type
+
+.. function:: lead (value any [, offset integer [, default any ]])
+
+  Returns the value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, returns default instead. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null.
+
+  :param value: input value
+  :param offset: offset
+  :param default: default value
+  :rtype: same as parameter data type
+
+.. function:: rank ()
+
+  Returns the rank of the current row, with gaps.
+
+  :rtype: int8
+
+.. function:: row_number ()
+
+  Returns the number of the current row within its partition, counting from 1.
+
+  :rtype: int8
\ No newline at end of file
Added: tajo/site/docs/0.11.0/_sources/getting_started.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/getting_started.txt?rev=1710773&view=auto
==============================================================================
--- tajo/site/docs/0.11.0/_sources/getting_started.txt (added)
+++ tajo/site/docs/0.11.0/_sources/getting_started.txt Tue Oct 27 11:04:33 2015
@@ -0,0 +1,182 @@
+***************
+Getting Started
+***************
+
+In this section, we explain how to set up a standalone Tajo instance. It will run against the local filesystem. In later sections, we will present how to run a Tajo cluster on Apache Hadoop's HDFS, a distributed filesystem. This section shows you how to start up a Tajo cluster, create tables in your Tajo cluster, submit SQL queries via the Tajo shell, and shut down your Tajo cluster. The exercise below should take no more than ten minutes.
+
+======================
+Prerequisites
+======================
+
+ * Hadoop 2.3.0 or higher (up to 2.6.0)
+ * Java 1.7 or higher
+ * Protocol buffer 2.5.0
+
+===================================
+Download and unpack the source code
+===================================
+
+You can either download the source code release of Tajo or check out the development codebase from Git.
+
+-----------------------------------
+Download the latest source release
+-----------------------------------
+
+Choose a download site from this list of `Apache Download Mirrors <http://www.apache.org/dyn/closer.cgi/tajo>`_.
+Click on the suggested mirror link. This will take you to a mirror of Tajo Releases.
+Download the file that ends in .tar.gz to your local filesystem, e.g. tajo-x.y.z-src.tar.gz.
+
+Decompress and untar your downloaded file and then change into the unpacked directory.
::

+  tar xzvf tajo-x.y.z-src.tar.gz
+
+-----------------------------------
+Check out the source code via Git
+-----------------------------------
+
+The development codebase can also be downloaded from `the Apache git repository <https://git-wip-us.apache.org/repos/asf/tajo.git>`_ as follows: ::
+
+  git clone https://git-wip-us.apache.org/repos/asf/tajo.git
+
+A read-only git repository is also mirrored on `Github <https://github.com/apache/tajo>`_.
+
+
+=================
+Build source code
+=================
+
+Once you have prepared the prerequisites and the source code, you can build Tajo.
+
+You can compile the source code and get a binary archive as follows:
+
+.. code-block:: bash
+
+  $ cd tajo-x.y.z
+  $ mvn clean install -DskipTests -Pdist -Dtar -Dhadoop.version=2.X.X
+  $ ls tajo-dist/target/tajo-x.y.z-SNAPSHOT.tar.gz
+
+.. note::
+
+  If you don't specify the Hadoop version, the Tajo cluster may not run correctly. Thus, we highly recommend that you specify your Hadoop version in the maven build command.
+
+  Example:
+
+    $ mvn clean install -DskipTests -Pdist -Dtar -Dhadoop.version=2.5.1
+
+Then, move to a proper directory and decompress the tar.gz file as follows:
+
+.. code-block:: bash
+
+  $ cd [a directory to be parent of tajo binary]
+  $ tar xzvf ${TAJO_SRC}/tajo-dist/target/tajo-x.y.z-SNAPSHOT.tar.gz
+
+================================
+Setting up a local Tajo cluster
+================================
+
+Apache Tajo™ provides two run modes: local mode and fully distributed mode. Here, we explain only the local mode, where a Tajo instance runs on a local file system. A local mode Tajo instance can start up with very simple configurations.
+
+First of all, you need to add the environment variables to conf/tajo-env.sh.
+
+.. code-block:: bash
+
+  # Hadoop home. Required
+  export HADOOP_HOME= ...
+
+  # The java implementation to use. Required.
+  export JAVA_HOME= ...
+
+To launch the Tajo master, execute start-tajo.sh.
+
+.. code-block:: bash
+
+  $ $TAJO_HOME/bin/start-tajo.sh
+
+.. note::
+
+  If you want to know how to set up a fully distributed Tajo cluster, please see :doc:`/configuration/cluster_setup`.
+
+.. warning::
+
+  By default, the *Catalog server*, which manages table meta data, uses `Apache Derby <http://db.apache.org/derby/>`_ as its persistent storage, and Derby stores data into the ``/tmp/tajo-catalog-${username}`` directory. But some operating systems may remove all contents in ``/tmp`` when booting up. To ensure your catalog data persists, you need to set a proper location for the Derby directory. To learn Catalog configuration, please refer to :doc:`/configuration/catalog_configuration`.
+
+======================
+First query execution
+======================
+
+First of all, we need to prepare a table for query execution. For example, you can make a simple text-based table as follows:
+
+.. code-block:: bash
+
+  $ mkdir /home/x/table1
+  $ cd /home/x/table1
+  $ cat > data.csv
+  1|abc|1.1|a
+  2|def|2.3|b
+  3|ghi|3.4|c
+  4|jkl|4.5|d
+  5|mno|5.6|e
+  <CTRL + D>
+
+
+Apache Tajo™ provides a SQL shell which allows users to interactively submit SQL queries. In order to use this shell, please execute ``bin/tsql`` ::
+
+  $ $TAJO_HOME/bin/tsql
+  tajo>
+
+In order to load the table we created above, we should think of a schema for the table.
+Here, we assume the schema as (int, text, float, text). ::
+
+  $ $TAJO_HOME/bin/tsql
+  tajo> create external table table1 (
+      id int,
+      name text,
+      score float,
+      type text)
+      using text with ('text.delimiter'='|') location 'file:/home/x/table1';
+
+To load an external table, you need to use the ``create external table`` statement.
+In the location clause, you should use the absolute directory path with an appropriate scheme.
+If the table resides in HDFS, you should use ``hdfs`` instead of ``file``.
+
+If you want to know DDL statements in more detail, please see Query Language. ::
+
+  tajo> \d
+  table1
+
+``\d`` command shows the list of tables. ::
+
+  tajo> \d table1
+
+  table name: table1
+  table path: file:/home/x/table1
+  store type: TEXT
+  number of rows: 0
+  volume (bytes): 78 B
+  schema:
+  id INT
+  name TEXT
+  score FLOAT
+  type TEXT
+
+``\d [table name]`` command shows the description of a given table.
+
+Also, you can execute SQL queries as follows: ::
+
+  tajo> select * from table1 where id > 2;
+  final state: QUERY_SUCCEEDED, init time: 0.069 sec, response time: 0.397 sec
+  result: file:/tmp/tajo-hadoop/staging/q_1363768615503_0001_000001/RESULT, 3 rows ( 35B)
+
+  id,  name,  score,  type
+  -  -  -  -  -  -  -  -  -  -  -  -  -
+  3,  ghi,  3.4,  c
+  4,  jkl,  4.5,  d
+  5,  mno,  5.6,  e
+
+  tajo> \q
+  bye
+
+Feel free to enjoy Tajo with SQL standards.
+If you want a fuller explanation of the SQL supported by Tajo, please refer to :doc:`/sql_language`.
\ No newline at end of file
Added: tajo/site/docs/0.11.0/_sources/getting_started/building.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/getting_started/building.txt?rev=1710773&view=auto
==============================================================================
--- tajo/site/docs/0.11.0/_sources/getting_started/building.txt (added)
+++ tajo/site/docs/0.11.0/_sources/getting_started/building.txt Tue Oct 27 11:04:33 2015
@@ -0,0 +1,30 @@
+*****************
+Build source code
+*****************
+
+Once you have prepared the prerequisites and the source code, you can build Tajo.
+
+You can compile the source code and get a binary archive as follows:
+
+.. code-block:: bash
+
+  $ cd tajo-x.y.z
+  $ mvn clean install -DskipTests -Pdist -Dtar -Dhadoop.version=2.X.X
+  $ ls tajo-dist/target/tajo-x.y.z-SNAPSHOT.tar.gz
+
+.. note::

+  If you don't specify the Hadoop version, the Tajo cluster may not run correctly. Thus, we highly recommend that you specify your Hadoop version in the maven build command.
+
+  Example:
+
+    $ mvn clean install -DskipTests -Pdist -Dtar -Dhadoop.version=2.5.1
+
+Then, move to a proper directory and decompress the tar.gz file as follows:
+
+.. code-block:: bash
+
+  $ cd [a directory to be parent of tajo binary]
+  $ tar xzvf ${TAJO_SRC}/tajo-dist/target/tajo-x.y.z-SNAPSHOT.tar.gz
\ No newline at end of file
Added: tajo/site/docs/0.11.0/_sources/getting_started/downloading_source.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/getting_started/downloading_source.txt?rev=1710773&view=auto
==============================================================================
--- tajo/site/docs/0.11.0/_sources/getting_started/downloading_source.txt (added)
+++ tajo/site/docs/0.11.0/_sources/getting_started/downloading_source.txt Tue Oct 27 11:04:33 2015
@@ -0,0 +1,31 @@
+*************************************
+Download and unpack the source code
+*************************************
+
+You can either download the source code release of Tajo or check out the development codebase from Git.
+
+================================================
+Download the latest source release
+================================================
+
+Choose a download site from this list of `Apache Download Mirrors <http://www.apache.org/dyn/closer.cgi/tajo>`_.
+Click on the suggested mirror link. This will take you to a mirror of Tajo Releases.
+Download the file that ends in .tar.gz to your local filesystem, e.g. tajo-x.y.z-src.tar.gz.
+
+Decompress and untar your downloaded file and then change into the unpacked directory.
::

+  tar xzvf tajo-x.y.z-src.tar.gz
+
+================================================
+Check out the source code via Git
+================================================
+
+The development codebase can also be downloaded from `the Apache git repository <https://git-wip-us.apache.org/repos/asf/tajo.git>`_ as follows: ::
+
+  git clone https://git-wip-us.apache.org/repos/asf/tajo.git
+
+A read-only git repository is also mirrored on `Github <https://github.com/apache/tajo>`_.
+
+
+
Added: tajo/site/docs/0.11.0/_sources/getting_started/first_query.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/getting_started/first_query.txt?rev=1710773&view=auto
==============================================================================
--- tajo/site/docs/0.11.0/_sources/getting_started/first_query.txt (added)
+++ tajo/site/docs/0.11.0/_sources/getting_started/first_query.txt Tue Oct 27 11:04:33 2015
@@ -0,0 +1,78 @@
+************************
+First query execution
+************************
+
+First of all, we need to prepare a table for query execution. For example, you can make a simple text-based table as follows:
+
+.. code-block:: bash
+
+  $ mkdir /home/x/table1
+  $ cd /home/x/table1
+  $ cat > data.csv
+  1|abc|1.1|a
+  2|def|2.3|b
+  3|ghi|3.4|c
+  4|jkl|4.5|d
+  5|mno|5.6|e
+  <CTRL + D>
+
+
+Apache Tajo™ provides a SQL shell which allows users to interactively submit SQL queries. In order to use this shell, please execute ``bin/tsql`` ::
+
+  $ $TAJO_HOME/bin/tsql
+  tajo>
+
+In order to load the table we created above, we should think of a schema for the table.
+Here, we assume the schema as (int, text, float, text). ::
+
+  $ $TAJO_HOME/bin/tsql
+  tajo> create external table table1 (
+      id int,
+      name text,
+      score float,
+      type text)
+      using csv with ('text.delimiter'='|') location 'file:/home/x/table1';
+
+To load an external table, you need to use the ``create external table`` statement.
+In the location clause, you should use the absolute directory path with an appropriate scheme.
+If the table resides in HDFS, you should use ``hdfs`` instead of ``file``.
+
+If you want to know DDL statements in more detail, please see Query Language. ::
+
+  tajo> \d
+  table1
+
+``\d`` command shows the list of tables. ::
+
+  tajo> \d table1
+
+  table name: table1
+  table path: file:/home/x/table1
+  store type: CSV
+  number of rows: 0
+  volume (bytes): 78 B
+  schema:
+  id INT
+  name TEXT
+  score FLOAT
+  type TEXT
+
+``\d [table name]`` command shows the description of a given table.
+
+Also, you can execute SQL queries as follows: ::
+
+  tajo> select * from table1 where id > 2;
+  final state: QUERY_SUCCEEDED, init time: 0.069 sec, response time: 0.397 sec
+  result: file:/tmp/tajo-hadoop/staging/q_1363768615503_0001_000001/RESULT, 3 rows ( 35B)
+
+  id,  name,  score,  type
+  -  -  -  -  -  -  -  -  -  -  -  -  -
+  3,  ghi,  3.4,  c
+  4,  jkl,  4.5,  d
+  5,  mno,  5.6,  e
+
+  tajo> \q
+  bye
+
+Feel free to enjoy Tajo with SQL standards.
+If you want a fuller explanation of the SQL supported by Tajo, please refer to :doc:`/sql_language`.
\ No newline at end of file
Added: tajo/site/docs/0.11.0/_sources/getting_started/local_setup.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/getting_started/local_setup.txt?rev=1710773&view=auto
==============================================================================
--- tajo/site/docs/0.11.0/_sources/getting_started/local_setup.txt (added)
+++ tajo/site/docs/0.11.0/_sources/getting_started/local_setup.txt Tue Oct 27 11:04:33 2015
@@ -0,0 +1,31 @@
+**********************************
+Setting up a local Tajo cluster
+**********************************
+
+Apache Tajo™ provides two run modes: local mode and fully distributed mode. Here, we explain only the local mode, where a Tajo instance runs on a local file system. A local mode Tajo instance can start up with very simple configurations.
+
+First of all, you need to add the environment variables to conf/tajo-env.sh.
+
+.. code-block:: bash
+
+  # Hadoop home. Required
+  export HADOOP_HOME= ...
+
+  # The java implementation to use. Required.
+  export JAVA_HOME= ...
+
+To launch the Tajo master, execute start-tajo.sh.
+
+.. code-block:: bash
+
+  $ $TAJO_HOME/bin/start-tajo.sh
+
+.. note::
+
+  If you want to know how to set up a fully distributed Tajo cluster, please see :doc:`/configuration/cluster_setup`.
+
+.. warning::
+
+  By default, the *Catalog server*, which manages table meta data, uses `Apache Derby <http://db.apache.org/derby/>`_ as its persistent storage, and Derby stores data into the ``/tmp/tajo-catalog-${username}`` directory. But some operating systems may remove all contents in ``/tmp`` when booting up. To ensure your catalog data persists, you need to set a proper location for the Derby directory. To learn Catalog configuration, please refer to :doc:`/configuration/catalog_configuration`.
+
Added: tajo/site/docs/0.11.0/_sources/getting_started/prerequisites.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/getting_started/prerequisites.txt?rev=1710773&view=auto
==============================================================================
--- tajo/site/docs/0.11.0/_sources/getting_started/prerequisites.txt (added)
+++ tajo/site/docs/0.11.0/_sources/getting_started/prerequisites.txt Tue Oct 27 11:04:33 2015
@@ -0,0 +1,7 @@
+**********************
+Prerequisites
+**********************
+
+ * Hadoop 2.3.0 or higher (up to 2.5.1)
+ * Java 1.6 or 1.7
+ * Protocol buffer 2.5.0
Added: tajo/site/docs/0.11.0/_sources/hbase_integration.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/hbase_integration.txt?rev=1710773&view=auto
==============================================================================
--- tajo/site/docs/0.11.0/_sources/hbase_integration.txt (added)
+++ tajo/site/docs/0.11.0/_sources/hbase_integration.txt Tue Oct 27 11:04:33 2015
@@ -0,0 +1,183 @@
+*************************************
+HBase Integration
+*************************************
+
+Apache Tajo™ storage supports integration with Apache HBase™.
+This integration allows Tajo to access all tables used in Apache HBase.
+
+In order to use this feature, you need to add some configs into ``conf/tajo-env.sh`` and then add some properties into a table create statement.
+
+This section describes how to set up HBase integration.
+
+First, you need to set your HBase home directory to the environment variable ``HBASE_HOME`` in conf/tajo-env.sh as follows: ::
+
+  export HBASE_HOME=/path/to/your/hbase/directory
+
+If you set the directory, Tajo will add the HBase library files to its classpath.
+
+
+
+========================
+CREATE TABLE
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  CREATE [EXTERNAL] TABLE [IF NOT EXISTS] <table_name> [(<column_name> <data_type>, ... )]
+  USING hbase
+  WITH ('table'='<hbase_table_name>'
+  , 'columns'=':key,<column_family_name>:<qualifier_name>, ...'
+  , 'hbase.zookeeper.quorum'='<zookeeper_address>'
+  , 'hbase.zookeeper.property.clientPort'='<zookeeper_client_port>'
+  )
+
+Options
+
+* ``table`` : Sets the origin HBase table name. If you want to create an external table, the table must exist in HBase. Conversely, if you want to create a managed table, the table must not exist in HBase.
+* ``columns`` : ``:key`` means the HBase row key. The number of column entries needs to equal the number of Tajo table columns.
+* ``hbase.zookeeper.quorum`` : Sets the ZooKeeper quorum address. You can use different ZooKeeper clusters within the same Tajo database. If you don't set the ZooKeeper address, Tajo will refer to the property in the hbase-site.xml file.
+* ``hbase.zookeeper.property.clientPort`` : Sets the ZooKeeper client port. If you don't set the port, Tajo will refer to the property in the hbase-site.xml file.
+ +``IF NOT EXISTS`` allows the ``CREATE [EXTERNAL] TABLE`` statement to avoid an error which occurs when the table already exists. + + + +======================== +DROP TABLE +======================== + +*Synopsis* + +.. code-block:: sql + + DROP TABLE [IF EXISTS] <table_name> [PURGE] + +``IF EXISTS`` allows the ``DROP TABLE`` statement to avoid an error which occurs when the table does not exist. The ``DROP TABLE`` statement removes a table from the Tajo catalog, but it does not remove the contents on the HBase cluster. If the ``PURGE`` option is given, the ``DROP TABLE`` statement will eliminate the entry in the catalog as well as the contents on the HBase cluster. + + +======================== +INSERT (OVERWRITE) INTO +======================== + +The INSERT OVERWRITE statement overwrites the data of an existing table. Tajo's INSERT OVERWRITE statement follows the ``INSERT INTO SELECT`` statement of SQL. The examples are as follows: + +.. code-block:: sql + + -- when a target table schema and output schema are equivalent to each other + INSERT OVERWRITE INTO t1 SELECT l_orderkey, l_partkey, l_quantity FROM lineitem; + -- or + INSERT OVERWRITE INTO t1 SELECT * FROM lineitem; + + -- when the output schema is smaller than the target table schema + INSERT OVERWRITE INTO t1 SELECT l_orderkey FROM lineitem; + + -- when you want to specify certain target columns + INSERT OVERWRITE INTO t1 (col1, col3) SELECT l_orderkey, l_quantity FROM lineitem; + + +.. note:: + + If you don't set the row key option, you will never be able to use your table data, because Tajo needs key columns for sorting before creating the result data. + + + +======================== +Usage +======================== + +In order to create a new HBase table which is to be managed by Tajo, use the USING clause on CREATE TABLE: + +..
code-block:: sql + + CREATE EXTERNAL TABLE blog (rowkey text, author text, register_date text, title text) + USING hbase WITH ( + 'table'='blog' + , 'columns'=':key,info:author,info:date,content:title'); + +Next, create and populate the matching table in the HBase shell: + +.. code-block:: sql + + $ hbase shell + create 'blog', {NAME=>'info'}, {NAME=>'content'} + put 'blog', 'hyunsik-02', 'content:title', 'Getting started with Tajo on your desktop' + put 'blog', 'hyunsik-02', 'info:author', 'Hyunsik Choi' + put 'blog', 'hyunsik-02', 'info:date', '2014-12-03' + put 'blog', 'blrunner-01', 'content:title', 'Apache Tajo: A Big Data Warehouse System on Hadoop' + put 'blog', 'blrunner-01', 'info:author', 'Jaehwa Jung' + put 'blog', 'blrunner-01', 'info:date', '2014-10-31' + put 'blog', 'jhkim-01', 'content:title', 'APACHE TAJO™ v0.9 HAS ARRIVED!' + put 'blog', 'jhkim-01', 'info:author', 'Jinho Kim' + put 'blog', 'jhkim-01', 'info:date', '2014-10-22' + +Then, check the table metadata with the ``\d`` option: + +.. code-block:: sql + + default> \d blog; + + table name: default.blog + table path: + store type: HBASE + number of rows: unknown + volume: 0 B + Options: + 'columns'=':key,info:author,info:date,content:title' + 'table'='blog' + + schema: + rowkey TEXT + author TEXT + register_date TEXT + title TEXT + + +And then query the table as follows: + +.. code-block:: sql + + default> SELECT * FROM blog; + rowkey, author, register_date, title + ------------------------------- + blrunner-01, Jaehwa Jung, 2014-10-31, Apache Tajo: A Big Data Warehouse System on Hadoop + hyunsik-02, Hyunsik Choi, 2014-12-03, Getting started with Tajo on your desktop + jhkim-01, Jinho Kim, 2014-10-22, APACHE TAJO™ v0.9 HAS ARRIVED!
+ + default> SELECT * FROM blog WHERE rowkey = 'blrunner-01'; + Progress: 100%, response time: 2.043 sec + rowkey, author, register_date, title + ------------------------------- + blrunner-01, Jaehwa Jung, 2014-10-31, Apache Tajo: A Big Data Warehouse System on Hadoop + + +Here's how to insert data into the HBase table: + +.. code-block:: sql + + CREATE TABLE blog_backup(rowkey text, author text, register_date text, title text) + USING hbase WITH ( + 'table'='blog_backup' + , 'columns'=':key,info:author,info:date,content:title'); + INSERT OVERWRITE INTO blog_backup SELECT * FROM blog; + + +Use the HBase shell to verify that the data actually got loaded: + +.. code-block:: sql + + hbase(main):004:0> scan 'blog_backup' + ROW COLUMN+CELL + blrunner-01 column=content:title, timestamp=1421227531054, value=Apache Tajo: A Big Data Warehouse System on Hadoop + blrunner-01 column=info:author, timestamp=1421227531054, value=Jaehwa Jung + blrunner-01 column=info:date, timestamp=1421227531054, value=2014-10-31 + hyunsik-02 column=content:title, timestamp=1421227531054, value=Getting started with Tajo on your desktop + hyunsik-02 column=info:author, timestamp=1421227531054, value=Hyunsik Choi + hyunsik-02 column=info:date, timestamp=1421227531054, value=2014-12-03 + jhkim-01 column=content:title, timestamp=1421227531054, value=APACHE TAJO\xE2\x84\xA2 v0.9 HAS ARRIVED!
+ jhkim-01 column=info:author, timestamp=1421227531054, value=Jinho Kim + jhkim-01 column=info:date, timestamp=1421227531054, value=2014-10-22 + 3 row(s) in 0.0470 seconds + + Added: tajo/site/docs/0.11.0/_sources/hcatalog_integration.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/hcatalog_integration.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/hcatalog_integration.txt (added) +++ tajo/site/docs/0.11.0/_sources/hcatalog_integration.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,52 @@ +************************************* +HCatalog Integration +************************************* + +Apache Tajo™ catalog supports the HCatalogStore driver to integrate with Apache Hive™. +This integration allows Tajo to access all tables used in Apache Hive. +Depending on your purpose, you can execute either SQL queries or HiveQL queries on the +same tables managed in Apache Hive. + +In order to use this feature, you need to build Tajo with a specified maven profile +and then add some configs into ``conf/tajo-env.sh`` and ``conf/catalog-site.xml``. +This section describes how to set up HCatalog integration. +This instruction should take no more than ten minutes. + +First, you need to compile the source code with an hcatalog profile. +Currently, Tajo supports the hcatalog-0.11.0 and hcatalog-0.12.0 profiles.
+So, if you want to use Hive 0.11.0, you need to set ``-Phcatalog-0.11.0`` as the maven profile :: + + $ mvn clean package -DskipTests -Pdist -Dtar -Phcatalog-0.11.0 + +Or, if you want to use Hive 0.12.0, you need to set ``-Phcatalog-0.12.0`` as the maven profile :: + + $ mvn clean package -DskipTests -Pdist -Dtar -Phcatalog-0.12.0 + +Then, you need to set your Hive home directory to the environment variable ``HIVE_HOME`` in conf/tajo-env.sh as follows: :: + + export HIVE_HOME=/path/to/your/hive/directory + +If you need to use JDBC to connect to the HiveMetaStore, you have to prepare the MySQL JDBC driver. +Next, you should set the path of the MySQL JDBC driver jar file to the environment variable HIVE_JDBC_DRIVER_DIR in conf/tajo-env.sh as follows: :: + + export HIVE_JDBC_DRIVER_DIR=/path/to/your/mysql_jdbc_driver/mysql-connector-java-x.x.x-bin.jar + +Finally, you should specify HCatalogStore as the Tajo catalog driver class in ``conf/catalog-site.xml`` as follows: :: + + <property> + <name>tajo.catalog.store.class</name> + <value>org.apache.tajo.catalog.store.HCatalogStore</value> + </property> + +.. note:: + + Hive stores a list of partitions for each table in its metastore. If new partitions are + directly added to HDFS, the HiveMetastore will not be aware of these partitions unless the user + runs ``ALTER TABLE table_name ADD PARTITION`` on each of the newly added partitions or + the ``MSCK REPAIR TABLE table_name`` command. + + However, the current Tajo doesn't provide the ``ADD PARTITION`` command, and Hive doesn't provide an API for + responding to the ``MSCK REPAIR TABLE`` command.
Thus, if you insert data into a Hive partitioned + table and you want to scan the updated partitions through Tajo, you must run the following command on Hive :: + + $ MSCK REPAIR TABLE [table_name]; Added: tajo/site/docs/0.11.0/_sources/hive_integration.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/hive_integration.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/hive_integration.txt (added) +++ tajo/site/docs/0.11.0/_sources/hive_integration.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,42 @@ +************************************* +Hive Integration +************************************* + +Apache Tajo™ catalog supports the HiveCatalogStore to integrate with Apache Hive™. +This integration allows Tajo to access all tables used in Apache Hive. +Depending on your purpose, you can execute either SQL queries or HiveQL queries on the +same tables managed in Apache Hive. + +In order to use this feature, you need to build Tajo with a specified maven profile +and then add some configs into ``conf/tajo-env.sh`` and ``conf/catalog-site.xml``. +This section describes how to set up HiveMetaStore integration. +This instruction should take no more than five minutes. + +You need to set your Hive home directory to the environment variable ``HIVE_HOME`` in conf/tajo-env.sh as follows: :: + + export HIVE_HOME=/path/to/your/hive/directory + +If you need to use JDBC to connect to the HiveMetaStore, you have to prepare the MySQL JDBC driver.
+Next, you should set the path of the MySQL JDBC driver jar file to the environment variable HIVE_JDBC_DRIVER_DIR in conf/tajo-env.sh as follows: :: + + export HIVE_JDBC_DRIVER_DIR=/path/to/your/mysql_jdbc_driver/mysql-connector-java-x.x.x-bin.jar + +Finally, you should specify HiveCatalogStore as the Tajo catalog driver class in ``conf/catalog-site.xml`` as follows: :: + + <property> + <name>tajo.catalog.store.class</name> + <value>org.apache.tajo.catalog.store.HiveCatalogStore</value> + </property> + +.. note:: + + Hive stores a list of partitions for each table in its metastore. If new partitions are + directly added to HDFS, the HiveMetastore will not be aware of these partitions unless the user + runs ``ALTER TABLE table_name ADD PARTITION`` on each of the newly added partitions or + the ``MSCK REPAIR TABLE table_name`` command. + + However, the current Tajo doesn't provide the ``ADD PARTITION`` command, and Hive doesn't provide an API for + responding to the ``MSCK REPAIR TABLE`` command. Thus, if you insert data into a Hive partitioned + table and you want to scan the updated partitions through Tajo, you must run the following command on Hive :: + + $ MSCK REPAIR TABLE [table_name]; Added: tajo/site/docs/0.11.0/_sources/index.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/index.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/index.txt (added) +++ tajo/site/docs/0.11.0/_sources/index.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,51 @@ +.. Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and + limitations under the License. + +.. Apache Tajo documentation master file, created by + sphinx-quickstart on Thu Feb 27 08:29:11 2014. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +Apache Tajo™ (0.11.0 Release) - User documentation +=========================================================================== + +Table of Contents: + +.. toctree:: + :maxdepth: 3 + + introduction + getting_started + configuration + tsql + sql_language + time_zone + functions + table_management + table_partitioning + storage_plugins + index_overview + backup_and_restore + hive_integration + hbase_integration + swift_integration + jdbc_driver + tajo_client_api + faq + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` + Added: tajo/site/docs/0.11.0/_sources/index/future_work.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/index/future_work.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/index/future_work.txt (added) +++ tajo/site/docs/0.11.0/_sources/index/future_work.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,8 @@ +************************************* +Future Works +************************************* + +* Providing more index types, such as bitmap and HBase index +* Supporting index on partitioned tables +* Supporting the backup and restore feature +* Cost-based query optimization by estimating the query selectivity \ No newline at end of file Added: tajo/site/docs/0.11.0/_sources/index/how_to_use.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/index/how_to_use.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/index/how_to_use.txt (added) +++
tajo/site/docs/0.11.0/_sources/index/how_to_use.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,69 @@ +************************************* +How to use an index? +************************************* + +------------------------------------- +1. Create index +------------------------------------- + +The first step in utilizing an index is index creation. You can create an index using SQL (:doc:`/sql_language/ddl`) or the Tajo API (:doc:`/tajo_client_api`). For example, you can create a BST index on the lineitem table by submitting the following SQL to Tajo. + +.. code-block:: sql + + default> create index l_orderkey_idx on lineitem (l_orderkey); + +If the index is created successfully, you can see the information about that index as follows: :: + + default> \d lineitem + + table name: default.lineitem + table path: hdfs://localhost:7020/tpch/lineitem + store type: TEXT + number of rows: unknown + volume: 753.9 MB + Options: + 'text.delimiter'='|' + + schema: + l_orderkey INT8 + l_partkey INT8 + l_suppkey INT8 + l_linenumber INT8 + l_quantity FLOAT4 + l_extendedprice FLOAT4 + l_discount FLOAT4 + l_tax FLOAT4 + l_returnflag TEXT + l_linestatus TEXT + l_shipdate DATE + l_commitdate DATE + l_receiptdate DATE + l_shipinstruct TEXT + l_shipmode TEXT + l_comment TEXT + + + Indexes: + "l_orderkey_idx" TWO_LEVEL_BIN_TREE (l_orderkey ASC NULLS LAST ) + +For more information about index creation, please refer to the above links. + +------------------------------------- +2. Enable/disable index scans +------------------------------------- + +Once an index is successfully created, you must enable the index scan feature as follows: + +.. code-block:: sql + + default> \set INDEX_ENABLED true + +If you don't want to use the index scan feature anymore, you can simply disable it as follows: + +.. code-block:: sql + + default> \set INDEX_ENABLED false + +.. note:: + + Once the index scan feature is enabled, Tajo currently always performs the index scan regardless of its efficiency.
You should set this option only when the expected number of retrieved tuples is sufficiently small. Added: tajo/site/docs/0.11.0/_sources/index/types.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/index/types.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/index/types.txt (added) +++ tajo/site/docs/0.11.0/_sources/index/types.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,7 @@ +************************************* +Index Types +************************************* + +Currently, Tajo supports only one type of index, ``TWO_LEVEL_BIN_TREE``, or ``BST`` for short. The BST index is a kind of binary search tree which is extended to be permanently stored on disk. It consists of two levels of nodes; a leaf node indexes the keys with the positions of data in an HDFS block, and a root node indexes the keys with the leaf node indices. + +When an index scan is started, the query engine first reads the root node and finds the search key. If it finds a leaf node corresponding to the search key, it subsequently finds the search key in that leaf node. Finally, it directly reads a tuple corresponding to the search key from HDFS. \ No newline at end of file Added: tajo/site/docs/0.11.0/_sources/index_overview.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/index_overview.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/index_overview.txt (added) +++ tajo/site/docs/0.11.0/_sources/index_overview.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,20 @@ +***************************** +Index (Experimental Feature) +***************************** + +An index is a data structure that is used for efficient query processing. Using an index, the Tajo query engine can directly retrieve search values. + +This is still an experimental feature.
In order to use indexes, you must check out the source code of the ``index_support`` branch:: + + git clone -b index_support https://git-wip-us.apache.org/repos/asf/tajo.git tajo-index + +For the source code build, please refer to :doc:`getting_started`. + +The following sections describe the supported index types, query execution with an index, and future work. + +.. toctree:: + :maxdepth: 1 + + index/types + index/how_to_use + index/future_work \ No newline at end of file Added: tajo/site/docs/0.11.0/_sources/introduction.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/introduction.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/introduction.txt (added) +++ tajo/site/docs/0.11.0/_sources/introduction.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,13 @@ +*************** +Introduction +*************** + +The main goal of the Apache Tajo project is to build an advanced open source +data warehouse system in Hadoop for processing web-scale data sets. +Basically, Tajo provides standard SQL as its query language. +Tajo is designed for both interactive and batch queries on data sets +stored on HDFS and other data sources. Without hurting query response +times, Tajo provides fault-tolerance and dynamic load balancing, which +are necessary for long-running queries. Tajo employs cost-based and +progressive query optimization techniques for reoptimizing running +queries in order to avoid the worst query plans.
\ No newline at end of file Added: tajo/site/docs/0.11.0/_sources/jdbc_driver.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/jdbc_driver.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/jdbc_driver.txt (added) +++ tajo/site/docs/0.11.0/_sources/jdbc_driver.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,140 @@ +************************************* +Tajo JDBC Driver +************************************* + +Apache Tajo™ provides a JDBC driver +which enables Java applications to easily access Apache Tajo in an RDBMS-like manner. +In this section, we explain how to get the JDBC driver and provide an example client. + +How to get the JDBC driver +=========================== + +Direct Download +-------------------------------- + +You can directly download the JDBC driver jar file (``tajo-jdbc-x.y.z.jar``) from `Downloads <http://tajo.apache.org/downloads.html>`_. + +From Binary Distribution +-------------------------------- + +The Tajo binary distribution provides the JDBC jar file, located in ``${TAJO_HOME}/share/jdbc-dist/tajo-jdbc-x.y.z.jar``. + + +From Building Source Code +-------------------------------- + +You can build Tajo from the source code and then get the JAR file as follows: + +.. code-block:: bash + + $ tar xzvf tajo-x.y.z-src.tar.gz + $ mvn clean package -DskipTests -Pdist -Dtar + $ ls -l tajo-dist/target/tajo-x.y.z/share/jdbc-dist/tajo-jdbc-x.y.z.jar + + +Setting the CLASSPATH +======================= + +In order to use the JDBC driver, you should add ``tajo-jdbc-x.y.z.jar`` to your ``CLASSPATH``. + +.. code-block:: bash + + CLASSPATH=path/to/tajo-jdbc-x.y.z.jar:$CLASSPATH + + +Connecting to the Tajo cluster instance +======================================= +A Tajo cluster is represented by a URL.
The Tajo JDBC driver can take the following URL forms: + + * ``jdbc:tajo://host/`` + * ``jdbc:tajo://host/database`` + * ``jdbc:tajo://host:port/`` + * ``jdbc:tajo://host:port/database`` + +Each part of the URL has the following meaning: + + * ``host`` - The hostname of the TajoMaster. You can put a hostname or an IP address here. + * ``port`` - The port number that the server is listening on. The default port number is 26002. + * ``database`` - The database name. The default database name is ``default``. + +To connect, you need to get a ``Connection`` instance from the Java JDBC DriverManager as follows: + +.. code-block:: java + + Connection db = DriverManager.getConnection(url); + + +Connection Parameters +===================== +Connection parameters let the JDBC ``Connection`` enable or disable additional features. You should use ``java.util.Properties`` to pass your connection parameters into the ``Connection``. The following example enables compressed transmission of the ResultSet and sets the connection timeout. + +.. code-block:: java + + String url = "jdbc:tajo://localhost/test"; + Properties props = new Properties(); + props.setProperty("useCompression","true"); // use compression for ResultSet + props.setProperty("connectTimeout","15000"); // 15 seconds + Connection conn = DriverManager.getConnection(url, props); + +The connection parameters that Tajo currently supports are as follows: + + * ``useCompression = bool`` - Enables compressed transfer for the ResultSet. + * ``defaultRowFetchSize = int`` - Determines the number of rows fetched into the ResultSet per round trip to the server. + * ``connectTimeout = int (seconds)`` - The timeout value used for socket connect operations. If connecting to the server takes longer than this value, the connection is broken. The timeout is specified in seconds and a value of zero means that it is disabled. + * ``socketTimeout = int (seconds)`` - The timeout value used for socket read operations.
If reading from the server takes longer than this value, the connection is closed. This can be used as both a brute-force global query timeout and a method of detecting network problems. The timeout is specified in seconds and a value of zero means that it is disabled. + * ``retry = int`` - The number of retry operations. The Tajo JDBC driver is resilient against some network or connection problems; this value determines how many times the connection will be retried. + + +An Example JDBC Client +======================= + +The JDBC driver class name is ``org.apache.tajo.jdbc.TajoDriver``. +You can load the driver with ``Class.forName("org.apache.tajo.jdbc.TajoDriver")``. +The connection url should be ``jdbc:tajo://<TajoMaster hostname>:<TajoMaster client rpc port>/<database name>``. +The default TajoMaster client rpc port is ``26002``. +If you want to change the listening port, please refer to :doc:`/configuration/cluster_setup`. + +.. note:: + + Currently, Tajo does not support the concept of databases and namespaces. + All tables are contained in the ``default`` database, so you don't need to specify any database name. + +The following shows an example JDBC client. + +.. code-block:: java + + import java.sql.Connection; + import java.sql.ResultSet; + import java.sql.Statement; + import java.sql.DriverManager; + + public class TajoJDBCClient { + + ....
+ + public static void main(String[] args) throws Exception { + + try { + Class.forName("org.apache.tajo.jdbc.TajoDriver"); + } catch (ClassNotFoundException e) { + // fill your handling code + } + + Connection conn = DriverManager.getConnection("jdbc:tajo://127.0.0.1:26002/default"); + + Statement stmt = null; + ResultSet rs = null; + try { + stmt = conn.createStatement(); + rs = stmt.executeQuery("select * from table1"); + while (rs.next()) { + System.out.println(rs.getString(1) + "," + rs.getString(3)); + } + } finally { + if (rs != null) rs.close(); + if (stmt != null) stmt.close(); + if (conn != null) conn.close(); + } + } + } + Added: tajo/site/docs/0.11.0/_sources/partitioning/column_partitioning.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/partitioning/column_partitioning.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/partitioning/column_partitioning.txt (added) +++ tajo/site/docs/0.11.0/_sources/partitioning/column_partitioning.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,52 @@ +********************************* +Column Partitioning +********************************* + +The column table partition is designed to support the partitioning of Apache Hive™. + +================================================ +How to Create a Column Partitioned Table +================================================ + +You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use +the ``PARTITION BY COLUMN`` clause with partition keys. + +For example, assume there is a table ``orders`` composed of the following schema. :: + + id INT, + item_name TEXT, + price FLOAT + +Also, assume that you want to use ``order_date TEXT`` and ``ship_date TEXT`` as the partition keys. +Then, you should create a table as follows: + +..
code-block:: sql + + CREATE TABLE orders ( + id INT, + item_name TEXT, + price FLOAT + ) PARTITION BY COLUMN (order_date TEXT, ship_date TEXT); + +================================================== +Partition Pruning on Column Partitioned Tables +================================================== + +The following predicates in the ``WHERE`` clause can be used to prune unqualified column partitions, without processing them, during the query planning phase: + +* ``=`` +* ``<>`` +* ``>`` +* ``<`` +* ``>=`` +* ``<=`` +* LIKE predicates with a leading wild-card character +* IN list predicates + +================================================== +Compatibility Issues with Apache Hive™ +================================================== + +If partitioned tables of Hive are created as external tables in Tajo, Tajo can process the Hive partitioned tables directly. +No compatibility issues have been found yet. \ No newline at end of file Added: tajo/site/docs/0.11.0/_sources/partitioning/hash_partitioning.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/partitioning/hash_partitioning.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/partitioning/hash_partitioning.txt (added) +++ tajo/site/docs/0.11.0/_sources/partitioning/hash_partitioning.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,5 @@ +******************************** +Hash Partitioning +******************************** + +..
todo:: \ No newline at end of file Added: tajo/site/docs/0.11.0/_sources/partitioning/intro_to_partitioning.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/partitioning/intro_to_partitioning.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/partitioning/intro_to_partitioning.txt (added) +++ tajo/site/docs/0.11.0/_sources/partitioning/intro_to_partitioning.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,15 @@ +************************************** +Introduction to Partitioning +************************************** + +Table partitioning provides two benefits: easy table management and data pruning by partition keys. +Currently, Apache Tajo only provides Apache Hive-compatible column partitioning. + +========================= +Partitioning Methods +========================= + +Tajo provides the following partitioning methods: + * Column Partitioning + * Range Partitioning (TODO) + * Hash Partitioning (TODO) \ No newline at end of file Added: tajo/site/docs/0.11.0/_sources/partitioning/range_partitioning.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/partitioning/range_partitioning.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/partitioning/range_partitioning.txt (added) +++ tajo/site/docs/0.11.0/_sources/partitioning/range_partitioning.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,5 @@ +*************************** +Range Partitioning +*************************** + +.. 
todo:: \ No newline at end of file Added: tajo/site/docs/0.11.0/_sources/sql_language.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/sql_language.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/sql_language.txt (added) +++ tajo/site/docs/0.11.0/_sources/sql_language.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,15 @@ +************ +SQL Language +************ + +.. toctree:: + :maxdepth: 1 + + sql_language/data_model + sql_language/ddl + sql_language/insert + sql_language/alter_table + sql_language/queries + sql_language/joins + sql_language/sql_expression + sql_language/predicates \ No newline at end of file Added: tajo/site/docs/0.11.0/_sources/sql_language/alter_table.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/sql_language/alter_table.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/sql_language/alter_table.txt (added) +++ tajo/site/docs/0.11.0/_sources/sql_language/alter_table.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,100 @@ +************************ +ALTER TABLE +************************ + +======================== +RENAME TABLE +======================== + +*Synopsis* + +.. code-block:: sql + + ALTER TABLE <table_name> RENAME TO <new_table_name> + + For example: + ALTER TABLE table1 RENAME TO table2; + +This statement lets you rename a table. + +======================== +RENAME COLUMN +======================== + +*Synopsis* + +.. code-block:: sql + + ALTER TABLE <table_name> RENAME COLUMN <column_name> TO <new_column_name> + + For example: + ALTER TABLE table1 RENAME COLUMN id TO id2; + +This statement lets you rename a column. + +======================== +ADD COLUMN +======================== + +*Synopsis* + +..
code-block:: sql + + ALTER TABLE <table_name> ADD COLUMN <column_name> <data_type> + + For example: + ALTER TABLE table1 ADD COLUMN id text; + +This statement lets you add new columns to the end of the existing columns. + +======================== +SET PROPERTY +======================== + +*Synopsis* + +.. code-block:: sql + + ALTER TABLE <table_name> SET PROPERTY (<key> = <value>, ...) + + For example: + ALTER TABLE table1 SET PROPERTY 'timezone' = 'GMT-7' + ALTER TABLE table1 SET PROPERTY 'text.delimiter' = '&' + ALTER TABLE table1 SET PROPERTY 'compression.type'='RECORD','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec' + + +This statement lets you change a table property. + +======================== +DROP PARTITION +======================== + +*Synopsis* + +.. code-block:: sql + + ALTER TABLE <table_name> [IF EXISTS] DROP PARTITION (<partition column> = <partition value>, ...) [PURGE] + + For example: + ALTER TABLE table1 DROP PARTITION (col1 = 1 , col2 = 2) + ALTER TABLE table1 DROP PARTITION (col1 = '2015' , col2 = '01', col3 = '11' ) + ALTER TABLE table1 DROP PARTITION (col1 = 'TAJO' ) PURGE + +You can use ``ALTER TABLE DROP PARTITION`` to drop a partition from a table. This doesn't remove the data of the partition. But if ``PURGE`` is specified, the partition data will be removed. The metadata is completely lost in all cases. An error is thrown if the partition for the table doesn't exist. You can use ``IF EXISTS`` to skip the error. + +======================== +REPAIR PARTITION +======================== + +Tajo stores a list of partitions for each table in its catalog. If partitions are manually added to the distributed file system, the metastore is not aware of these partitions. Running the ``ALTER TABLE REPAIR PARTITION`` statement ensures that the tables are properly populated. + +*Synopsis* + +.. code-block:: sql + + ALTER TABLE <table_name> REPAIR PARTITION + +..
note:: + + Even if information about a partition is stored in the catalog, Tajo does not recover the partition when its directory doesn't exist in the file system. + Added: tajo/site/docs/0.11.0/_sources/sql_language/data_model.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/sql_language/data_model.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/sql_language/data_model.txt (added) +++ tajo/site/docs/0.11.0/_sources/sql_language/data_model.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,68 @@ +********** +Data Model +********** + +=============== +Data Types +=============== + ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| Support | SQL Type Name | Alias | Size (byte) | Description | Range | ++===========+================+============================+=============+===================================================+==========================================================================+ +| O | boolean | bool | 1 | | true/false | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | bit | | 1 | | 1/0 | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | varbit | bit varying | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | tinyint | int1 | 1 | tiny-range integer value | -2^7 (-128) to 2^7-1 (127) | 
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | smallint | int2 | 2 | small-range integer value | -2^15 (-32,768) to 2^15-1 (32,767) | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | integer | int, int4 | 4 | integer value | -2^31 (-2,147,483,648) to 2^31 - 1 (2,147,483,647) | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | bigint | int8 | 8 | larger range integer value | -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807) | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | real | float4 | 4 | variable-precision, inexact, real number value | -3.4028235E+38 to 3.4028235E+38 (6 decimal digits precision) | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | float[(n)] | | 4 or 8 | variable-precision, inexact, real number value | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | double | float8, double precision | 8 | variable-precision, inexact, real number value | 1.7E-308 to 1.7E+308 (15 decimal digits precision) | 
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | number | decimal | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | char[(n)] | character | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | varchar[(n)] | character varying | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | text | text | | variable-length unicode text | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | binary | binary | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | varbinary[(n)] | binary varying | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | blob | bytea | | variable-length binary string | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | date | | | | | 
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | time | | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | timetz | time with time zone | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | timestamp | | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | timestamptz | | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | inet4 | | 4 | IPv4 address | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ + +----------------------------------------- +Using real number value (real and double) +----------------------------------------- + +The real and double data types are mapped to the Java primitives float and double, respectively. The Java float and double primitives follow the IEEE 754 specification, so these types correctly match the SQL standard data types. + ++ float[( n )] is mapped to either float or double according to a given length n. If n is specified, it must be between 1 and 53. The default value of n is 53. ++ If 1 <= n <= 24, a value is mapped to float (6 decimal digits precision). 
++ If 25 <= n <= 53, a value is mapped to double (15 decimal digits precision). ++ Do not use approximate real number columns in the WHERE clause to test for exact matches, especially with the = and <> operators. The > or < comparisons work well. \ No newline at end of file Added: tajo/site/docs/0.11.0/_sources/sql_language/ddl.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/sql_language/ddl.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/sql_language/ddl.txt (added) +++ tajo/site/docs/0.11.0/_sources/sql_language/ddl.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,127 @@ +************************ +Data Definition Language +************************ + +======================== +CREATE DATABASE +======================== + +*Synopsis* + +.. code-block:: sql + + CREATE DATABASE [IF NOT EXISTS] <database_name> + +*Description* + +A database is a namespace in Tajo. A database can contain multiple tables, each of which has a unique name within it. +``IF NOT EXISTS`` allows the ``CREATE DATABASE`` statement to avoid an error which occurs when the database already exists. + +======================== +DROP DATABASE +======================== + +*Synopsis* + +.. code-block:: sql + + DROP DATABASE [IF EXISTS] <database_name> + +``IF EXISTS`` allows the ``DROP DATABASE`` statement to avoid an error which occurs when the database does not exist. + +======================== +CREATE TABLE +======================== + +*Synopsis* + +.. code-block:: sql + + CREATE TABLE [IF NOT EXISTS] <table_name> [(column_list)] [TABLESPACE tablespace_name] + [using <storage_type> [with (<key> = <value>, ...)]] [AS <select_statement>] + + CREATE EXTERNAL TABLE [IF NOT EXISTS] <table_name> (column_list) + using <storage_type> [with (<key> = <value>, ...)] LOCATION '<path>' + +*Description* + +In Tajo, there are two types of tables: `managed tables` and `external tables`. 
+Managed tables are placed in predefined tablespaces. The ``TABLESPACE`` clause specifies the tablespace for the table. For external tables, Tajo allows an arbitrary table location with the ``LOCATION`` clause. +For more information about tables and tablespaces, please refer to :doc:`/table_management/table_overview` and :doc:`/table_management/tablespaces`. + +``column_list`` is a sequence of the column name and its type like ``<column_name> <data_type>, ...``. Additionally, the `asterisk (*)` is allowed for external tables when their data format is `JSON`. You can find more details at :doc:`/table_management/json`. + +``IF NOT EXISTS`` allows the ``CREATE [EXTERNAL] TABLE`` statement to avoid an error which occurs when the table already exists. + +------------------------ + Compression +------------------------ + +If you want to add an external table that contains compressed data, you should specify the 'compression.codec' parameter in the CREATE TABLE statement. + +.. code-block:: sql + + create EXTERNAL table lineitem ( + L_ORDERKEY bigint, + L_PARTKEY bigint, + ... + L_COMMENT text) + + USING TEXT WITH ('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec') + LOCATION 'hdfs://localhost:9010/tajo/warehouse/lineitem_100_snappy'; + +The `compression.codec` parameter can have one of the following compression codecs: + * org.apache.hadoop.io.compress.BZip2Codec + * org.apache.hadoop.io.compress.DeflateCodec + * org.apache.hadoop.io.compress.GzipCodec + * org.apache.hadoop.io.compress.SnappyCodec + +======================== + DROP TABLE +======================== + +*Synopsis* + +.. code-block:: sql + + DROP TABLE [IF EXISTS] <table_name> [PURGE] + +*Description* + +``IF EXISTS`` allows the ``DROP TABLE`` statement to avoid an error which occurs when the table does not exist. ``DROP TABLE`` statement removes a table from the Tajo catalog, but it does not remove the contents. 
+If the ``PURGE`` option is given, the ``DROP TABLE`` statement will eliminate the entry in the catalog as well as the contents. + +======================== + CREATE INDEX +======================== + +*Synopsis* + +.. code-block:: sql + + CREATE INDEX [ name ] ON table_name [ USING method ] + ( { column_name | ( expression ) } [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) + [ WHERE predicate ] + +*Description* + +Tajo supports indexes for fast data retrieval. Currently, indexes are supported only for the plain ``TEXT`` format stored on ``HDFS``. +For more information, please refer to :doc:`/index_overview`. + +------------------------ + Index method +------------------------ + +Currently, Tajo supports only one type of index. + +Index methods: + * TWO_LEVEL_BIN_TREE: This method is used by default in Tajo. For more information about its structure, please refer to :doc:`/index/types`. + +======================== + DROP INDEX +======================== + +*Synopsis* + +.. code-block:: sql + + DROP INDEX name Added: tajo/site/docs/0.11.0/_sources/sql_language/insert.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.0/_sources/sql_language/insert.txt?rev=1710773&view=auto ============================================================================== --- tajo/site/docs/0.11.0/_sources/sql_language/insert.txt (added) +++ tajo/site/docs/0.11.0/_sources/sql_language/insert.txt Tue Oct 27 11:04:33 2015 @@ -0,0 +1,26 @@ +************************* +INSERT (OVERWRITE) INTO +************************* + +The INSERT OVERWRITE statement overwrites the data of an existing table or the data in a given directory. Tajo's INSERT OVERWRITE statement follows the ``INSERT INTO SELECT`` statement of SQL. The examples are as follows: + +.. 
code-block:: sql + + create table t1 (col1 int8, col2 int4, col3 float8); + + -- when a target table schema and output schema are equivalent to each other + INSERT OVERWRITE INTO t1 SELECT l_orderkey, l_partkey, l_quantity FROM lineitem; + -- or + INSERT OVERWRITE INTO t1 SELECT * FROM lineitem; + + -- when the output schema is smaller than the target table schema + INSERT OVERWRITE INTO t1 SELECT l_orderkey FROM lineitem; + + -- when you want to specify certain target columns + INSERT OVERWRITE INTO t1 (col1, col3) SELECT l_orderkey, l_quantity FROM lineitem; + +In addition, the INSERT OVERWRITE statement can overwrite the data in a specific directory as well as in a table. + +.. code-block:: sql + + INSERT OVERWRITE INTO LOCATION '/dir/subdir' SELECT l_orderkey, l_quantity FROM lineitem; \ No newline at end of file
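A directory written this way can later be queried again by registering it as an external table with the ``CREATE EXTERNAL TABLE`` syntax from the DDL section. The following is only a sketch under stated assumptions: the table name is made up, the column types mirror ``l_orderkey`` (int8) and ``l_quantity`` (float8) from the query above, and the format and delimiter must be adjusted to match how the data was actually written.

.. code-block:: sql

    -- Hypothetical follow-up: expose the directory written by the previous
    -- INSERT OVERWRITE as an external table. The TEXT format and '|' delimiter
    -- are assumptions; change them to match the stored data.
    CREATE EXTERNAL TABLE lineitem_extract (l_orderkey int8, l_quantity float8)
      USING TEXT WITH ('text.delimiter'='|')
      LOCATION '/dir/subdir';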
