DDL" by Jo hnSichi

Apache Wiki Tue, 29 Dec 2009 18:01:10 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "Hive/LanguageManual/DDL" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL?action=diff&rev1=35&rev2=36

--------------------------------------------------

    | TEXTFILE
    | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
  }}}
- CREATE TABLE creates a table with given name. An error is thrown if a table 
with the same name exists. You can use IF NOT EXISTS to skip the error.
+ CREATE TABLE creates a table with the given name. An error is thrown if a 
table with the same name exists. You can use IF NOT EXISTS to skip the error.
  
- The EXTERNAL keyword lets you create a table and provide a LOCATION so that 
Hive does not use default location for this table. This comes in handy if you 
already have data generated. When dropping an EXTERNAL table, data in the table 
is NOT deleted from the file system.
+ The EXTERNAL keyword lets you create a table and provide a LOCATION so that 
Hive does not use a default location for this table. This comes in handy if you 
already have data generated. When dropping an EXTERNAL table, data in the table 
is NOT deleted from the file system.
  
  The LIKE form of CREATE TABLE allows you to copy an existing table definition 
exactly (without copying its data).
  
- You can create tables with custom SerDe or using native SerDe. A native SerDe 
is used if ROW FORMAT is not specified or ROW FORMAT DELIMITED is specified. 
You can use DELIMITED clause to read delimited files. Use SERDE clause to 
create a table with custom SerDe. Refer to SerDe section of User Guide for more 
information on SerDe.
+ You can create tables with custom SerDe or using native SerDe. A native SerDe 
is used if ROW FORMAT is not specified or ROW FORMAT DELIMITED is specified. 
You can use the DELIMITED clause to read delimited files. Use the SERDE clause 
to create a table with custom SerDe. Refer to SerDe section of the User Guide 
for more information on SerDe.
  
- You must specify list of columns for tables with native SerDe. Refer to Types 
part of the User Guide for the allowable column types. List of columns for 
tables with custom SerDe may be specified but Hive will query the SerDe to 
determine the list of columns for this table.
+ You must specify list of a columns for tables with native SerDe. Refer to the 
Types part of the User Guide for the allowable column types. A list of columns 
for tables with custom SerDe may be specified but Hive will query the SerDe to 
determine the actual list of columns for this table.
  
  Use STORED AS TEXTFILE if the data needs to be stored as plain text files. 
Use STORED AS SEQUENCEFILE if the data needs to be compressed. Please read more 
about CompressedStorage if you are planning to keep data compressed in your 
Hive tables.  Use INPUTFORMAT and OUTPUTFORMAT to specify the name of a 
corresponding Input``Format and Output``Format class as a string literal, e.g. 
'org.apache.hadoop.hive.contrib.fileformat.base64.Base``64``Text``Input``Format'.
  
- Partitioned tables can be created using PARTIONED BY clause. A table can have 
one or more partition columns and a separate data directory is created for each 
set of partition columns values. Further tables or partitions can be bucketed 
using CLUSTERD BY columns and data can be sorted with in that bucket by SORT BY 
columns. This can improve performance on certain kind of queries.
+ Partitioned tables can be created using the PARTITIONED BY clause. A table 
can have one or more partition columns and a separate data directory is created 
for each distinct value combination in the partition columns. Further, tables 
or partitions can be bucketed using CLUSTERED BY columns, and data can be 
sorted within that bucket via SORT BY columns. This can improve performance on 
certain kinds of queries.
  
  Table names and column names are case insensitive but SerDe and property 
names are case sensitive.  Table and column comments are string literals 
(single-quoted).
  
- Tables can also be created and populated by the results of a query in one 
CTAS (create-table-as-select) statement. The table created by CTAS is atomic, 
meaning that the table is not seen by other users until all the query results 
are populated. So other users will either see the table with the complete 
results of the query or will not see the table at all.
+ Tables can also be created and populated by the results of a query in one 
create-table-as-select (CTAS) statement. The table created by CTAS is atomic, 
meaning that the table is not seen by other users until all the query results 
are populated. So other users will either see the table with the complete 
results of the query or will not see the table at all.
  
- There are two parts in CTAS, the SELECT part can be any 
[[Hive/LanguageManual/Select|SELECT statement]] supported by HiveQL. The CREATE 
part of the CTAS takes the resulting schema from the SELECT part and create the 
target table with other table properties such as the SerDe and storage format. 
The only restrictions in CTAS is that the target table cannot be a partitioned 
table nor an external table.
+ There are two parts in CTAS, the SELECT part can be any 
[[Hive/LanguageManual/Select|SELECT statement]] supported by HiveQL. The CREATE 
part of the CTAS takes the resulting schema from the SELECT part and creates 
the target table with other table properties such as the SerDe and storage 
format. The only restrictions in CTAS is that the target table cannot be a 
partitioned table (nor can it be an external table).
  
  Examples:
  
- An example statement that would create a table would be like:
+ Here's an example statement to create a table:
  
  {{{
  CREATE TABLE page_view(viewTime INT, userid BIGINT,
@@ -86, +86 @@

   PARTITIONED BY(dt STRING, country STRING)
   STORED AS SEQUENCEFILE;
  }}}
- The statement above creates page_view table with viewTime, userid, page_url, 
referrer_url, and ip columns with a comment. The table is also partitioned and 
data is stored in sequence files. The data format in the files is assumed to be 
field-delimited by ctrl-A and row-delimited by newline.
+ The statement above creates the page_view table with viewTime, userid, 
page_url, referrer_url, and ip columns (including comments). The table is also 
partitioned and data is stored in sequence files. The data format in the files 
is assumed to be field-delimited by ctrl-A and row-delimited by newline.
  
  {{{
  CREATE TABLE page_view(viewTime INT, userid BIGINT,
@@ -98, +98 @@

     FIELDS TERMINATED BY '\001'
  STORED AS SEQUENCEFILE;
  }}}
- The above statement lets you to create same table as previous table.
+ The above statement lets you create the same table as the previous table.
  
  {{{
  CREATE TABLE page_view(viewTime INT, userid BIGINT,
@@ -113, +113 @@

     MAP KEYS TERMINATED BY '\003'
   STORED AS SEQUENCEFILE;
  }}}
- In the example above, the page_view table is bucketed(clustered by) userid 
and within each bucket the data is sorted in the increasing order of viewTime. 
Such an organization allows the user to do efficient sampling on the clustered 
column - in this case userid. The sorting property allows internal operators to 
take advantage of the better-known data structure while evaluating queries, 
also increasing efficiency. MAP KEYS and COLLECTION ITEMS keywords can be used 
if any of the columns are lists or maps.
+ In the example above, the page_view table is bucketed (clustered by) userid 
and within each bucket the data is sorted in increasing order of viewTime. Such 
an organization allows the user to do efficient sampling on the clustered 
column - in this case userid. The sorting property allows internal operators to 
take advantage of the better-known data structure while evaluating queries, 
also increasing efficiency. MAP KEYS and COLLECTION ITEMS keywords can be used 
if any of the columns are lists or maps.
  
- In all the examples until now the data is stored in 
<hive.metastore.warehouse.dir>/page_view. Specify value of the key 
hive.metastore.warehouse.dir in Hive config file hive-site.xml.
+ In all the examples until now the data is stored in 
<hive.metastore.warehouse.dir>/page_view. Specify a value for the key 
hive.metastore.warehouse.dir in Hive config file hive-site.xml.
  
  {{{
  CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
@@ -127, +127 @@

   STORED AS TEXTFILE
   LOCATION '<hdfs_location>';
  }}}
- You can use above statement to create page_view table which points to data 
location any hdfs location. But you have to make sure that the data is 
delimited as specified in the query above.
+ You can use the above statement to create a page_view table which points to 
any hdfs location for its storage. But you still have to make sure that the 
data is delimited as specified in the query above.
  
  {{{
  CREATE TABLE new_key_value_store
@@ -137, +137 @@

  FROM key_value_store
  SORT BY new_key, key_value_pair;
  }}}
- The above CTAS statement creates the target table new_key_value_store with 
the schema, (new_key DOUBLE, key_value_pair STRING), derived from the results 
of the SELECT statement. If the SELECT statement does not specify column 
aliases, the column names will be automatically assigned to _col0, _col1, and 
_col2 etc. In addition, the new target table is created using a specific SerDe 
and a storage format independent of the source tables in the SELECT statement.
+ The above CTAS statement creates the target table new_key_value_store with 
the schema (new_key DOUBLE, key_value_pair STRING) derived from the results of 
the SELECT statement. If the SELECT statement does not specify column aliases, 
the column names will be automatically assigned to _col0, _col1, and _col2 etc. 
In addition, the new target table is created using a specific SerDe and a 
storage format independent of the source tables in the SELECT statement.
  
  {{{
  CREATE TABLE empty_key_value_store
@@ -147, +147 @@

  In contrast, the statement above creates a new empty_key_value_store table 
whose definition exactly matches the existing key_value_store in all 
particulars other than table name.  The new table contains no rows.
  
  ==== Inserting Data Into Bucketed Tables ====
- The CLUSTER BY and SORTED BY creation commands do not affect how data is 
inserted into a table -- only how it is read.  This means that users must 
actively insert data correctly by specifying the number of reducers to be equal 
to the number of buckets, and using CLUSTER BY and SORT BY commands in their 
query.
+ The CLUSTER BY and SORTED BY creation commands do not affect how data is 
inserted into a table -- only how it is read.  This means that users must be 
careful to insert data correctly by specifying the number of reducers to be 
equal to the number of buckets, and using CLUSTER BY and SORT BY commands in 
their query.
  
  There is also an [[Hive/LanguageManual/DDL/BucketedTables|example of creating 
and populating bucketed tables]].
  
@@ -203, +203 @@

  ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type 
[COMMENT col_comment] [FIRST|AFTER column_name]
  }}}
  
- This command will allow users to change a column's name, data type, comment, 
or position, or a arbitrary combination of them.
+ This command will allow users to change a column's name, data type, comment, 
or position, or an arbitrary combination of them.
  
  Example:
  create table test_change (a int, b int, c int);
@@ -231, +231 @@

  table_properties:
    : (property_key = property_value, property_key = property_value, ... )
  }}}
- You can use this statement add your own metadata to the tables. Currently 
last_modified_user, last_modified_time properties are automatically added and 
managed by Hive. Users can add their own properties to this list. You can do 
DESCRIBE EXTENDED TABLE to get this information.
+ You can use this statement to add your own metadata to the tables. Currently 
last_modified_user, last_modified_time properties are automatically added and 
managed by Hive. Users can add their own properties to this list. You can do 
DESCRIBE EXTENDED TABLE to get this information.
  
  === Add Serde Properties ===
  {{{
@@ -255, +255 @@

  NOTE: These commands will only modify Hive's metadata, and will NOT 
reorganize or reformat existing data. Users should make sure the actual data 
layout conforms with the metadata definition.
  
  == Show/Describe Statements ==
- These statements provide a way to query Hive metastore for existing data and 
metadata accessible to this Hive system.
+ These statements provide a way to query the Hive metastore for existing data 
and metadata accessible to this Hive system.
  
  === Show Tables ===
  {{{
@@ -281, +281 @@

  DESCRIBE [EXTENDED] table_name[DOT col_name]
  DESCRIBE [EXTENDED] table_name[DOT col_name ( [DOT field_name] | [DOT 
'$elem$'] | [DOT '$key$'] | [DOT '$value$'] )* ]
  }}}
- DESCRIBE TABLE shows the list of columns including partition column for the 
given table. If EXTENDED keyword is specified then it will show all the 
metadata for the table in Thrift serialized form. This is generally only useful 
for debugging and not for general use.
+ DESCRIBE TABLE shows the list of columns including partition columns for the 
given table. If the EXTENDED keyword is specified then it will show all the 
metadata for the table in Thrift serialized form. This is generally only useful 
for debugging and not for general use.
  
  If a table has complex column then you can examine the attributes of this 
column by specifying table_name.complex_col_name (and '$elem$' for array 
element, '$key$' for map key, and '$value$' for map value). You can specify 
this recursively to explore the complex column type.
  
@@ -295, +295 @@

  {{{
  CREATE TEMPORARY FUNCTION function_name AS class_name
  }}}
- This statement lets you create a function that is implemented by the 
class_name. You can use this function in Hive queries as long as the session 
lasts. You can use any class that is in the class path of Hive. You can add 
jars to class path by executing 'ADD FILES' statements. Please refer to CLI 
section in User Guide for more information on how to add/delete files from Hive 
classpath. Using this you can define User Defined Functions.
+ This statement lets you create a function that is implemented by the 
class_name. You can use this function in Hive queries as long as the session 
lasts. You can use any class that is in the class path of Hive. You can add 
jars to class path by executing 'ADD FILES' statements. Please refer to the CLI 
section in the User Guide for more information on how to add/delete files from 
the Hive classpath. Using this, you can install User Defined Functions.

[Hadoop Wiki] Update of "Hive/LanguageManual/DDL" by Jo hnSichi

Reply via email to