DDL" by He Yongqiang

Apache Wiki Wed, 23 Dec 2009 14:12:19 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "Hive/LanguageManual/DDL" page has been changed by HeYongqiang.
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL?action=diff&rev1=23&rev2=24

--------------------------------------------------

  == Create/Drop Table and Partition Statements ==
  === Create Table ===
  {{{
- CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name 
+ CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
    [(col_name data_type [COMMENT col_comment], ...)]
    [table_comment]
    [PARTITIONED BY (col_name data_type [col_comment], col_name data_type 
[COMMENT col_comment], ...)]
@@ -30, +30 @@

    | STRING
  
  array_type
-   : ARRAY < primitive_type > 
+   : ARRAY < primitive_type >
  
  map_type
    : MAP < primitive_type, primitive_type >
  
  row_format
-   : DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY 
char] 
+   : DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY 
char]
          [MAP KEYS TERMINATED BY char]
    | SERDE serde_name [WITH SERDEPROPERTIES property_name=property_value, 
property_name=property_value, ...]
  
  file_format:
    : SEQUENCEFILE
    | TEXTFILE
- 
  }}}
- 
  CREATE TABLE creates a table with given name. An error is thrown if a table 
with the same name exists. You can use IF NOT EXISTS to skip the error.
  
  EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive 
does not use default location for this table. This comes in handy if you 
already have data generated. When dropping an EXTERNAL table, data in the table 
is NOT deleted from the file system.
  
  You can create tables with custom SerDe or using native SerDe. A native SerDe 
is used if ROW FORMAT is not specified or ROW FORMAT DELIMITED is specified. 
You can use DELIMITED clause to read delimited files. Use SERDE clause to 
create a table with custom SerDe. Refer to SerDe section of User Guide for more 
information on SerDe.
-  
+ 
  You must specify list of columns for tables with native SerDe. Refer to Types 
part of the User Guide for the allowable column types. List of columns for 
tables with custom SerDe may be specified but Hive will query the SerDe to 
determine the list of columns for this table.
  
  Use STORED AS TEXTFILE if the data needs to be stored as plain text files. 
Use STORED AS SEQUENCEFILE if the data needs to be compressed. Please read more 
about CompressedStorage if you are planning to keep data compressed in your 
Hive tables.
@@ -62, +60 @@

  
  Tables can also be created and populated by the results of a query in one 
CTAS (create-table-as-select) statement. The table created by CTAS is atomic, 
meaning that the table is not seen by other users until all the query results 
are populated. So other users will either see the table with the complete 
results of the query or will not see the table at all.
  
- There are two parts in CTAS, the SELECT part can be any 
[[Hive/LanguageManual/Select|SELECT statement]] supported by HiveQL. The CREATE 
part of the CTAS takes the resulting schema from the SELECT part and create the 
target table with other table properties such as the SerDe and storage format. 
The only restrictions in CTAS is that the target table cannot be a partitioned 
table nor an external table. 
+ There are two parts in CTAS, the SELECT part can be any 
[[Hive/LanguageManual/Select|SELECT statement]] supported by HiveQL. The CREATE 
part of the CTAS takes the resulting schema from the SELECT part and create the 
target table with other table properties such as the SerDe and storage format. 
The only restrictions in CTAS is that the target table cannot be a partitioned 
table nor an external table.
  
  Examples:
  
- An example statement that would create a table would be like: 
+ An example statement that would create a table would be like:
  
  {{{
- CREATE TABLE page_view(viewTime INT, userid BIGINT, 
+ CREATE TABLE page_view(viewTime INT, userid BIGINT,
-      page_url STRING, referrer_url STRING, 
+      page_url STRING, referrer_url STRING,
-      ip STRING COMMENT 'IP Address of the User') 
+      ip STRING COMMENT 'IP Address of the User')
-  COMMENT 'This is the page view table' 
+  COMMENT 'This is the page view table'
-  PARTITIONED BY(dt STRING, country STRING) 
+  PARTITIONED BY(dt STRING, country STRING)
   STORED AS SEQUENCEFILE;
  }}}
- 
- The statement above creates page_view table with viewTime, userid, page_url, 
referrer_url, up columns with a comment. The table is also partitioned and data 
is stored in sequence files. The data in the files assumed to be field 
delimited by ctrl-A and row delimited by newline. 
+ The statement above creates page_view table with viewTime, userid, page_url, 
referrer_url, up columns with a comment. The table is also partitioned and data 
is stored in sequence files. The data in the files assumed to be field 
delimited by ctrl-A and row delimited by newline.
  
  {{{
- CREATE TABLE page_view(viewTime INT, userid BIGINT, 
+ CREATE TABLE page_view(viewTime INT, userid BIGINT,
-      page_url STRING, referrer_url STRING, 
+      page_url STRING, referrer_url STRING,
-      ip STRING COMMENT 'IP Address of the User') 
+      ip STRING COMMENT 'IP Address of the User')
-  COMMENT 'This is the page view table' 
+  COMMENT 'This is the page view table'
-  PARTITIONED BY(dt STRING, country STRING) 
+  PARTITIONED BY(dt STRING, country STRING)
-  ROW FORMAT DELIMITED 
+  ROW FORMAT DELIMITED
-    FIELDS TERMINATED BY '\001' 
+    FIELDS TERMINATED BY '\001'
  STORED AS SEQUENCEFILE;
  }}}
- 
  The above statement lets you to create same table as previous table.
  
  {{{
- CREATE TABLE page_view(viewTime INT, userid BIGINT, 
+ CREATE TABLE page_view(viewTime INT, userid BIGINT,
-      page_url STRING, referrer_url STRING, 
+      page_url STRING, referrer_url STRING,
-      ip STRING COMMENT 'IP Address of the User') 
+      ip STRING COMMENT 'IP Address of the User')
-  COMMENT 'This is the page view table' 
+  COMMENT 'This is the page view table'
-  PARTITIONED BY(dt STRING, country STRING) 
+  PARTITIONED BY(dt STRING, country STRING)
-  CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS 
+  CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
-  ROW FORMAT DELIMITED 
+  ROW FORMAT DELIMITED
-    FIELDS TERMINATED BY '\001' 
+    FIELDS TERMINATED BY '\001'
-    COLLECTION ITEMS TERMINATED BY '\002' 
+    COLLECTION ITEMS TERMINATED BY '\002'
-    MAP KEYS TERMINATED BY '\003' 
+    MAP KEYS TERMINATED BY '\003'
   STORED AS SEQUENCEFILE;
  }}}
- 
- In the example above, the page_view table is bucketed(clustered by) userid 
and within each bucket the data is sorted in the increasing order of viewTime. 
Such an organization allows the user to do efficient sampling on the clustered 
column - in this case userid. The sorting property allows internal operators to 
take advantage of the better-known data structure while evaluating queries, 
also increasing efficiency. MAP KEYS and COLLECTION ITEMS keywords can be used 
if any of the columns are lists or maps. 
+ In the example above, the page_view table is bucketed(clustered by) userid 
and within each bucket the data is sorted in the increasing order of viewTime. 
Such an organization allows the user to do efficient sampling on the clustered 
column - in this case userid. The sorting property allows internal operators to 
take advantage of the better-known data structure while evaluating queries, 
also increasing efficiency. MAP KEYS and COLLECTION ITEMS keywords can be used 
if any of the columns are lists or maps.
  
- In all the examples until now the data is stored in 
<hive.metastore.warehouse.dir>/page_view. Specify value of the key 
hive.metastore.warehouse.dir in Hive config file hive-site.xml. 
+ In all the examples until now the data is stored in 
<hive.metastore.warehouse.dir>/page_view. Specify value of the key 
hive.metastore.warehouse.dir in Hive config file hive-site.xml.
  
  {{{
- CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT, 
+ CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
-      page_url STRING, referrer_url STRING, 
+      page_url STRING, referrer_url STRING,
-      ip STRING COMMENT 'IP Address of the User', 
+      ip STRING COMMENT 'IP Address of the User',
-      country STRING COMMENT 'country of origination') 
+      country STRING COMMENT 'country of origination')
-  COMMENT 'This is the staging page view table' 
+  COMMENT 'This is the staging page view table'
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
-  STORED AS TEXTFILE 
+  STORED AS TEXTFILE
   LOCATION '<hdfs_location>';
  }}}
- 
  You can use above statement to create page_view table which points to data 
location any hdfs location. But you have to make sure that the data is 
delimited as specified in the query above.
  
- 
  {{{
  CREATE TABLE new_key_value_store
-    ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe" 
+    ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
-    STORED AS RCFile AS 
+    STORED AS RCFile AS
- SELECT (key % 1024) new_key, concat(key, value) key_value_pair 
+ SELECT (key % 1024) new_key, concat(key, value) key_value_pair
  FROM key_value_store
  SORT BY new_key, key_value_pair;
  }}}
- 
- The above CTAS statement creates the target table new_key_value_store with 
the schema, (new_key DOUBLE, key_value_pair STRING), derived from the results 
of the SELECT statement. If the SELECT statement does not specify column 
aliases, the column names will be automatically assigned to _col0, _col1, and 
_col2 etc. In addition, the new target table is created using a specific SerDe 
and a storage format independent of the source tables in the SELECT statement. 
+ The above CTAS statement creates the target table new_key_value_store with 
the schema, (new_key DOUBLE, key_value_pair STRING), derived from the results 
of the SELECT statement. If the SELECT statement does not specify column 
aliases, the column names will be automatically assigned to _col0, _col1, and 
_col2 etc. In addition, the new target table is created using a specific SerDe 
and a storage format independent of the source tables in the SELECT statement.
  
  ==== Inserting Data Into Bucketed Tables ====
  The CLUSTER BY and SORTED BY creation commands do not effect how data is 
inserted into a table -- only how it is read.  This means that users must 
actively insert data correctly by specifying the number of reducers to be equal 
to the number of buckets, and using CLUSTER BY and SORT BY commands in their 
query.
@@ -144, +136 @@

  {{{
  DROP TABLE table_name
  }}}
- 
  DROP TABLE removes metadata and data for this table. The data is actually 
moved to the .Trash/Current directory if Trash is configured. The metadata is 
completely lost.
  
  When dropping an EXTERNAL table, data in the table will NOT deleted from the 
file system.
@@ -156, +147 @@

  partition_spec:
    : PARTITION (partition_col = partition_col_value, partition_col = 
partiton_col_value, ...)
  }}}
- 
  You can use ALTER TABLE DROP PARTITION to drop a partition for a table. This 
removes the data and metadata for this partition. Partition value should be 
quoted only if it is a string.
  
  {{{
  alter table page_view drop partition(dt='2008-08-08', country='us')
- }}} 
+ }}}
+ 
  
  == Alter Table Statements ==
  Alter table statements enable you to change the structure of an existing 
table. You can add columns, change serde, add table and serde properties, or 
rename the table itself.
@@ -170, +161 @@

  {{{
  ALTER TABLE table_name RENAME TO new_table_name
  }}}
- 
- This statement lets you change the name of a table to a different name. 
'''The location of the data and the names of partitions do not change!''' In 
other words, the old table name is not "freed," and writing to it will alter 
the "renamed" table's data! 
+ This statement lets you change the name of a table to a different name. 
'''The location of the data and the names of partitions do not change!''' In 
other words, the old table name is not "freed," and writing to it will alter 
the "renamed" table's data!
+ 
+ === Change Column Name/Type/Position/Comment ===
+ {{{
+ ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type 
[COMMENT col_comment] [FIRST|AFTER column_name]
+ }}}
+ 
+ This command will allow users to change a column's name, data type, comment, 
or position, or a arbitrary combination of them.
+ 
+ Example:
+ create table test_change (a int, b int, c int);
+ 
+ "alter table change a a1 int;" will change column a's name to a1.
+ 
+ "alter table change a a1 string after b;" will change column a's name to a1, 
a's data type to string, and put it after column b. The new table's structure 
is: b int, a1 string, c int.
+ 
+ "alter table change b b1 int first;" will change column b's name to b1, and 
put it as the first column. The new table's structure is: b1 int, a string, c 
int.
+ 
+ NOTE: The column change command will only modify Hive's metadata, and will 
NOT touch data. Users should make sure the actual data layout conforms with the 
metadata definition.
  
  === Add/Replace Columns ===
  {{{
  ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT 
col_comment], ...)
  }}}
- 
  ADD COLUMNS lets you add new columns to the end of the existing columns but 
before the partition columns.
  
  REPLACE COLUMNS removes all existing columns and adds the new set of columns. 
This can be done only for native tables. Native tables are those tables that 
are created with DynamicSerDe or MetadataTypedColumnsetSerDe serdes. Refer to 
SerDe section of User Guide for more information.
@@ -189, +196 @@

  table_properties:
    : (property_key = property_value, property_key = property_value, ... )
  }}}
- 
  You can use this statement add your own metadata to the tables. Currently 
last_modified_user, last_modified_time properties are automatically added and 
managed by Hive. Users can add their own properties to this list. You can do 
DESCRIBE EXTENDED TABLE to get this information.
  
  === Add Serde Properties ===
@@ -200, +206 @@

  serde_properties:
    : (property_key = property_value, property_key = property_value, ... )
  }}}
- 
  This statement enables you to add user defined metadata to table SerDe 
object. The serde properties are passed to the table's SerDe when it is being 
initialized by Hive to serialize and deserialize data. So users can store any 
information required for their custom serde here. Refer to SerDe section of 
Users Guide for more information.
  
  == Show/Describe Statements ==
@@ -210, +215 @@

  {{{
  SHOW TABLES identifier_with_wildcards
  }}}
- 
  SHOW TABLES lists all the tables matching the given regular expression. 
Regular expression can contain only '*' for any character[s] or '|' for a 
choice. Examples are 'page_view', 'page_v*', '*view|page*', all which will 
match 'page_view' table. Matching tables are listed in alphabetical order. It 
is not an error if there are no matching tables found in metastore.
  
  === Show Partitions ===
  {{{
  SHOW PARTITIONS table_name
  }}}
- 
  SHOW PARTITIONS lists all the existing partitions for a given table. 
Partitions are listed in alphabetical order.
  
  === Describe Table/Column ===
@@ -225, +228 @@

  DESCRIBE [EXTENDED] table_name[DOT col_name]
  DESCRIBE [EXTENDED] table_name[DOT col_name ( [DOT field_name] | [DOT 
'$elem$'] | [DOT '$key$'] | [DOT '$value$'] )* ]
  }}}
- 
  DESCRIBE TABLE shows the list of columns including partition column for the 
given table. If EXTENDED keyword is specified then it will show all the 
metadata for the table in Thrift serialized form. This is generally only useful 
for debugging and not for general use.
  
  If a table has complex column then you can examine the attributes of this 
column by specifying table_name.complex_col_name (and '$elem$' for array 
element, '$key$' for map key, and '$value$' for map value). You can specify 
this recursively to explore the complex column type.
@@ -234, +236 @@

  {{{
  DESCRIBE [EXTENDED] table_name partition_spec
  }}}
- 
  This statement lists metadata for a given partition. The output is similar to 
that of DESCRIBE TABLE. Presently, the column information is not used while 
preparing plans.
  
  === Create Function ===
  {{{
- CREATE TEMPORARY FUNCTION function_name AS class_name 
+ CREATE TEMPORARY FUNCTION function_name AS class_name
  }}}
- 
  This statement lets you create a function that is implemented by the 
class_name. You can use this function in Hive queries as long as the session 
lasts. You can use any class that is in the class path of Hive. You can add 
jars to class path by executing 'ADD FILES' statements. Please refer to CLI 
section in User Guide for more information on how to add/delete files from Hive 
classpath. Using this you can define User Defined Functions.

[Hadoop Wiki] Update of "Hive/LanguageManual/DDL" by He Yongqiang

Reply via email to