DDL" by Ni ng Zhang

Apache Wiki Fri, 09 Oct 2009 16:17:52 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "Hive/LanguageManual/DDL" page has been changed by Ning Zhang:
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL?action=diff&rev1=16&rev2=17

    [ROW FORMAT row_format]
    [STORED AS file_format]
    [LOCATION hdfs_path]
+   [AS select_statement]
  
  data_type
    : primitive_type
@@ -57, +58 @@

  Partitioned tables can be created using PARTIONED BY clause. A table can have 
one or more partition columns and a separate data directory is created for each 
set of partition columns values. Further tables or partitions can be bucketed 
using CLUSTERD BY columns and data can be sorted with in that bucket by SORT BY 
columns. This can improve performance on certain kind of queries.
  
  Table names and column names are case insensitive but SerDe and property 
names are case sensitive.
+ 
+ Tables can also be created and populated by the results of a query in one 
CTAS (create-table-as-select) statement. There are two parts in CTAS, the 
select part can be any select statement supported by HiveQL. The create part of 
the CTAS takes the result schema (column names are aliases in the select clause 
and data types are derived from the select expressions) from the select part 
and create the target table with other properties (such as SerDe and storage 
format). The only restrictions in the CTAS is that the target table cannot be a 
partitioned table nor an external table. 
  
  Examples:
  
@@ -119, +122 @@

  
  You can use above statement to create page_view table which points to data 
location any hdfs location. But you have to make sure that the data is 
delimited as specified in the query above.
  
+ 
+ {{{
+ CREATE TABLE new_key_value_store
+    ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe" 
+    STORED AS RCFile AS 
+ SELECT (key % 1024) new_key, concat(key, value) key_value_pair 
+ FROM src 
+ SORT BY new_key, key_value_pair;
+ }}}
+ 
+ The above CTAS statement create the target table new_key_value_store with the 
schema indicated by the results of the SELECT statement. In addition, the new 
target table is using a specific SerDe and storage format independent of the 
source tables in the SELECT statement. 
+ 
  ==== Inserting Data Into Bucketed Tables ====
  The CLUSTER BY and SORTED BY creation commands do not effect how data is 
inserted into a table -- only how it is read.  This means that users must 
actively insert data correctly by specifying the number of reducers to be equal 
to the number of buckets, and using CLUSTER BY and SORT BY commands in their 
query.

[Hadoop Wiki] Update of "Hive/LanguageManual/DDL" by Ni ng Zhang

Reply via email to