Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by NeilConway: http://wiki.apache.org/hadoop/Hive/Tutorial ------------------------------------------------------------------------------ || BIGINT || count(1), count(DISTINCT col [, col]...)|| count(1) returns the number of members in the group, whereas the count(DISTINCT col) gets the count of distinct values of the columns in the group|| || DOUBLE || sum(col), sum(DISTINCT col) || returns the sum of the elements in the group or the sum of the distinct values of the column in the group|| || DOUBLE || avg(col), avg(DISTINCT col) || returns the average of the elements in the group or the average of the distinct values of the column in the group|| - || DOUBLE || min(col) || returns the minimum of the column in the group|| + || DOUBLE || min(col) || returns the minimum value of the column in the group|| - || DOUBLE || max(col) || returns the maximum value of the column n the group|| + || DOUBLE || max(col) || returns the maximum value of the column in the group|| == Language capabilities == Hive query language provides the basic SQL like operations. These operations work on tables or partitions. These operations are: @@ -223, +223 @@ LINES TERMINATED BY '12' STORED AS SEQUENCEFILE; }}} - In the example above, the table is bucketed(clustered by) userid and within each bucket the data is sorted in the increasing order of viewTime. Such an organization allows the user to do efficient sampling on the clustered column - in this case userid. The sorting property allows internal operators to take advantage of the better-known data structure while evaluating queries, also increasing efficiency. + In the example above, the table is bucketed (clustered by) userid and within each bucket the data is sorted in the increasing order of viewTime. Such an organization allows the user to do efficient sampling on the clustered column - in this case userid. The sorting property allows internal operators to take advantage of the better-known data structure while evaluating queries, also increasing efficiency. {{{ CREATE TABLE page_view(viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, @@ -341, +341 @@ (Note that page_views.date is used here because the table (above) was defined with PARTITIONED BY(date DATETIME, country STRING) ; if you name your partition something different, don't expect .date to do what you think!) == Joins == - In order to get a demographic breakdown(by gender) of page_view of 2008-03-03 one would need to join the page_view table and the user table on the userid column. This can be accomplished with a join as shown in the following query: + In order to get a demographic breakdown (by gender) of page_view of 2008-03-03 one would need to join the page_view table and the user table on the userid column. This can be accomplished with a join as shown in the following query: {{{ INSERT OVERWRITE TABLE pv_users SELECT pv.*, u.gender, u.age
