Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by DavidPhillips:
http://wiki.apache.org/hadoop/Hive/UserGuide

The comment on the change is:
added examples from Jeff's 20081030linkedin presentation

------------------------------------------------------------------------------
  == Supported Features ==
  == Usage Examples ==
  === Creating tables ===
+ 
+ ==== MovieLens User Ratings ====
+ {{{
+ CREATE TABLE u_data (
+   userid INT,
+   movieid INT,
+   rating INT,
+   unixtime TIMESTAMP)
+ ROW FORMAT DELIMITED
+ FIELDS TERMINATED BY '\t';
+ }}}
  ==== Apache Access Log Tables ====
  {{{
@@ -19, +30 @@
    'serialization.null.format'='-')
  STORED AS TEXTFILE;
  }}}
+ 
  ==== Control Separated Tables ====
  {{{
  CREATE TABLE mylog (
@@ -32, +44 @@
  }}}
  === Loading tables ===
+ 
+ ==== MovieLens User Ratings ====
+ Download and extract the data:
+ {{{
+ wget http://www.grouplens.org/system/files/ml-data.tar__0.gz
+ tar xvzf ml-data.tar__0.gz
+ }}}
+ 
+ Load it in:
+ {{{
+ LOAD DATA LOCAL INPATH 'ml-data/u.data'
+ OVERWRITE INTO TABLE u_data;
+ }}}
+ 
  === Running queries ===
+ 
+ ==== MovieLens User Ratings ====
+ {{{
+ SELECT COUNT(1) FROM u_data;
+ }}}
+ 
  === Running custom map/reduce jobs ===
+ 
+ ==== MovieLens User Ratings ====
+ Create weekday_mapper.py:
+ {{{
+ import sys
+ import datetime
+ 
+ for line in sys.stdin:
+   line = line.strip()
+   userid, movieid, rating, unixtime = line.split('\t')
+   weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
+   # Hive reads TRANSFORM output as tab-separated columns
+   print('\t'.join([userid, movieid, rating, str(weekday)]))
+ }}}
+ 
+ Use the mapper script:
+ {{{
+ CREATE TABLE u_data_new (
+   userid INT,
+   movieid INT,
+   rating INT,
+   weekday INT)
+ ROW FORMAT DELIMITED
+ FIELDS TERMINATED BY '\t';
+ 
+ INSERT OVERWRITE TABLE u_data_new
+ SELECT
+   TRANSFORM (userid, movieid, rating, unixtime)
+   USING 'python weekday_mapper.py'
+   AS (userid, movieid, rating, weekday)
+ FROM u_data;
+ 
+ SELECT weekday, COUNT(1)
+ FROM u_data_new
+ GROUP BY weekday;
+ }}}
+ 
+ '''Note: due to a bug in the parser, you must run the "INSERT OVERWRITE" query on a single line'''
+ 
  === Using sampling ===
  == Known Issues/Bugs ==

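The per-row logic of weekday_mapper.py above can be checked locally before it is wired into a TRANSFORM query. The following is only a minimal sketch: the function name `map_line` and the two sample rows are hypothetical, the output is tab-separated on the assumption that Hive splits script output on tabs, and the weekday is computed in UTC (unlike the script above, which uses local time) so that the result does not depend on the machine's time zone.

```python
import datetime


def map_line(line):
    """Map 'userid<TAB>movieid<TAB>rating<TAB>unixtime' to a row ending in the ISO weekday."""
    userid, movieid, rating, unixtime = line.strip().split('\t')
    # Assumption: interpret the epoch seconds in UTC for a reproducible result.
    moment = datetime.datetime.fromtimestamp(float(unixtime), tz=datetime.timezone.utc)
    weekday = moment.isoweekday()  # Monday = 1 ... Sunday = 7
    return '\t'.join([userid, movieid, rating, str(weekday)])


if __name__ == '__main__':
    # Hypothetical sample rows, not taken from the MovieLens data set.
    for sample in ['1\t10\t5\t0\n', '2\t20\t3\t86400\n']:
        print(map_line(sample))
```

Epoch second 0 falls on a Thursday (ISO weekday 4) and 86400 on the following Friday (5), which gives a quick sanity check on the conversion.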