Thanks, folks, for your help. I tried to use Hive to analyze an Apache log. A plain "select * from apachelog" works fine and returns results. But if I do anything like count or group by, it just shows the "map = 0%, reduce = 0%" message again and again, endlessly, and I have to stop it. Any ideas? Thank you!

CREATE TABLE apachelog (
  ipaddress STRING,
  identd STRING,
  user STRING,
  finishtime STRING,
  requestline STRING,
  returncode INT,
  size INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe'
WITH SERDEPROPERTIES (
  'serialization.format'='org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol',
  'quote.delim'='("|\\[|\\])',
  'field.delim'=' ',
  'serialization.null.format'='-')
STORED AS TEXTFILE;

hive> select count(*) from apachelog;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /tmp/mwang/mwang_20120420141818_2208abfb-1840-49f4-b0b4-0d44d90fc121.log
Job running in-process (local Hadoop)
2012-04-20 14:19:06,691 null map = 0%, reduce = 0%
2012-04-20 14:20:06,882 null map = 0%, reduce = 0%
2012-04-20 14:21:07,073 null map = 0%, reduce = 0%
2012-04-20 14:22:07,258 null map = 0%, reduce = 0%
2012-04-20 14:23:07,418 null map = 0%, reduce = 0%
2012-04-20 14:24:07,579 null map = 0%, reduce = 0%
2012-04-20 14:25:07,738 null map = 0%, reduce = 0%
2012-04-20 14:26:07,903 null map = 0%, reduce = 0%
2012-04-20 14:27:08,075 null map = 0%, reduce = 0%
2012-04-20 14:28:08,241 null map = 0%, reduce = 0%
2012-04-20 14:29:08,397 null map = 0%, reduce = 0%
2012-04-20 14:30:08,552 null map = 0%, reduce = 0%
2012-04-20 14:31:08,712 null map = 0%, reduce = 0%
2012-04-20 14:32:08,893 null map = 0%, reduce = 0%
2012-04-20 14:33:09,056 null map = 0%, reduce = 0%
2012-04-20 14:34:09,223 null map = 0%, reduce = 0%
2012-04-20 14:35:09,396 null map = 0%, reduce = 0%

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Monday, April 16, 2012 4:55 PM
To: common-user@hadoop.apache.org
Subject: Re: Hive Thrift help

You can NOT connect to Hive Thrift with a web browser to confirm its status. Thrift is Thrift, not HTTP. But you are right to say HiveServer does not produce any output by default.

If netstat -nl | grep 10000 shows output, it is up.

On Mon, Apr 16, 2012 at 5:18 PM, Rahul Jain <rja...@gmail.com> wrote:
> I am assuming you read thru:
>
> https://cwiki.apache.org/Hive/hiveserver.html
>
> The server comes up on port 10,000 by default, did you verify that it is
> actually listening on the port? You can also connect to hive server using
> web browser to confirm its status.
>
> -Rahul
>
> On Mon, Apr 16, 2012 at 1:53 PM, Michael Wang
> <michael.w...@meredith.com> wrote:
>
>> we need to connect to HIVE from Microstrategy reports, and it requires the
>> Hive Thrift server. But I tried to start it, and it just hangs as below.
>> # hive --service hiveserver
>> Starting Hive Thrift Server
>> Any ideas?
>> Thanks,
>> Michael
>>
>> This electronic message, including any attachments, may contain
>> proprietary, confidential or privileged information for the sole use of the
>> intended recipient(s). You are hereby notified that any unauthorized
>> disclosure, copying, distribution, or use of this message is prohibited. If
>> you have received this message in error, please immediately notify the
>> sender by reply e-mail and delete it.
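
For anyone who would rather script the port check from the quoted thread than run netstat by hand, here is a minimal Python sketch. The function name is_port_listening and the host "localhost" are assumptions for illustration; 10000 is the default HiveServer port mentioned in the thread. A successful TCP connect only shows that something is listening on the port, not that it is HiveServer or that it is healthy.

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: HiveServer runs on this machine on its default port 10000.
    up = is_port_listening("localhost", 10000)
    print("port 10000 is", "listening" if up else "not listening")
```

This is a plain TCP probe, in line with the point above that Thrift is not HTTP: it does not speak the Thrift protocol or send any request to the server.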