This is done already. Use:

add file <path>
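A quick sketch of how these session commands might look in a Hive CLI session. The `add file` / `list file` / `delete file` syntax and the `log_stg2` table, `url` column, and `map.awk` script are taken from this thread; the local path is illustrative, not from the original messages:

```sql
-- attach a local script to the session so it is shipped to the task nodes
add file /home/me/map.awk;

-- show the files attached so far
list file;

-- refer to the attached file by its last path component in the USING clause
SELECT TRANSFORM(url) USING 'awk -f map.awk' AS (domain) FROM log_stg2;

-- remove it from the session when done
delete file /home/me/map.awk;
```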
This is the same as the -file argument in hadoop streaming. You can refer to this file by its last component in the 'USING' clause. list file will show the list of currently added files. delete file <path> will delete a file from the current session.

________________________________
From: Zheng Shao [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 02, 2008 8:04 PM
To: [email protected]
Subject: Re: RE: did hive support the udf now?

Hi Paradisehit,

We are planning to add a way to allow users to attach files to the job. Before that, you will have to specify the full path of map.awk and make sure it is accessible on all machines. Our cluster has a single home mount that allows users to do that.

We have never encountered the hanging problem. Most probably the JobTracker is too busy to accept new jobs. Can you submit a normal map-reduce job to the same JobTracker?

Zheng

On Tue, Dec 2, 2008 at 7:12 PM, 施兴 <[EMAIL PROTECTED]> wrote:

Yes, it can. But when I write my script to extract the domain, it hangs all the time, and there is no job page in the job monitor. The CLI shows:

hive> FROM (
        FROM log_stg2 log
        SELECT TRANSFORM(url) USING 'awk -f map.awk' AS (domain)
      ) tmap
      INSERT OVERWRITE TABLE test3
      SELECT tmap.domain, COUNT(1) GROUP BY tmap.domain;
Total MapReduce jobs = 2
Number of reducers = 1
In order to change number of reducers use:
  set mapred.reduce.tasks = <number>
Starting Job = job_200812011231_0257, Tracking URL = http://sz-mapred000.sz01:50030/jobdetails.jsp?jobid=job_200812011231_0257
Kill Command = ./../../bin/hadoop job -Dmapred.job.tracker=sz-mapred000.sz01:54311 -kill job_200812011231_0257
map = 0%, reduce = 0%

and it hangs, with no running job shown in the JobTracker monitor:

Running Jobs
none

________________________________

Will the script be copied out to all the tasktrackers? And is the path of the script right? I placed the script under the hive directory. I followed the wiki (http://wiki.apache.org/hadoop/Hive/UserGuide), section 2.4 "Running custom map/reduce jobs", 2.4.1 "MovieLens User Ratings".

It also hangs when I just run the statement:

hive> SELECT TRANSFORM(url) USING 'awk -f map.awk' AS (domain) FROM log_stg2;
Total MapReduce jobs = 1
Starting Job = job_200812011231_0259, Tracking URL = http://sz-mapred000.sz01:50030/jobdetails.jsp?jobid=job_200812011231_0259
Kill Command = ./../../bin/hadoop job -Dmapred.job.tracker=sz-mapred000.sz01:54311 -kill job_200812011231_0259
map = 0%, reduce = 0%

On 2008-12-02 19:17:51, "Ashish Thusoo" <[EMAIL PROTECTED]> wrote:
>Paradisehi,
>
>You can perhaps use the regexp_replace UDF to do this.
>
>Basically
>
>regexp_replace(a.url, '/*$', '') should be able to replace everything after
>the first / with an empty string. The second argument, which is a regular
>expression, is a Java regular expression.
>
>Ashish
>________________________________________
>From: paradisehi [[EMAIL PROTECTED]]
>Sent: Tuesday, December 02, 2008 3:06 AM
>To: [email protected]
>Subject: did hive support the udf now?
>
>My table a just contains the field url.
>Now I want to compute the pv of each url's domain, and insert the output
>into a table b: domain, pv.
>
>I don't know whether Hive supports UDFs yet; maybe I can also use a
>map_script to do this.

--
Best wishes!
My Friend~
--
Yours,
Zheng
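The thread never shows the contents of map.awk, so here is a minimal sketch of what such a domain-extraction script might look like. Only the filename map.awk and the goal (URL in, domain out) come from the thread; the exact regexes are assumptions:

```shell
# Hypothetical map.awk: reads one URL per line on stdin (as Hive's
# TRANSFORM would feed it) and prints just the domain.
cat > map.awk <<'EOF'
{
  url = $1
  sub(/^[a-z]+:\/\//, "", url)   # drop the protocol prefix, e.g. "http://"
  sub(/\/.*$/, "", url)          # drop the first "/" and everything after it
  print url
}
EOF

# Feed it a sample URL the way TRANSFORM would:
echo "http://example.com/some/path" | awk -f map.awk   # prints: example.com
```

One caution on the regexp_replace suggestion above: the pattern '/*$' only matches slashes at the very end of the string, so it strips trailing slashes rather than everything after the first '/'. To drop the whole path, a pattern like '/.*$' applied after removing the protocol prefix is presumably what was intended.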
