This is done already. Use:

add file <path>
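A quick sketch of how these session commands might look in a Hive CLI session. The `add file` / `list file` / `delete file` syntax and the `log_stg2` table, `url` column, and `map.awk` script are taken from this thread; the local path is illustrative, not from the original messages:

```sql
-- attach a local script to the session so it is shipped to the task nodes
add file /home/me/map.awk;

-- show the files attached so far
list file;

-- refer to the attached file by its last path component in the USING clause
SELECT TRANSFORM(url) USING 'awk -f map.awk' AS (domain) FROM log_stg2;

-- remove it from the session when done
delete file /home/me/map.awk;
```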
This is the same as the -file argument in hadoop streaming. You can refer to this file by its last component in the 'USING' clause. list file will show the list of currently added files. delete file <path> will delete a file from the current session.

________________________________
From: Zheng Shao [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 02, 2008 8:04 PM
To: [email protected]
Subject: Re: RE: did hive support the udf now?

Hi Paradisehit,

We are planning to add a way to allow users to attach files to the job. Before that, you will have to specify the full path of map.awk and make sure it is accessible on all machines. Our cluster has a single home mount that allows users to do that.

We have never encountered the hanging problem. Most probably the JobTracker is too busy to accept new jobs. Can you submit a normal map-reduce job to the same JobTracker?

Zheng

On Tue, Dec 2, 2008 at 7:12 PM, 施兴 <[EMAIL PROTECTED]> wrote:

Yes, it can. But when I write my script to extract the domain, it hangs all the time, and there is no job page in the job monitor. The CLI shows:

hive> FROM (
        FROM log_stg2 log
        SELECT TRANSFORM(url) USING 'awk -f map.awk' AS (domain)
      ) tmap
      INSERT OVERWRITE TABLE test3
      SELECT tmap.domain, COUNT(1) GROUP BY tmap.domain;
Total MapReduce jobs = 2
Number of reducers = 1
In order to change number of reducers use:
  set mapred.reduce.tasks = <number>
Starting Job = job_200812011231_0257, Tracking URL = http://sz-mapred000.sz01:50030/jobdetails.jsp?jobid=job_200812011231_0257
Kill Command = ./../../bin/hadoop job -Dmapred.job.tracker=sz-mapred000.sz01:54311 -kill job_200812011231_0257
map = 0%, reduce = 0%

and it hangs, with no running job shown in the JobTracker monitor:

Running Jobs
none

________________________________

Will the script be copied out to all the tasktrackers? And is the path of the script right? I placed the script under the hive directory. I followed the wiki (http://wiki.apache.org/hadoop/Hive/UserGuide), section 2.4 "Running custom map/reduce jobs", 2.4.1 "MovieLens User Ratings".

It also hangs when I just run the statement:

hive> SELECT TRANSFORM(url) USING 'awk -f map.awk' AS (domain) FROM log_stg2;
Total MapReduce jobs = 1
Starting Job = job_200812011231_0259, Tracking URL = http://sz-mapred000.sz01:50030/jobdetails.jsp?jobid=job_200812011231_0259
Kill Command = ./../../bin/hadoop job -Dmapred.job.tracker=sz-mapred000.sz01:54311 -kill job_200812011231_0259
map = 0%, reduce = 0%

On 2008-12-02 19:17:51, "Ashish Thusoo" <[EMAIL PROTECTED]> wrote:
>Paradisehi,
>
>You can perhaps use the regexp_replace UDF to do this.
>
>Basically
>
>regexp_replace(a.url, '/*$', '') should be able to replace everything after
>the first / with an empty string. The second argument, which is a regular
>expression, is a Java regular expression.
>
>Ashish
>________________________________________
>From: paradisehi [[EMAIL PROTECTED]]
>Sent: Tuesday, December 02, 2008 3:06 AM
>To: [email protected]
>Subject: did hive support the udf now?
>
>My table a just contains the field url.
>Now I want to compute the pv of each url's domain, and insert the output
>into a table b: domain, pv.
>
>I don't know whether Hive supports UDFs yet; maybe I can also use a
>map_script to do this.

--
Best wishes!
My Friend~
--
Yours,
Zheng
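The thread never shows the contents of map.awk, so here is a minimal sketch of what such a domain-extraction script might look like. Only the filename map.awk and the goal (URL in, domain out) come from the thread; the exact regexes are assumptions:

```shell
# Hypothetical map.awk: reads one URL per line on stdin (as Hive's
# TRANSFORM would feed it) and prints just the domain.
cat > map.awk <<'EOF'
{
  url = $1
  sub(/^[a-z]+:\/\//, "", url)   # drop the protocol prefix, e.g. "http://"
  sub(/\/.*$/, "", url)          # drop the first "/" and everything after it
  print url
}
EOF

# Feed it a sample URL the way TRANSFORM would:
echo "http://example.com/some/path" | awk -f map.awk   # prints: example.com
```

One caution on the regexp_replace suggestion above: the pattern '/*$' only matches slashes at the very end of the string, so it strips trailing slashes rather than everything after the first '/'. To drop the whole path, a pattern like '/.*$' applied after removing the protocol prefix is presumably what was intended.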
