Yes, it can. But when I use my own script to extract the domain, the query hangs every time, and no job page appears in the job monitor. This is what the CLI shows:

hive> FROM (FROM log_stg2 log SELECT TRANSFORM(url) USING 'awk -f map.awk' AS (domain) )tmap INSERT OVERWRITE TABLE test3 SELECT tmap.domain, COUNT(1) group by tmap.domain;
Total MapReduce jobs = 2
Number of reducers = 1
In order to change numer of reducers use:
 set mapred.reduce.tasks = <number>
Starting Job = job_200812011231_0257, Tracking URL = http://sz-mapred000.sz01:50030/jobdetails.jsp?jobid=job_200812011231_0257
Kill Command = ./../../bin/hadoop job -Dmapred.job.tracker=sz-mapred000.sz01:54311 -kill job_200812011231_0257
map = 0%, reduce =0%

and it hangs there, while the jobtracker monitor shows no running job:

Running Jobs
none

------------------------------
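For reference, the per-row work such a script does can be sketched as below. The contents of map.awk are not shown in this message, so this is a hypothetical Python stand-in rather than the actual script: extract_domain, transform and main are illustrative names, and it assumes the url column arrives as the first tab-separated field of each input line, which is how TRANSFORM streams rows to a script.

```python
import sys
from urllib.parse import urlsplit


def extract_domain(url):
    """Return the lower-cased host part of a URL, or '' if none is found."""
    url = url.strip()
    # urlsplit only fills in the host when a scheme or a leading '//' is
    # present, so bare "host/path" values get a '//' prefix first.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    try:
        host = urlsplit(url).hostname
    except ValueError:
        # Malformed input (for example a broken IPv6 literal).
        host = None
    return host or ""


def transform(lines):
    """Yield one domain per input row; the url is assumed to be field 1."""
    for line in lines:
        yield extract_domain(line.split("\t")[0])


def main():
    # Rows come in on stdin, one per line; results go back on stdout.
    for domain in transform(sys.stdin):
        sys.stdout.write(domain + "\n")

# When saved as a script for TRANSFORM, finish the file with:
#     if __name__ == "__main__":
#         main()
```

With that last guard added and the file saved under a hypothetical name such as map.py, piping a few URLs into it by hand is a quick way to confirm that the script itself terminates and prints one domain per line before Hive is involved.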
Will the script be copied to all the tasktrackers? And is the path to the script right? I placed the script under the hive directory. I followed the wiki:
http://wiki.apache.org/hadoop/Hive/UserGuide
2.4. Running custom map/reduce jobs
2.4.1. MovieLens User Ratings

It also hangs when I enter just this statement:

hive> SELECT TRANSFORM(url) USING 'awk -f map.awk' AS (domain) FROM log_stg2;
Total MapReduce jobs = 1
Starting Job = job_200812011231_0259, Tracking URL = http://sz-mapred000.sz01:50030/jobdetails.jsp?jobid=job_200812011231_0259
Kill Command = ./../../bin/hadoop job -Dmapred.job.tracker=sz-mapred000.sz01:54311 -kill job_200812011231_0259
map = 0%, reduce =0%

On 2008-12-02 19:17:51, "Ashish Thusoo" <[EMAIL PROTECTED]> wrote:
>Paradisehi,
>
>you can perhaps use the regexp_replace udf to do this.
>
>Basically
>
>regexp_replace(a.url, '/*$', '') should be able to replace everything after the first / with an empty string. The second string, which is a regular expression, is a java regular expression.
>
>Ashish
>________________________________________
>From: paradisehi [EMAIL PROTECTED]
>Sent: Tuesday, December 02, 2008 3:06 AM
>To: [email protected]
>Subject: did hive support the udf now?
>
>My table a contains just one field, url.
>Now I want to compute the pv for each domain of the urls and insert the output into a table b (domain, pv).
>
>I don't know whether hive supports udfs now; maybe I can also use map_script to do this.
>

--
Best wishes!
My Friend~
