Hi,

What do your Hive logs say? You can also check the Hadoop map and reduce task logs.
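Anything a transform script writes to stderr is captured in those per-task logs (linked from the job Tracking URL), while stdout goes back to Hive as the query output. A minimal sketch of a pass-through script with a diagnostic stderr print (illustrative only, not the script from this thread):

#!/usr/bin/python
# Sketch only: stderr lands in the Hadoop task logs, stdout is the
# row stream returned to Hive.
import sys

for line in sys.stdin:
    print >> sys.stderr, 'DEBUG row: %r' % line  # visible in the mapper task log
    print line.strip()                           # normal output row back to Hive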
Thanks and Regards,
Sonal

On Wed, Feb 17, 2010 at 4:18 PM, prasenjit mukherjee <[email protected]> wrote:

> Here is my stderr:
>
> hive> insert overwrite local directory '/tmp/mystuff' select transform(*)
> using 'my.py' FROM myhivetable;
> Total MapReduce jobs = 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201002160457_0033, Tracking URL =
> http://ec2-204-236-205-98.compute-1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201002160457_0033
> Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=ec2-204-236-205-98.compute-1.amazonaws.com:8021 -kill job_201002160457_0033
> 2010-02-17 05:40:28,380 map = 0%, reduce =0%
> 2010-02-17 05:41:12,469 map = 100%, reduce =100%
> Ended Job = job_201002160457_0033 with errors
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver
>
> I am trying to run the following Hive QL:
>
> add file /root/my.py
> insert overwrite local directory '/tmp/mystuff' select transform(*) using 'my.py' FROM myhivetable;
>
> and the following is my my.py:
>
> #!/usr/bin/python
> import sys
> for line in sys.stdin:
>     line = line.strip()
>     flds = line.split('\t')
>     (cl_id, cook_id) = flds[:2]
>     sub_id = cl_id
>     if cl_id.startswith('foo'): sub_id = cook_id
>     print ','.join([sub_id, flds[2], flds[3]])
>
> This works fine when I test it on the command line with:
>
> echo -e 'aa\tbb\tcc\tdd' | /root/my.py
>
> Any pointers?
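A note on the quoted script: the task logs should show the actual Python traceback, but two plausible failure modes are visible in the code itself. First, line.strip() removes trailing tabs as well as the newline, so a row whose last column is empty loses a field; second, any row with fewer than four fields makes the tuple unpacking or the flds[3] lookup raise, which kills the mapper even though the single four-field echo test passes. A defensive sketch of the same logic (an interpretation, not a confirmed fix):

#!/usr/bin/python
# Defensive sketch of the same mapping; assumes myhivetable has at
# least four columns, as the original script does.
import sys

for line in sys.stdin:
    # rstrip('\n') rather than strip(): strip() also removes trailing
    # tabs, so an empty last column would otherwise shorten the row.
    flds = line.rstrip('\n').split('\t')
    if len(flds) < 4:
        # A short row would otherwise raise and kill the whole mapper;
        # report it to stderr (visible in the task logs) and skip it.
        print >> sys.stderr, 'bad row (%d fields): %r' % (len(flds), line)
        continue
    cl_id, cook_id = flds[:2]
    sub_id = cook_id if cl_id.startswith('foo') else cl_id
    print ','.join([sub_id, flds[2], flds[3]])

If the logs instead show that 'my.py' could not be launched at all, invoking it through the interpreter, i.e. using 'python my.py' in the TRANSFORM clause, is a common workaround, since files shipped with add file do not always keep their execute bit.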
