Here is my stderr:
hive> insert overwrite local directory '/tmp/mystuff' select transform(*) using 'my.py' FROM myhivetable;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201002160457_0033, Tracking URL = http://ec2-204-236-205-98.compute-1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201002160457_0033
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=ec2-204-236-205-98.compute-1.amazonaws.com:8021 -kill job_201002160457_0033
2010-02-17 05:40:28,380 map = 0%, reduce =0%
2010-02-17 05:41:12,469 map = 100%, reduce =100%
Ended Job = job_201002160457_0033 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver
I am running the following HiveQL:
add file /root/my.py
insert overwrite local directory '/tmp/mystuff' select transform(*) using 'my.py' FROM myhivetable;
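To test my.py locally with input closer to what Hive actually sends, here is a small sketch that generates sample rows. It assumes Hive's default TRANSFORM row format: fields joined by tabs, NULL written as the literal two-character string \N, one row per line. The helper name hive_style_row, the file name make_rows.py and the sample rows are all hypothetical.

```python
# Hypothetical helper (assumption): mimic Hive's default TRANSFORM row format,
# i.e. tab-separated fields, NULL written as the literal \N, one row per line.
def hive_style_row(values):
    return '\t'.join('\\N' if v is None else str(v) for v in values) + '\n'


# Made-up sample rows, including a NULL and an empty trailing column.
rows = [
    ['aa', 'bb', 'cc', 'dd'],
    ['foo1', 'bb', None, 'dd'],
    ['aa', 'bb', 'cc', ''],
]
sample = ''.join(hive_style_row(r) for r in rows)

if __name__ == '__main__':
    # Write to stdout so it can be piped into the script, e.g.
    #   python make_rows.py | /root/my.py
    import sys
    sys.stdout.write(sample)
```

Fed to the script as written, the third row is worth watching: line.strip() also removes the trailing tab, leaving only three fields, so flds[3] raises IndexError.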
And here is my.py:
#!/usr/bin/python
import sys
for line in sys.stdin:
    line = line.strip()
    flds = line.split('\t')
    (cl_id, cook_id) = flds[:2]
    sub_id = cl_id
    if cl_id.startswith('foo'):
        sub_id = cook_id
    print(','.join([sub_id, flds[2], flds[3]]))
This works fine when I test it from the command line:
echo -e 'aa\tbb\tcc\tdd' | /root/my.py
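One thing worth ruling out is that the script dies on some row the single echo test never exercised. line.strip() also strips trailing tabs, so a row whose last column is an empty string ends up with three fields and flds[3] raises IndexError; any row with fewer than four columns does the same. Below is a sketch of a more defensive variant. It strips only the newline and reports malformed rows on stderr instead of crashing. This is an assumption about the cause, not a confirmed fix, and transform_line is a hypothetical name.

```python
#!/usr/bin/python
# Defensive variant of my.py (a sketch, not a confirmed fix).
import sys


def transform_line(line):
    """Return the comma-joined output for one tab-separated row, or None."""
    # Strip only the newline so empty trailing fields are kept.
    flds = line.rstrip('\n').split('\t')
    if len(flds) < 4:
        # Report malformed rows on stderr (visible in the task logs)
        # instead of letting an IndexError kill the script.
        sys.stderr.write('skipping short row: %r\n' % line)
        return None
    cl_id, cook_id = flds[0], flds[1]
    sub_id = cook_id if cl_id.startswith('foo') else cl_id
    return ','.join([sub_id, flds[2], flds[3]])


def main():
    for line in sys.stdin:
        out = transform_line(line)
        if out is not None:
            sys.stdout.write(out + '\n')


if __name__ == '__main__':
    main()
```

The same echo test still gives aa,cc,dd. Rows that are skipped show up in the failed task's stderr log, reachable from the tracking URL, which should also be the first place to look for the original script's traceback.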
Any pointers?