Here is the error output I get:
hive> insert overwrite local directory '/tmp/mystuff' select transform(*)
using  'my.py' FROM myhivetable;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201002160457_0033, Tracking URL = http://ec2-204-236-205-98.compute-1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201002160457_0033
Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=ec2-204-236-205-98.compute-1.amazonaws.com:8021 -kill job_201002160457_0033
2010-02-17 05:40:28,380 map = 0%,  reduce =0%
2010-02-17 05:41:12,469 map = 100%,  reduce =100%
Ended Job = job_201002160457_0033 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver


I am trying to run the following Hive QL commands:

add file /root/my.py;
insert overwrite local directory '/tmp/mystuff' select transform(*) using
'my.py' FROM myhivetable;

and the following is my.py:
#!/usr/bin/python
import sys
for line in sys.stdin:
  line = line.strip()
  flds = line.split('\t')
  (cl_id,cook_id)=flds[:2]
  sub_id=cl_id
  if cl_id.startswith('foo'): sub_id=cook_id;
  print ','.join([sub_id,flds[2],flds[3]])

The script works fine when I test it on the command line:

echo -e 'aa\tbb\tcc\tdd' | /root/my.py
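
One case the single-row test does not cover is a row with fewer than four tab-separated fields. In my.py, flds[2] or flds[3] would then raise an IndexError and the script would exit with an error. Note also that line.strip() removes trailing tabs, so a row whose last columns are empty would lose fields. Whether myhivetable actually contains such rows is an assumption; the log above does not confirm it. A minimal sketch of a more defensive variant follows. The names transform_line and main are hypothetical helpers introduced here, and the code is written to run on Python 3 as well:

```python
import sys


def transform_line(line):
    """Build the output row for one tab-separated input line.

    Returns None when the line has fewer than four fields, instead of
    raising IndexError as the original loop body would.
    """
    # Strip only the line terminator; strip() would also drop trailing
    # tabs and with them any empty trailing columns.
    flds = line.rstrip('\r\n').split('\t')
    if len(flds) < 4:
        return None
    cl_id, cook_id = flds[:2]
    # Same rule as my.py: use cook_id when cl_id starts with 'foo'.
    sub_id = cook_id if cl_id.startswith('foo') else cl_id
    return ','.join([sub_id, flds[2], flds[3]])


def main(lines, out, err):
    """Transform every input line, reporting malformed rows on err."""
    for line in lines:
        row = transform_line(line)
        if row is None:
            # Assumption: skipping and logging is acceptable for this job.
            err.write('skipping malformed row: %r\n' % line)
            continue
        out.write(row + '\n')
```

To use this as the transform script, keep the #!/usr/bin/python line at the top and add a final line calling main(sys.stdin, sys.stdout, sys.stderr). Skipped rows are then reported on standard error rather than stopping the script.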

Any pointers?
