The query is split into two map/reduce jobs. The first job
consists of 16 map tasks (no reduce tasks). The relevant log output is
given below:

2009-08-14 11:29:38,245 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
2009-08-14 11:29:38,246 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression configuration is:true
2009-08-14 11:29:38,347 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2009-08-14 11:29:38,358 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 6 FS initialized
2009-08-14 11:29:38,358 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 6 FS

The second job consists of 16 map tasks and 3 reduce tasks. None of the
map tasks contains any log output from FileSinkOperator. The reduce
tasks contain the following relevant log output:

2009-08-14 11:38:13,553 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 3 FS
2009-08-14 11:38:13,553 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 3 FS
2009-08-14 11:38:13,604 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
2009-08-14 11:38:13,605 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression configuration is:false
2009-08-14 11:38:43,128 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 3 FS initialized
2009-08-14 11:38:43,128 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 3 FS

As you can see, compression is "on" for the first map/reduce job
but "off" for the second one. Did I forget to set a configuration
parameter?
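
For comparing the two jobs side by side, here is a minimal sketch of how the flag could be pulled out of saved task logs. It assumes the log text is available as a string and that each "Writing to temp file" entry is followed by its "Compression configuration is" entry, as in the excerpts above. The function name and the sample string are illustrative only and are not part of Hive.

```python
import re

# Pairs each temp-file path with the compression flag that FileSinkOperator
# logs right after it. \s+ tolerates the line wrapping seen in the excerpts.
PATTERN = re.compile(
    r"Writing\s+to\s+temp\s+file:\s+FS\s+(\S+)"
    r".*?Compression\s+configuration\s+is:\s*(true|false)",
    re.DOTALL,
)


def compression_by_attempt(log_text):
    """Return {task attempt id: bool} for every FileSinkOperator entry found."""
    result = {}
    for path, flag in PATTERN.findall(log_text):
        # The attempt id is whatever follows the last "_tmp." in the path.
        attempt = path.rsplit("_tmp.", 1)[-1]
        result[attempt] = (flag == "true")
    return result


# Hypothetical input: the relevant lines from both jobs, copied from above.
sample = """
2009-08-14 11:29:38,245 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
FS
hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
2009-08-14 11:29:38,246 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
configuration is:true
2009-08-14 11:38:13,604 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
FS
hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
2009-08-14 11:38:13,605 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
configuration is:false
"""

for attempt, compressed in compression_by_attempt(sample).items():
    print(attempt, "compressed" if compressed else "uncompressed")
```

The "_m_" or "_r_" part of the attempt id shows whether the writer was a map or a reduce task, which makes it easy to see which job wrote which output.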

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
