The query is split into two map/reduce jobs. The first job consists of 16 map tasks (and no reduce tasks). The relevant log output is given below:

2009-08-14 11:29:38,245 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
2009-08-14 11:29:38,246 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression configuration is:true
2009-08-14 11:29:38,347 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2009-08-14 11:29:38,358 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 6 FS initialized
2009-08-14 11:29:38,358 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 6 FS

The second job consists of 16 map tasks and 3 reduce tasks. None of the map tasks contain any log output from FileSinkOperator. The reduce tasks contain the following relevant log output:

2009-08-14 11:38:13,553 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 3 FS
2009-08-14 11:38:13,553 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 3 FS
2009-08-14 11:38:13,604 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
2009-08-14 11:38:13,605 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression configuration is:false
2009-08-14 11:38:43,128 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 3 FS initialized
2009-08-14 11:38:43,128 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 3 FS

As you can see, compression is "on" for the first map/reduce job but "off" for the second one. Did I forget to set a configuration parameter?

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
