Hi,

I see that if a query has a WHERE clause, the FilterOperator is applied twice. Can you tell me why this is done? The second operator always seems to filter zero rows.
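To make the observation concrete, here is a toy sketch of two chained filters that share one predicate. It is not Hive code: the `Filter` class, the `key_not_10` helper, and the sample rows are made up for illustration, and only the counter names mirror the FILTERED/PASSED counters in the logs. It just shows why a second filter with an identical predicate can never reject a row that the first one passed.

```python
# Toy model of a FIL -> FIL chain; an assumption-laden sketch, not Hive's implementation.
class Filter:
    """Forwards rows that satisfy the predicate and counts the outcome."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.passed = 0
        self.filtered = 0

    def process(self, rows):
        for row in rows:
            if self.predicate(row):
                self.passed += 1
                yield row
            else:
                self.filtered += 1


def key_not_10(row):
    # Same condition as the WHERE clause in the query: key != 10
    return row["key"] != 10


# Hypothetical sample rows (not taken from the input1 table).
rows = [{"key": 10, "value": 1}, {"key": 11, "value": 2}, {"key": 12, "value": 3}]

fil1 = Filter(key_not_10)
fil2 = Filter(key_not_10)

# The second filter only ever sees rows that the first one already passed.
output = list(fil2.process(fil1.process(rows)))

print("FIL 1: PASSED", fil1.passed, "FILTERED", fil1.filtered)
print("FIL 2: PASSED", fil2.passed, "FILTERED", fil2.filtered)
```

In this sketch the second filter evaluates the predicate on every surviving row but its filtered count stays at zero, which matches what I see in the mapper logs.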
EXPLAIN output for a query with a WHERE clause:

hive> explain select * from input1 where input1.key != 10;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF input1)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (!= (. (TOK_TABLE_OR_COL input1) key) 10))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        input1
          TableScan
            alias: input1
            Filter Operator
              predicate:
                  expr: (key <> 10)
                  type: boolean
              Filter Operator
                predicate:
                    expr: (key <> 10)
                    type: boolean
                Select Operator
                  expressions:
                        expr: key
                        type: int
                        expr: value
                        type: int
                  outputColumnNames: _col0, _col1
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1

I see the same in the mapper logs. The first FilterOperator does the filtering, and the second operator always filters zero rows.

2010-08-12 14:33:22,149 INFO ExecMapper: <MAP>Id =5 <Children> <TS>Id =0 <Children> <FIL>Id =1 <Children> <FIL>Id =2 <Children> <SEL>Id =3 <Children> <FS>Id =4 <Parent>Id = 3 null<\Parent> <\FS> <\Children> <Parent>Id = 2 null<\Parent> <\SEL> <\Children> <Parent>Id = 1 null<\Parent> <\FIL> <\Children> <Parent>Id = 0 null<\Parent> <\FIL> <\Children> <Parent>Id = 5 null<\Parent> <\TS> <\Children> <\MAP>
2010-08-12 14:33:22,272 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarding 1 rows
2010-08-12 14:33:22,272 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2010-08-12 14:33:22,450 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 4417072
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarded 1 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:1
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 forwarded 0 rows
2010-08-12 14:33:22,451 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/000000_0
2010-08-12 14:33:22,451 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/_tmp.000000_0
2010-08-12 14:33:22,454 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/000000_0
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 Close done
2010-08-12 14:33:22,485 INFO ExecMapper: ExecMapper: processed 1 rows: used memory = 5135888

Thanks
Amareshwari