[ https://issues.apache.org/jira/browse/HAWQ-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103334#comment-16103334 ]
Shubham Sharma commented on HAWQ-1198: -------------------------------------- In HiveDataFragmenter while building a BasicFilter all operators apart from "=" are [ignored( code reference) | https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveDataFragmenter.java#L405-#L411 ]. This causes hive to fetch irrelevant fragments for every query that does not have an "equal to" operator. [Pull request 1272 | https://github.com/apache/incubator-hawq/pull/1272] addresses this issue. After the changes tested few queries and filtering out irrelevant fragments give a 10X performance increase while querying hive through pxf. Test queries {code} between_operator_test.sql set client_min_messages to debug; \timing select * from hawq_hive where datelocal between '2016-05-27' and '2016-05-29'; in_operator_test.sql set client_min_messages to debug; \timing select * from hawq_hive a where a.datelocal in ('2016-05-29','2016-05-28','2016-05-27'); logical_operator_test.sql set client_min_messages to debug; \timing select * from hawq_hive where datelocal <= '2016-05-29' and datelocal >='2016-05-27'; union_operator_test.sql set client_min_messages to debug; \timing select count(*) from (select * from hawq_hive a where a.datelocal = '2016-05-29' union all select * from hawq_hive b where b.datelocal='2016-05-28' union all select * from hawq_hive b where b.datelocal='2016-05-27' ) fs; {code} Before code changes {code} psql -f between_operator_test.sql &> before/between_operator_test.out psql -f union_operator_test.sql &> before/union_operator_test.out psql -f logical_operator_test.sql &> before/logical_operator_test.out psql -f in_operator_test.sql &> before/in_operator_test.out [gpadmin@localhost hive_fragmenter]$ grep -i "Fragment list" before/* Number of fragments filtered 136, should be three before/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) [gpadmin@localhost hive_fragmenter]$ grep Time before/* before/between_operator_test.out:Time: 4485.182 ms before/in_operator_test.out:Time: 2285.578 ms before/logical_operator_test.out:Time: 2508.315 ms before/union_operator_test.out:Time: 609.298 ms {code} After code changes {code} [gpadmin@localhost hive_fragmenter]$ psql -f between_operator_test.sql &> after/between_operator_test.out [gpadmin@localhost hive_fragmenter]$ psql -f union_operator_test.sql &> after/union_operator_test.out [gpadmin@localhost hive_fragmenter]$ psql -f logical_operator_test.sql &> after/logical_operator_test.out [gpadmin@localhost hive_fragmenter]$ psql -f in_operator_test.sql &> after/in_operator_test.out Number of fragments filtered reduced to three except for IN operator for which filtering is not implemented yet. [gpadmin@localhost hive_fragmenter]$ grep -i "Fragment list" after/* after/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2: Fragment list: (3 elements, pxf_isilon = false) after/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2: Fragment list: (3 elements, pxf_isilon = false) after/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) after/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) after/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2: Fragment list: (3 elements, pxf_isilon = false) after/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2: Fragment list: (3 elements, pxf_isilon = false) after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) [gpadmin@localhost hive_fragmenter]$ grep Time after/* after/between_operator_test.out:Time: 313.572 ms after/in_operator_test.out:Time: 2554.899 ms after/logical_operator_test.out:Time: 264.673 ms after/union_operator_test.out:Time: 541.272 ms {code} > Fragmenter should return only relevant fragments for partitioned tables when > X-GP-FILTER passed > ----------------------------------------------------------------------------------------------- > > Key: HAWQ-1198 > URL: https://issues.apache.org/jira/browse/HAWQ-1198 > Project: Apache HAWQ > Issue Type: Improvement > Components: PXF > Reporter: Oleksandr Diachenko > Assignee: Oleksandr Diachenko > > Currently, PXF Fragmenter api returns all fragments even if X-GP-FILTER is > provided. In a case of a partitioned table it's possible to evaluate > X-GP-FILTER and exclude irrelevant partitions if possible. -- This message was sent by Atlassian JIRA (v6.4.14#64029)