[ 
https://issues.apache.org/jira/browse/HAWQ-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103334#comment-16103334
 ] 

Shubham Sharma commented on HAWQ-1198:
--------------------------------------

In HiveDataFragmenter while building a BasicFilter all operators apart from "=" 
are [ignored( code reference) | 
https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveDataFragmenter.java#L405-#L411
 ]. This causes hive to fetch irrelevant fragments for every query that does 
not have an "equal to" operator.


[Pull request 1272 | https://github.com/apache/incubator-hawq/pull/1272] 
addresses this issue. After the changes tested few queries and filtering out 
irrelevant fragments give a 10X performance increase while querying hive 
through pxf.


Test queries

{code}
between_operator_test.sql
set client_min_messages to debug;
\timing
select * from hawq_hive where datelocal between '2016-05-27' and '2016-05-29';

in_operator_test.sql
set client_min_messages to debug;
\timing
select * from hawq_hive a where a.datelocal in 
('2016-05-29','2016-05-28','2016-05-27');

logical_operator_test.sql
set client_min_messages to debug;
\timing
select * from hawq_hive where datelocal <= '2016-05-29' and datelocal 
>='2016-05-27';

union_operator_test.sql
set client_min_messages to debug;
\timing
select count(*) from (select * from hawq_hive a where a.datelocal = 
'2016-05-29'  union all select * from hawq_hive b where 
b.datelocal='2016-05-28' union all select * from hawq_hive b where 
b.datelocal='2016-05-27' ) fs;

{code} 

Before code changes

{code}

psql -f between_operator_test.sql &> before/between_operator_test.out
psql -f union_operator_test.sql &> before/union_operator_test.out
psql -f logical_operator_test.sql &> before/logical_operator_test.out
psql -f in_operator_test.sql &> before/in_operator_test.out


[gpadmin@localhost hive_fragmenter]$ grep -i "Fragment list" before/*

Number of fragments filtered 136, should be three

before/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2:  
Fragment list: (136 elements, pxf_isilon = false)
before/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2:  
Fragment list: (136 elements, pxf_isilon = false)

before/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2:  Fragment 
list: (136 elements, pxf_isilon = false)
before/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2:  Fragment 
list: (136 elements, pxf_isilon = false)

before/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2:  
Fragment list: (136 elements, pxf_isilon = false)
before/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2:  
Fragment list: (136 elements, pxf_isilon = false)

before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  
Fragment list: (1 elements, pxf_isilon = false)
before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  
Fragment list: (1 elements, pxf_isilon = false)
before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  
Fragment list: (1 elements, pxf_isilon = false)
before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  
Fragment list: (1 elements, pxf_isilon = false)
before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  
Fragment list: (1 elements, pxf_isilon = false)
before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  
Fragment list: (1 elements, pxf_isilon = false)

[gpadmin@localhost hive_fragmenter]$ grep Time before/*

before/between_operator_test.out:Time: 4485.182 ms
before/in_operator_test.out:Time: 2285.578 ms
before/logical_operator_test.out:Time: 2508.315 ms
before/union_operator_test.out:Time: 609.298 ms

{code}


After code changes

{code}

[gpadmin@localhost hive_fragmenter]$ psql -f between_operator_test.sql &> 
after/between_operator_test.out
[gpadmin@localhost hive_fragmenter]$ psql -f union_operator_test.sql &> 
after/union_operator_test.out
[gpadmin@localhost hive_fragmenter]$ psql -f logical_operator_test.sql &> 
after/logical_operator_test.out
[gpadmin@localhost hive_fragmenter]$ psql -f in_operator_test.sql &> 
after/in_operator_test.out

Number of fragments filtered reduced to three except for IN operator for which 
filtering is not implemented yet.

[gpadmin@localhost hive_fragmenter]$ grep -i "Fragment list" after/*
after/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2:  
Fragment list: (3 elements, pxf_isilon = false)
after/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2:  
Fragment list: (3 elements, pxf_isilon = false)

after/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2:  Fragment list: 
(136 elements, pxf_isilon = false)
after/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2:  Fragment list: 
(136 elements, pxf_isilon = false)

after/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2:  
Fragment list: (3 elements, pxf_isilon = false)
after/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2:  
Fragment list: (3 elements, pxf_isilon = false)

after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  Fragment 
list: (1 elements, pxf_isilon = false)
after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  Fragment 
list: (1 elements, pxf_isilon = false)
after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  Fragment 
list: (1 elements, pxf_isilon = false)
after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  Fragment 
list: (1 elements, pxf_isilon = false)
after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  Fragment 
list: (1 elements, pxf_isilon = false)
after/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2:  Fragment 
list: (1 elements, pxf_isilon = false)

[gpadmin@localhost hive_fragmenter]$ grep Time after/*

after/between_operator_test.out:Time: 313.572 ms
after/in_operator_test.out:Time: 2554.899 ms
after/logical_operator_test.out:Time: 264.673 ms
after/union_operator_test.out:Time: 541.272 ms

{code}


> Fragmenter should return only relevant fragments for partitioned tables when 
> X-GP-FILTER passed
> -----------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-1198
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1198
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: PXF
>            Reporter: Oleksandr Diachenko
>            Assignee: Oleksandr Diachenko
>
> Currently, PXF Fragmenter api returns all fragments even if X-GP-FILTER is 
> provided. In a case of a partitioned table it's possible to evaluate 
> X-GP-FILTER and exclude irrelevant partitions if possible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to