[ 
https://issues.apache.org/jira/browse/PIG-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-4551:
------------------------------
    Attachment: pig-4551_v03.patch

Added a test case.
Made some major changes:
* Instead of blindly adding a filter above LOSplit, the patch now tries to verify 
that the loader can take advantage of the extra filter.  (There are still false 
positives, in which case an extra filter is added with a performance penalty, 
but this should not affect correctness.)
* Caches the results of loadfunc.getPredicateFields()/getPartitionKeys().
* Takes explicit splits into consideration, in addition to filters following 
the splits.
* The previous patch had a bug: the new filter was added only after all filter 
pushing was done, so the new filter never got pushed to the loader.  The new 
patch repeats the filter push-up after adding the new filter above the 
split.
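
For illustration only: assuming the extra filter is the OR of the conditions on 
each branch of the split (an assumption, not spelled out above), the plan for 
the query in the description would be roughly equivalent to the script below.  
A_pre is a hypothetical alias used only in this sketch.
{code}
A = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- Assumed extra filter above the split: OR of the two branch conditions.
-- Being directly above the load, it can be pushed to the loader.
A_pre = FILTER A BY
    ( ((date=='20150501' AND pk2=='1')) AND pk3=='127' )
    OR
    ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) AND pk3=='127' );
-- The original branch filters stay in place, so each branch keeps its own result.
B = FILTER A_pre BY ( ((date=='20150501' AND pk2=='1')) AND pk3=='127' );
C = FILTER A_pre BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) AND pk3=='127' );
{code}
Because every row that satisfies a branch condition also satisfies the OR, B and 
C are unchanged.  If the loader cannot use the extra filter, it only costs one 
more filter evaluation.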

> Partition filter is not pushed down in case of SPLIT
> ----------------------------------------------------
>
>                 Key: PIG-4551
>                 URL: https://issues.apache.org/jira/browse/PIG-4551
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: Rohini Palaniswamy
>         Attachments: pig-4551_v01_notestyet.patch, 
> pig-4551_v02_notestyet.patch, pig-4551_v03.patch
>
>
>   The query below, which has an implicit split, does not push down the 
> partition filters and scans the whole table. 
> {code}
> A = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> B = FILTER A BY ( ((date=='20150501' AND pk2=='1')) AND pk3=='127' );
> C = FILTER A BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND 
> pk2=='1')) AND pk3=='127' );
> {code}
> The workaround for now is to write a separate LOAD statement for each FILTER. 
> We should do that behind the scenes during planning instead of requiring the 
> user to do it.
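>
> A minimal sketch of that workaround, with hypothetical aliases A1 and A2. 
> Each FILTER sits directly on its own LOAD, so there is no implicit split:
> {code}
> -- One LOAD per FILTER (A1 and A2 are illustrative names).
> A1 = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> B = FILTER A1 BY ( ((date=='20150501' AND pk2=='1')) AND pk3=='127' );
> A2 = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> C = FILTER A2 BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) AND pk3=='127' );
> {code}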



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
