[
https://issues.apache.org/jira/browse/PIG-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Koji Noguchi updated PIG-4551:
------------------------------
Attachment: pig-4551_v03.patch
Added a testcase.
Made some major changes.
* Instead of blindly adding a filter above LOSplit, it tries to verify that
the loader can take advantage of this extra filter. (There are still false
positives, in which case an extra filter is added with a performance penalty,
but correctness should not be affected.)
* Caches results from loadfunc.getPredicateFields/getPartitionKeys().
* Takes explicit splits into consideration, in addition to filters following
the splits.
* The previous patch had a bug: the new filter was added only after all filter
pushing was done, so the new filter could not be pushed to the loader. The new
patch repeats the filter push-up after adding the new filter above the
split.
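For illustration only, the effect on the query from the issue description is
roughly equivalent to the Pig Latin below. This is a sketch that assumes the
added filter is the OR of the branch filters; the alias A_pre is hypothetical,
and the patch works on the logical plan, not on the script text.
{code}
A = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- Hypothetical extra filter above the split: the OR of both branch filters.
-- It uses only partition columns, so it is a candidate for pushdown to the loader.
A_pre = FILTER A BY ( ((date=='20150501' AND pk2=='1') AND pk3=='127')
    OR (((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) AND pk3=='127') );
-- The original branch filters stay in place, so results are unchanged.
B = FILTER A_pre BY ( (date=='20150501' AND pk2=='1') AND pk3=='127' );
C = FILTER A_pre BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) AND pk3=='127' );
{code}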
> Partition filter is not pushed down in case of SPLIT
> ----------------------------------------------------
>
> Key: PIG-4551
> URL: https://issues.apache.org/jira/browse/PIG-4551
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1
> Reporter: Rohini Palaniswamy
> Attachments: pig-4551_v01_notestyet.patch,
> pig-4551_v02_notestyet.patch, pig-4551_v03.patch
>
>
> The query below, which contains an implicit split, does not push down the
> partition filters and scans the whole table.
> {code}
> A = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> B = FILTER A BY ( ((date=='20150501' AND pk2 =='1')) and pk3 == '127' );
> C = FILTER A BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND
> pk2=='1')) and pk3 == '127' );
> {code}
> The workaround for now is to write a separate LOAD statement for each FILTER.
> We should do that behind the scenes during planning instead of requiring the
> user to do it.
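> A minimal sketch of that workaround, reusing the table and filters from the
> query above (the aliases A1 and A2 are illustrative):
> {code}
> -- One LOAD per FILTER, so each filter sits directly above its own loader.
> A1 = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> B = FILTER A1 BY ( ((date=='20150501' AND pk2=='1')) and pk3 == '127' );
> A2 = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> C = FILTER A2 BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND
> pk2=='1')) and pk3 == '127' );
> {code}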