[
https://issues.apache.org/jira/browse/PIG-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Koji Noguchi updated PIG-4551:
------------------------------
Attachment: pig-4551_v02_notestyet.patch
Added extra conditions:
* Only insert the merged filter when at least one of the loaders has partition
or predicate fields. (There is no check on whether the merged filter actually
references any of those fields, since they could have been renamed, etc.)
* Make sure the filters do not contain a nonDeterministicUdf.
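As a rough illustration only (hand-written, not taken from the attached patch),
the effect of the merged filter on the query quoted below can be sketched as an
equivalent script. It assumes the merged filter is the OR of the branch
conditions and that date, pk2 and pk3 are partition columns; the alias A_merged
is hypothetical.

```pig
A = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Assumed form of the merged filter: the OR of every branch condition,
-- placed directly after the LOAD so that it is a candidate for pushdown.
A_merged = FILTER A BY
    ( ((date == '20150501' AND pk2 == '1') AND pk3 == '127')
   OR (((date == '20150501' AND pk2 == '1') OR (date == '20150430' AND pk2 == '1'))
        AND pk3 == '127') );

-- The original branch filters stay in place and read the pre-filtered relation.
B = FILTER A_merged BY ( (date == '20150501' AND pk2 == '1') AND pk3 == '127' );
C = FILTER A_merged BY ( ((date == '20150501' AND pk2 == '1') OR (date == '20150430' AND pk2 == '1'))
                         AND pk3 == '127' );
```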
One worry with my approach is the overhead I may be adding with this extra
filter, particularly when it cannot be pushed down.
While I wait for feedback on my approach, I'll start adding test cases.
> Partition filter is not pushed down in case of SPLIT
> ----------------------------------------------------
>
> Key: PIG-4551
> URL: https://issues.apache.org/jira/browse/PIG-4551
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1
> Reporter: Rohini Palaniswamy
> Attachments: pig-4551_v01_notestyet.patch,
> pig-4551_v02_notestyet.patch
>
>
> The query below, which has an implicit split, does not push down the
> partition filters and scans the whole table.
> {code}
> A = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> B = FILTER A BY ( ((date=='20150501' AND pk2 =='1')) and pk3 == '127' );
> C = FILTER A BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND
> pk2=='1')) and pk3 == '127' );
> {code}
> The workaround for now is to write a separate LOAD statement for each FILTER.
> We should do that behind the scenes during planning instead of the user
> having to do it.