[ 
https://issues.apache.org/jira/browse/PIG-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722907#comment-13722907
 ] 

Rohini Palaniswamy commented on PIG-3395:
-----------------------------------------

Before, the whole and/or tree would not be pushed down. The visit of lhs and 
rhs is still there, but I am not sure how the replace will behave, because it 
does not have the full context and something partial might get pushed. Can you 
modify your testcase to include one of those conditions, to test the behaviour 
when we have a cast or a null check? 
                
> Large filter expression makes Pig hang
> --------------------------------------
>
>                 Key: PIG-3395
>                 URL: https://issues.apache.org/jira/browse/PIG-3395
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.12
>
>         Attachments: PIG-3395.patch, thread_dump.txt
>
>
> Currently, partition filter push down is quite costly. For example, if you 
> have many nested or/and expressions, Pig hangs:
> {code}
> base = load '<partitioned table>' using MyStorage();
> filt = filter base by
> (dateint == 20130719 and batchid == 'merged_1' and hour IN (19,20,21,22,23))
> or
> (dateint == 20130720 and batchid == 'merged_1' and hour IN (0,1,2,3,4,5,6,7,8))
> or
> (dateint == 20130720 and batchid == 'merged_2' and hour == 7)
> or
> (dateint == 20130720 and batchid == 'merged_1' and hour IN (9,10,11,12,13,14,15,16,17,18,19,20,21,22,23))
> or
> (dateint == 20130721 and batchid == 'merged_1' and hour IN (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23))
> or
> (dateint == 20130722 and batchid == 'merged_1' and hour IN (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16));
> dump filt;
> {code}
> Note that the IN operator is converted to nested ORs by the Pig parser.
> Looking at the thread dump, I found that it creates almost 60 stack frames 
> and makes the JVM suffer. (I will attach the full stack trace.)
> {code}
> <repeated ...>
> at org.apache.pig.newplan.PColFilterExtractor.visit(PColFilterExtractor.java:504)
> at org.apache.pig.newplan.PColFilterExtractor.visit(PColFilterExtractor.java:237)
> at org.apache.pig.newplan.PColFilterExtractor.visit(PColFilterExtractor.java:504)
> at org.apache.pig.newplan.PColFilterExtractor.visit(PColFilterExtractor.java:214)
> at org.apache.pig.newplan.PColFilterExtractor.visit(PColFilterExtractor.java:504)
> at org.apache.pig.newplan.PColFilterExtractor.visit(PColFilterExtractor.java:211)
> at org.apache.pig.newplan.PColFilterExtractor.visit(PColFilterExtractor.java:108)
> {code}
> Although the filter expression can be simplified, it seems possible to make 
> PColFilterExtractor more efficient.
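
To make the IN-to-OR note in the description concrete, here is a minimal 
sketch of how an IN list could expand into a chain of nested ORs, and how deep 
the resulting tree gets. It does not use Pig's classes: Expr, eq, or, expandIn 
and depth are hypothetical names, and the left-nested shape is an assumption, 
since the report does not show the exact tree the parser builds.

```java
import java.util.ArrayList;
import java.util.List;

public class InExpansionSketch {
    // Hypothetical minimal expression node; NOT Pig's LogicalExpression.
    static final class Expr {
        final String label; // "OR" for inner nodes, "col == v" for leaves
        final Expr lhs;     // null for leaves
        final Expr rhs;     // null for leaves

        Expr(String label, Expr lhs, Expr rhs) {
            this.label = label;
            this.lhs = lhs;
            this.rhs = rhs;
        }
    }

    static Expr eq(String column, int value) {
        return new Expr(column + " == " + value, null, null);
    }

    static Expr or(Expr lhs, Expr rhs) {
        return new Expr("OR", lhs, rhs);
    }

    // Assumed expansion: ((col == v0 OR col == v1) OR col == v2) OR ...
    static Expr expandIn(String column, List<Integer> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("IN list must not be empty");
        }
        Expr acc = eq(column, values.get(0));
        for (int i = 1; i < values.size(); i++) {
            acc = or(acc, eq(column, values.get(i)));
        }
        return acc;
    }

    // Number of nodes on the longest root-to-leaf path.
    static int depth(Expr e) {
        if (e.lhs == null) {
            return 1;
        }
        return 1 + Math.max(depth(e.lhs), depth(e.rhs));
    }

    public static void main(String[] args) {
        List<Integer> hours = new ArrayList<>();
        for (int h = 0; h < 24; h++) {
            hours.add(h);
        }
        // A 24-value IN list becomes a chain 24 nodes deep under this assumption.
        System.out.println("depth = " + depth(expandIn("hour", hours)));
    }
}
```

Under this assumption the depth grows linearly with the length of the IN 
list, so a visitor that recurses once per and/or node goes at least that deep 
for each IN clause in the filter.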
