[ 
https://issues.apache.org/jira/browse/PIG-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855973#comment-15855973
 ] 

Adam Szita commented on PIG-4913:
---------------------------------

I see, [~rohini]. 
It looks like the frequent recompilation is caused by the getFunction method, 
which always calls the init method. This is necessary when multiple Python 
scripts are in use: one PythonInterpreter instance seems to bind to one script 
(from which we can retrieve the Python locals, etc.).

I've attached a new patch [^PIG-4913.2.patch] to address this.
My approach is to keep a pool of these interpreters in memory for future use, 
so Pig doesn't have to recompile each time. If the pool is full, we remove the 
oldest instance and then compile the new Python script. 
The pool size is set by the constant:
{code}
static final int INTERPRETER_POOL_SIZE = 10;
{code}
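
To make the eviction behaviour concrete, here is a minimal sketch of such a pool. It is not the code from the patch: the class name InterpreterPool, the get method, keying by script path, the synchronization, and the generic type V (standing in for PythonInterpreter) are all assumptions made for illustration. Only the constant and the "remove the oldest instance" rule come from the comment above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical bounded pool keyed by script path. V stands in for the
 * interpreter type so the sketch compiles without Jython on the classpath.
 */
class InterpreterPool<V> {
    // Constant quoted in the comment; a caller could pass it as the capacity.
    static final int INTERPRETER_POOL_SIZE = 10;

    private final Map<String, V> pool;

    InterpreterPool(final int capacity) {
        // accessOrder = false keeps insertion order, so the eldest entry is
        // the one added first, i.e. the "oldest instance".
        this.pool = new LinkedHashMap<String, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Called after each put; evicts once the pool exceeds capacity.
                return size() > capacity;
            }
        };
    }

    /** Returns the pooled instance for a script, compiling only on a miss. */
    synchronized V get(String scriptPath, Function<String, V> compile) {
        V interpreter = pool.get(scriptPath);
        if (interpreter == null) {
            interpreter = compile.apply(scriptPath); // assumed compile step
            pool.put(scriptPath, interpreter);       // may evict the oldest
        }
        return interpreter;
    }

    synchronized int size() {
        return pool.size();
    }
}
```

With insertion order, a script that is used constantly can still be evicted once enough other scripts are registered. Passing accessOrder = true instead would give least-recently-used eviction, which may or may not be what the patch intends.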


> Reduce jython function initiation during compilation
> ----------------------------------------------------
>
>                 Key: PIG-4913
>                 URL: https://issues.apache.org/jira/browse/PIG-4913
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Adam Szita
>         Attachments: PIG-4913.2.patch, PIG-4913.patch
>
>
> While investigating PIG-4908, I saw that ScriptEngine.getScriptAsStream was 
> invoked far too many times during the compilation phase for a simple script.
> {code:title=sleep.py}
> #!/usr/bin/python
> import time;
> @outputSchema("sltime:int")
> def sleep(num):
>     if num == 1:
>         print "Sleeping for %d minutes" % num;
>         time.sleep(num * 60);
>     return num;
> {code}
> {code:title=sleep.pig}
> register 'sleep.py' using jython;
> A = LOAD '/tmp/sleepdata' as (f1:int);
> B = FOREACH A generate $0, sleep($0);
> STORE B into '/tmp/tezout';
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
