[ 
https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2433:
------------------------------------

    Fix Version/s: 0.12
         Assignee: Rohini Palaniswamy
           Status: Patch Available  (was: Open)
    
> Jython import module not working if module path is in classpath
> ---------------------------------------------------------------
>
>                 Key: PIG-2433
>                 URL: https://issues.apache.org/jira/browse/PIG-2433
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.10.0
>            Reporter: Daniel Dai
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.12
>
>         Attachments: PIG-2433.patch
>
>
> This is a gap left by PIG-1824. If the path of a Python module is in the 
> classpath, the job dies with the message "could not instantiate 
> 'org.apache.pig.scripting.jython.JythonFunction'".
> Here is my observation:
> If the path of the Python module is in the classpath, the fileEntry we get in 
> JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the 
> script itself. Thus we cannot locate the script, so we skip it in 
> job.xml. 
> For example:
> {code}
> register 'scriptB.py' using org.apache.pig.scripting.jython.JythonScriptEngine as pig;
> A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
> B = foreach A generate pig.square(a0);
> dump B;
>
> scriptB.py:
> #!/usr/bin/python
> import scriptA
>
> @outputSchema("x:{t:(num:double)}")
> def sqrt(number):
>     return (number ** .5)
>
> @outputSchema("x:{t:(num:long)}")
> def square(number):
>     return long(scriptA.square(number))
>
> scriptA.py:
> #!/usr/bin/python
> def square(number):
>     return (number * number)
> {code}
> When we register scriptB.py, we use the Jython library to figure out the 
> modules scriptB depends on, in this case scriptA. However, if the 
> current directory is in the classpath, we get 
> __pyclasspath__/scriptA.class instead of scriptA.py. When we then try to put 
> __pyclasspath__/script$py.class into job.jar, Pig complains that 
> __pyclasspath__/script$py.class does not exist. 
> This is exactly what TestScriptUDF.testPythonNestedImport does. In Hadoop 
> 20.x, the test still succeeds because MiniCluster picks up the local 
> classpath, so it can still find scriptA.py even though it is not in job.jar. 
> However, the script will fail on a real cluster and on the MiniMRYarnCluster 
> of Hadoop 23.
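
The path mapping described above can be illustrated with a small sketch. This is not the attached PIG-2433.patch (the actual fix lives in the Java JythonScriptEngine); it only shows, in plain Python, one way a __pyclasspath__ entry could be translated back to the .py source it was compiled from. The helper names source_name_for and locate_source, and the idea of searching a list of classpath directories, are assumptions made for illustration. Only the "__pyclasspath__/<module>$py.class" form mentioned in the report is handled.

```python
import os

# Entry format as quoted in the report: __pyclasspath__/<module>$py.class
PYCLASSPATH_PREFIX = "__pyclasspath__/"
COMPILED_SUFFIX = "$py.class"


def source_name_for(file_entry):
    """Map a __pyclasspath__ compiled-class entry to its .py resource name.

    Returns None when the entry does not have the expected form.
    (Hypothetical helper, not part of Pig or Jython.)
    """
    if not file_entry.startswith(PYCLASSPATH_PREFIX):
        return None
    relative = file_entry[len(PYCLASSPATH_PREFIX):]
    if not relative.endswith(COMPILED_SUFFIX):
        return None
    return relative[:-len(COMPILED_SUFFIX)] + ".py"


def locate_source(file_entry, classpath_dirs):
    """Return a path to the module source on disk, or None if not found.

    Plain file entries are returned unchanged when they exist; compiled
    __pyclasspath__ entries are looked up in each classpath directory.
    (Hypothetical helper, not part of Pig or Jython.)
    """
    name = source_name_for(file_entry)
    if name is None:
        return file_entry if os.path.isfile(file_entry) else None
    for directory in classpath_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With a lookup of this kind, an entry such as __pyclasspath__/scriptA$py.class would resolve to scriptA.py in whichever classpath directory contains it, and that file could then be shipped with the job instead of the non-existent class path.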

