[
https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546192#comment-13546192
]
Cheolsoo Park commented on PIG-2433:
------------------------------------
I also turned on DEBUG as per your request, so you can see extra debug messages
in the log files.
> Jython import module not working if module path is in classpath
> ---------------------------------------------------------------
>
> Key: PIG-2433
> URL: https://issues.apache.org/jira/browse/PIG-2433
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Daniel Dai
> Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: bad.log, good.log, PIG-2433.patch,
> TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> This is a gap left by PIG-1824. If the path of a Python module is in the
> classpath, the job dies with the message "could not instantiate
> 'org.apache.pig.scripting.jython.JythonFunction'".
> Here is my observation:
> If the path of the Python module is in the classpath, the fileEntry we get in
> JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the
> script itself. As a result, we cannot locate the script, and it is skipped in
> job.xml.
> For example:
> {code}
> register 'scriptB.py' using org.apache.pig.scripting.jython.JythonScriptEngine as pig;
> A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
> B = foreach A generate pig.square(a0);
> dump B;
> {code}
> scriptB.py:
> {code}
> #!/usr/bin/python
> import scriptA
>
> @outputSchema("x:{t:(num:double)}")
> def sqrt(number):
>     return (number ** .5)
>
> @outputSchema("x:{t:(num:long)}")
> def square(number):
>     return long(scriptA.square(number))
> {code}
> scriptA.py:
> {code}
> #!/usr/bin/python
> def square(number):
>     return (number * number)
> {code}
> When we register scriptB.py, we use the Jython library to figure out which
> modules scriptB depends on, in this case scriptA. However, if the current
> directory is in the classpath, we get __pyclasspath__/scriptA$py.class
> instead of scriptA.py. When we then try to put
> __pyclasspath__/scriptA$py.class into job.jar, Pig complains that
> __pyclasspath__/scriptA$py.class does not exist.
> This is exactly what TestScriptUDF.testPythonNestedImport does. In Hadoop
> 20.x, the test still succeeds because MiniCluster uses the local classpath, so
> it can still find scriptA.py even though it is not in job.jar. However, the
> script will fail on a real cluster and on the MiniMRYarnCluster of Hadoop 23.
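
The path mismatch described in the issue can be illustrated with a small
Python sketch. This is not the attached PIG-2433.patch. The function name
source_path_for and the two constants are hypothetical, and the sketch
assumes that a module loaded from the classpath is always reported as
__pyclasspath__/<module>$py.class, as in the report above.

```python
# Hypothetical helper: map the file entry Jython reports for a module
# back to the .py source path that would need to be shipped with the job.
PYCLASSPATH_PREFIX = "__pyclasspath__/"   # pseudo-path seen in the report
COMPILED_SUFFIX = "$py.class"             # suffix of a compiled Jython module


def source_path_for(file_entry):
    """Return the .py path for a classpath-style entry, else the entry itself."""
    if (file_entry.startswith(PYCLASSPATH_PREFIX)
            and file_entry.endswith(COMPILED_SUFFIX)):
        # Strip the pseudo-path prefix and the compiled-class suffix.
        relative = file_entry[len(PYCLASSPATH_PREFIX):-len(COMPILED_SUFFIX)]
        return relative + ".py"
    # Entries that already point at a real file are left untouched.
    return file_entry


print(source_path_for("__pyclasspath__/scriptA$py.class"))
```

The returned path is relative to whichever classpath entry the module was
loaded from, so a real fix would still have to locate that file on disk
before adding it to job.jar.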