[ https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-2433:
------------------------------------

    Fix Version/s: 0.12
         Assignee: Rohini Palaniswamy
           Status: Patch Available  (was: Open)

> Jython import module not working if module path is in classpath
> ---------------------------------------------------------------
>
>                 Key: PIG-2433
>                 URL: https://issues.apache.org/jira/browse/PIG-2433
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.10.0
>            Reporter: Daniel Dai
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.12
>
>         Attachments: PIG-2433.patch
>
>
> This is a gap left by PIG-1824. If the path of a Python module is in the
> classpath, the job dies with the message "could not instantiate
> 'org.apache.pig.scripting.jython.JythonFunction'".
> Here is my observation:
> If the path of the Python module is in the classpath, the fileEntry we get in
> JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the
> script itself. Thus we cannot locate the script, and it is left out of
> job.xml.
> For example:
> {code}
> register 'scriptB.py' using org.apache.pig.scripting.jython.JythonScriptEngine as pig;
> A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
> B = foreach A generate pig.square(a0);
> dump B;
>
> scriptB.py:
> #!/usr/bin/python
> import scriptA
>
> @outputSchema("x:{t:(num:double)}")
> def sqrt(number):
>     return (number ** .5)
>
> @outputSchema("x:{t:(num:long)}")
> def square(number):
>     return long(scriptA.square(number))
>
> scriptA.py:
> #!/usr/bin/python
> def square(number):
>     return (number * number)
> {code}
> When we register scriptB.py, we use the Jython library to figure out which
> modules scriptB depends on, in this case scriptA. However, if the current
> directory is in the classpath, we get __pyclasspath__/scriptA$py.class
> instead of scriptA.py. We then try to put __pyclasspath__/scriptA$py.class
> into job.jar, and Pig complains that __pyclasspath__/scriptA$py.class does
> not exist.
> This is exactly what TestScriptUDF.testPythonNestedImport does.
> In Hadoop 20.x the test still succeeds because MiniCluster uses the local
> classpath, so it can still find scriptA.py even though it is not in job.jar.
> However, the script will fail on a real cluster and on the MiniMRYarnCluster
> of Hadoop 23.
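The failure comes from a mismatch. Jython reports the name of a compiled, classpath-resolved module, but Pig needs the source file to ship in job.jar. Below is a minimal sketch of how such an entry could be mapped back to a `.py` resource name. It assumes the entry has the `__pyclasspath__/` prefix and the `$py.class` suffix Jython uses for compiled modules. `source_resource_for` is a hypothetical helper for illustration only. It is not the code in PIG-2433.patch.

```python
# Hypothetical constants describing how Jython names a module that was
# resolved through the Java classpath (an assumption for this sketch).
PYCLASSPATH_PREFIX = "__pyclasspath__/"
COMPILED_SUFFIX = "$py.class"


def source_resource_for(file_entry):
    """Map a Jython module file entry to the .py resource name to ship.

    Hypothetical helper: an entry such as '__pyclasspath__/scriptA$py.class'
    becomes 'scriptA.py'. Entries without the classpath prefix are treated
    as ordinary filesystem paths and returned unchanged.
    """
    if not file_entry.startswith(PYCLASSPATH_PREFIX):
        return file_entry  # already a real path to the script
    name = file_entry[len(PYCLASSPATH_PREFIX):]
    if name.endswith(COMPILED_SUFFIX):
        # Swap the compiled-class suffix for the source extension.
        name = name[: -len(COMPILED_SUFFIX)] + ".py"
    return name


if __name__ == "__main__":
    print(source_resource_for("__pyclasspath__/scriptA$py.class"))
    print(source_resource_for("/home/user/scriptA.py"))
```

The mapped name is only a classpath-relative resource name. Under this sketch's assumptions, it would still have to be resolved against the classpath to read the actual `scriptA.py` bytes before they could be added to job.jar.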