[
https://issues.apache.org/jira/browse/PIG-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Noll updated PIG-2665:
------------------------------
Description:
Using Pig 0.9.0 I was running into PIG-1824 when using import statements (e.g.
{{import os}}) in a Python script with embedded Pig Latin. Dmitriy Ryaboy
pointed me to the new Pig 0.10 release candidate
(http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz) so
that I could test whether my issue was solved by the new Pig version. During
testing I run into the error described below.
*Summary (TL;DR)*
* Even a minimal Python script with embedded Pig Latin will throw an error if
there is a single import statement in the Python code.
* The fix is to replace the bundled {{lib/jython.jar}} with a standalone
version of the same jar.
*Error message: "ERROR 1121: Python Error (ImportError: No module named
<yourmodule>)"*
{code}
$ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
2012-04-24 11:20:44,224 [main] INFO org.apache.pig.Main - Apache Pig version
0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
[...snip...]
*sys-package-mgr*: can't create package cache dir,
'/path/to/pig-0.10.0-RC1/lib/cachedir/packages'
2012-04-24 11:20:44,816 [main] INFO
org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
python.cachedir=/tmp/pig_jython_4081589571886870123
2012-04-24 11:20:45,033 [main] ERROR org.apache.pig.Main - ERROR 1121: Python
Error. Traceback (most recent call last):
File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
import os
ImportError: No module named os
{code}
In the Pig log file:
{code}
Error before Pig is launched
----------------------------
ERROR 1121: Python Error. Traceback (most recent call last):
File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
import os
ImportError: No module named os
org.apache.pig.backend.executionengine.ExecException: ERROR 1121: Python Error.
Traceback (most recent call last):
File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
import os
ImportError: No module named os
at
org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:210)
at
org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:384)
at
org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:368)
at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:275)
at org.apache.pig.Main.runEmbeddedScript(Main.java:929)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: Traceback (most recent call last):
{code}
*How to reproduce*
Create a simple Python script that uses embedded Pig Latin AND that imports
Python standard modules (any import statement will work):
{code}
#!/usr/bin/python
from org.apache.pig.scripting import Pig
# this import statement will trigger the error;
# remove it and everything will work fine
import os
if __name__ == "__main__":
pig_script = """
set job.name 'Pig 0.10.0-RC1 Python test';
"""
P = Pig.compile(pig_script)
bound = P.bind()
result = bound.runSingle()
if result.isSuccessful() :
print "Pig job succeeded"
else:
raise "Pig job failed"
{code}
Then proceed as follows.
{code}
#
# Install the Pig 0.10.0 release candidate [1].
#
# run the Python test script
$ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
#
# see section above for error message
#
{code}
*Test Environment*
Apart from the "Environment" JIRA field please note that none of the
TaskTracker boxes in my test cluster has Pig or Jython installed. Pig with
Jython is only available on a gateway box from which analysis jobs are run.
*Bug description*
During my investigation I discovered that the {{jython.jar}} that is shipped
with the 0.10.0 RC package is NOT a standalone version of Jython. I compared
(diffed) the unpacked contents of the existing jython.jar with a standalone jar
for Jython 2.5.0, and noticed that the main difference is that the standalone
jar comes with a {{Lib/}} directory containing the various Python standard
modules:
{code}
$ diff -r jython2.5.0 jython2.5.0-standalone/
Only in jython2.5.0-standalone/: Lib
diff -r jython2.5.0/META-INF/MANIFEST.MF
jython2.5.0-standalone//META-INF/MANIFEST.MF
2a3
> Built-By: frank
5d5
< Built-By: frank
8,10d7
< version: 2.5.0
< svn-build: true
< oracle: true
11a9
> svn-build: true
13d10
< jdk-target-version: 1.5
14a12,14
> oracle: true
> version: 2.5.0
> jdk-target-version: 1.5
{code}
The essential difference is the missing {{Lib/}} directory in the
non-standalone jar.
{code}
$ ls -l jython2.5.0-standalone/Lib
total 5236
-rw-r--r-- 1 mnoll mnoll 33417 2012-04-24 09:28 aifc.py
-rw-r--r-- 1 mnoll mnoll 2620 2012-04-24 09:28 anydbm.py
-rw-r--r-- 1 mnoll mnoll 11347 2012-04-24 09:28 ast.py
-rw-r--r-- 1 mnoll mnoll 10764 2012-04-24 09:28 asynchat.py
-rw-r--r-- 1 mnoll mnoll 17276 2012-04-24 09:28 asyncore.py
-rw-r--r-- 1 mnoll mnoll 1631 2012-04-24 09:28 atexit.py
-rw-r--r-- 1 mnoll mnoll 11296 2012-04-24 09:28 base64.py
-rw-r--r-- 1 mnoll mnoll 21289 2012-04-24 09:28 BaseHTTPServer.py
-rw-r--r-- 1 mnoll mnoll 20143 2012-04-24 09:28 bdb.py
[...snip...]
{code}
Apparently Jython (and thereby Pig) requires these Python module filesto be
included in the {{jython.jar}} file -- at least in cluster environments where
TaskTrackers DO NOT have Pig or Jython installed.
*How to fix*
In the Pig release package replace the {{jython.jar}} in {{lib/}} with a
standalone version of the same jar.
Here's how I creatd the standalone version of Jython 2.5.0 on my box:
{code}
$ java -jar jython_installer-2.5.0.jar -s -d /tmp/jython-install -t standalone
-j $JAVA_HOME
{code}
This will create the standalone jar in {{/tmp/jython-install/jython.jar}}.
Place this file into {{$PIG_HOME/lib/}}, thereby overwriting the existing
(non-standalone) version. After that the Python test script above will work
successfully.
For completeness I also want to mention that I observed the following WARN
messages before and after the Pig job was actually executed in the cluster:
{code}
$ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
[...snipp...]
# before job submission
#
2012-04-24 14:16:58,463 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - jython cachedir skipped,
jython may not work
2012-04-24 14:16:58,467 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
2012-04-24 14:16:58,467 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: os.path,
/path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,467 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: posixpath,
/path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,468 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: stat, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
# after the job finished (and succeeded)
#
2012-04-24 14:16:58,548 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
2012-04-24 14:16:58,548 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: os.path,
/path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,548 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: posixpath,
/path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,548 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: stat, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
{code}
*Jython 2.5.0 vs. Jython 2.5.2*
FWIW I also tested whether switching to Jython 2.5.2 (up from 2.5.0 as bundled
with the Pig 0.10 RC package) changes the results. It did not. That is, the
Python script fails with non-standalone 2.5.2 jar but works with the standalone
2.5.2 jar.
Best,
Michael
PS: Is there a reason Jython version 2.5.0 is bundled instead of the latest
stable release 2.5.2?
PPS: The 0.10.0-RC did solve my original PIG-1824 problem. I could run the
problematic Python/Pig script successfully using the 0.10.0-RC with a
standalone Jython 2.5.0 jar. Cool!
[1] http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz
was:
Using Pig 0.9.0 I was running into PIG-1824 when using import statements (e.g.
{{import os}}) in a Python script with embedded Pig Latin. Dmitriy Ryaboy
pointed me to the new Pig 0.10 release candidate
(http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz) so
that I could test whether my issue was solved by the new Pig version. During
testing I run into the error described below.
*Summary (TL;DR)*
* Even a minimal Python script with embedded Pig Latin will throw an error if
there is a single import statement in the Python code.
* The fix is to replace the bundled {{lib/jython.jar}} with a standalone
version of the same jar.
*Error message: "ERROR 1121: Python Error (ImportError: No module named
<yourmodule>)"*
{code}
$ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
2012-04-24 11:20:44,224 [main] INFO org.apache.pig.Main - Apache Pig version
0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
[...snip...]
*sys-package-mgr*: can't create package cache dir,
'/path/to/pig-0.10.0-RC1/lib/cachedir/packages'
2012-04-24 11:20:44,816 [main] INFO
org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
python.cachedir=/tmp/pig_jython_4081589571886870123
2012-04-24 11:20:45,033 [main] ERROR org.apache.pig.Main - ERROR 1121: Python
Error. Traceback (most recent call last):
File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
import os
ImportError: No module named os
{code}
In the Pig log file:
{code}
Error before Pig is launched
----------------------------
ERROR 1121: Python Error. Traceback (most recent call last):
File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
import os
ImportError: No module named os
org.apache.pig.backend.executionengine.ExecException: ERROR 1121: Python Error.
Traceback (most recent call last):
File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
import os
ImportError: No module named os
at
org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:210)
at
org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:384)
at
org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:368)
at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:275)
at org.apache.pig.Main.runEmbeddedScript(Main.java:929)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: Traceback (most recent call last):
{code}
*How to reproduce*
Create a simple Python script that uses embedded Pig Latin AND that imports
Python standard modules (any import statement will work):
{code}
#!/usr/bin/python
from org.apache.pig.scripting import Pig
# this import statement will trigger the error;
# remove it and everything will work fine
import os
if __name__ == "__main__":
pig_script = """
set job.name 'Pig 0.10.0-RC1 Python test';
"""
P = Pig.compile(pig_script)
bound = P.bind()
result = bound.runSingle()
if result.isSuccessful() :
print "Pig job succeeded"
else:
raise "Pig job failed"
{code}
Then proceed as follows.
{code}
#
# Install the Pig 0.10.0 release candidate [1].
#
# run the Python test script
$ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
#
# see section above for error message
#
{code}
*Test Environment*
Apart from the "Environment" JIRA field please note that none of the
TaskTracker boxes in my test cluster has Pig or Jython installed. Pig with
Jython is only available on a gateway box from which analysis jobs are run.
*Bug description*
During my investigation I discovered that the {{jython.jar}} that is shipped
with the 0.10.0 RC package is NOT a standalone version of Jython. I compared
(diffed) the unpacked contents of the existing jython.jar with a standalone jar
for Jython 2.5.0, and noticed that the main difference is that the standalone
jar comes with a {{Lib/}} directory containing the various Python standard
modules:
{code}
$ diff -r jython2.5.0 jython2.5.0-standalone/
Only in jython2.5.0-standalone/: Lib
diff -r jython2.5.0/META-INF/MANIFEST.MF
jython2.5.0-standalone//META-INF/MANIFEST.MF
2a3
> Built-By: frank
5d5
< Built-By: frank
8,10d7
< version: 2.5.0
< svn-build: true
< oracle: true
11a9
> svn-build: true
13d10
< jdk-target-version: 1.5
14a12,14
> oracle: true
> version: 2.5.0
> jdk-target-version: 1.5
$ ls -l jython2.5.0-standalone/Lib
total 5236
-rw-r--r-- 1 mnoll mnoll 33417 2012-04-24 09:28 aifc.py
-rw-r--r-- 1 mnoll mnoll 2620 2012-04-24 09:28 anydbm.py
-rw-r--r-- 1 mnoll mnoll 11347 2012-04-24 09:28 ast.py
-rw-r--r-- 1 mnoll mnoll 10764 2012-04-24 09:28 asynchat.py
-rw-r--r-- 1 mnoll mnoll 17276 2012-04-24 09:28 asyncore.py
-rw-r--r-- 1 mnoll mnoll 1631 2012-04-24 09:28 atexit.py
-rw-r--r-- 1 mnoll mnoll 11296 2012-04-24 09:28 base64.py
-rw-r--r-- 1 mnoll mnoll 21289 2012-04-24 09:28 BaseHTTPServer.py
-rw-r--r-- 1 mnoll mnoll 20143 2012-04-24 09:28 bdb.py
[...snip...]
{code}
Apparently Jython (and thereby Pig) requires these Python module filesto be
included in the {{jython.jar}} file -- at least in cluster environments where
TaskTrackers DO NOT have Pig or Jython installed.
*How to fix*
In the Pig release package replace the {{jython.jar}} in {{lib/}} with a
standalone version of the same jar.
Here's how I creatd the standalone version of Jython 2.5.0 on my box:
{code}
$ java -jar jython_installer-2.5.0.jar -s -d /tmp/jython-install -t standalone
-j $JAVA_HOME
{code}
This will create the standalone jar in {{/tmp/jython-install/jython.jar}}.
Place this file into {{$PIG_HOME/lib/}}, thereby overwriting the existing
(non-standalone) version. After that the Python test script above will work
successfully.
For completeness I also want to mention that I observed the following WARN
messages before and after the Pig job was actually executed in the cluster:
{code}
$ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
[...snipp...]
# before job submission
#
2012-04-24 14:16:58,463 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - jython cachedir skipped,
jython may not work
2012-04-24 14:16:58,467 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
2012-04-24 14:16:58,467 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: os.path,
/path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,467 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: posixpath,
/path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,468 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: stat, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
# after the job finished (and succeeded)
#
2012-04-24 14:16:58,548 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
2012-04-24 14:16:58,548 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: os.path,
/path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,548 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: posixpath,
/path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,548 [main] WARN
org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
exist: stat, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
{code}
*Jython 2.5.0 vs. Jython 2.5.2*
FWIW I also tested whether switching to Jython 2.5.2 (up from 2.5.0 as bundled
with the Pig 0.10 RC package) changes the results. It did not. That is, the
Python script fails with non-standalone 2.5.2 jar but works with the standalone
2.5.2 jar.
Best,
Michael
PS: Is there a reason Jython version 2.5.0 is bundled instead of the latest
stable release 2.5.2?
PPS: The 0.10.0-RC did solve my original PIG-1824 problem. I could run the
problematic Python/Pig script successfully using the 0.10.0-RC with a
standalone Jython 2.5.0 jar. Cool!
[1] http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz
> Bundled Jython jar in Pig 0.10.0-RC breaks module import in Python scripts
> with embedded Pig Latin
> --------------------------------------------------------------------------------------------------
>
> Key: PIG-2665
> URL: https://issues.apache.org/jira/browse/PIG-2665
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.10.0
> Environment: Verified bug on RHEL6 and on Ubuntu 11.10 with Sun JDK
> 1.6, and both Jython 2.5.0 (shipped with the Pig 0.10.0 RC package) and
> Jython 2.5.2.
> Reporter: Michael Noll
>
> Using Pig 0.9.0 I was running into PIG-1824 when using import statements
> (e.g. {{import os}}) in a Python script with embedded Pig Latin. Dmitriy
> Ryaboy pointed me to the new Pig 0.10 release candidate
> (http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz) so
> that I could test whether my issue was solved by the new Pig version. During
> testing I run into the error described below.
> *Summary (TL;DR)*
> * Even a minimal Python script with embedded Pig Latin will throw an error if
> there is a single import statement in the Python code.
> * The fix is to replace the bundled {{lib/jython.jar}} with a standalone
> version of the same jar.
> *Error message: "ERROR 1121: Python Error (ImportError: No module named
> <yourmodule>)"*
> {code}
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
> 2012-04-24 11:20:44,224 [main] INFO org.apache.pig.Main - Apache Pig version
> 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
> [...snip...]
> *sys-package-mgr*: can't create package cache dir,
> '/path/to/pig-0.10.0-RC1/lib/cachedir/packages'
> 2012-04-24 11:20:44,816 [main] INFO
> org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
> python.cachedir=/tmp/pig_jython_4081589571886870123
> 2012-04-24 11:20:45,033 [main] ERROR org.apache.pig.Main - ERROR 1121: Python
> Error. Traceback (most recent call last):
> File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
> import os
> ImportError: No module named os
> {code}
> In the Pig log file:
> {code}
> Error before Pig is launched
> ----------------------------
> ERROR 1121: Python Error. Traceback (most recent call last):
> File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
> import os
> ImportError: No module named os
> org.apache.pig.backend.executionengine.ExecException: ERROR 1121: Python
> Error. Traceback (most recent call last):
> File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
> import os
> ImportError: No module named os
> at
> org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:210)
> at
> org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:384)
> at
> org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:368)
> at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:275)
> at org.apache.pig.Main.runEmbeddedScript(Main.java:929)
> at org.apache.pig.Main.run(Main.java:510)
> at org.apache.pig.Main.main(Main.java:111)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: Traceback (most recent call last):
> {code}
> *How to reproduce*
> Create a simple Python script that uses embedded Pig Latin AND that imports
> Python standard modules (any import statement will work):
> {code}
> #!/usr/bin/python
> from org.apache.pig.scripting import Pig
> # this import statement will trigger the error;
> # remove it and everything will work fine
> import os
> if __name__ == "__main__":
> pig_script = """
> set job.name 'Pig 0.10.0-RC1 Python test';
> """
> P = Pig.compile(pig_script)
> bound = P.bind()
> result = bound.runSingle()
> if result.isSuccessful() :
> print "Pig job succeeded"
> else:
> raise "Pig job failed"
> {code}
> Then proceed as follows.
> {code}
> #
> # Install the Pig 0.10.0 release candidate [1].
> #
> # run the Python test script
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
> #
> # see section above for error message
> #
> {code}
> *Test Environment*
> Apart from the "Environment" JIRA field please note that none of the
> TaskTracker boxes in my test cluster has Pig or Jython installed. Pig with
> Jython is only available on a gateway box from which analysis jobs are run.
> *Bug description*
> During my investigation I discovered that the {{jython.jar}} that is shipped
> with the 0.10.0 RC package is NOT a standalone version of Jython. I compared
> (diffed) the unpacked contents of the existing jython.jar with a standalone
> jar for Jython 2.5.0, and noticed that the main difference is that the
> standalone jar comes with a {{Lib/}} directory containing the various Python
> standard modules:
> {code}
> $ diff -r jython2.5.0 jython2.5.0-standalone/
> Only in jython2.5.0-standalone/: Lib
> diff -r jython2.5.0/META-INF/MANIFEST.MF
> jython2.5.0-standalone//META-INF/MANIFEST.MF
> 2a3
> > Built-By: frank
> 5d5
> < Built-By: frank
> 8,10d7
> < version: 2.5.0
> < svn-build: true
> < oracle: true
> 11a9
> > svn-build: true
> 13d10
> < jdk-target-version: 1.5
> 14a12,14
> > oracle: true
> > version: 2.5.0
> > jdk-target-version: 1.5
> {code}
> The essential difference is the missing {{Lib/}} directory in the
> non-standalone jar.
> {code}
> $ ls -l jython2.5.0-standalone/Lib
> total 5236
> -rw-r--r-- 1 mnoll mnoll 33417 2012-04-24 09:28 aifc.py
> -rw-r--r-- 1 mnoll mnoll 2620 2012-04-24 09:28 anydbm.py
> -rw-r--r-- 1 mnoll mnoll 11347 2012-04-24 09:28 ast.py
> -rw-r--r-- 1 mnoll mnoll 10764 2012-04-24 09:28 asynchat.py
> -rw-r--r-- 1 mnoll mnoll 17276 2012-04-24 09:28 asyncore.py
> -rw-r--r-- 1 mnoll mnoll 1631 2012-04-24 09:28 atexit.py
> -rw-r--r-- 1 mnoll mnoll 11296 2012-04-24 09:28 base64.py
> -rw-r--r-- 1 mnoll mnoll 21289 2012-04-24 09:28 BaseHTTPServer.py
> -rw-r--r-- 1 mnoll mnoll 20143 2012-04-24 09:28 bdb.py
> [...snip...]
> {code}
> Apparently Jython (and thereby Pig) requires these Python module filesto be
> included in the {{jython.jar}} file -- at least in cluster environments where
> TaskTrackers DO NOT have Pig or Jython installed.
> *How to fix*
> In the Pig release package replace the {{jython.jar}} in {{lib/}} with a
> standalone version of the same jar.
> Here's how I creatd the standalone version of Jython 2.5.0 on my box:
> {code}
> $ java -jar jython_installer-2.5.0.jar -s -d /tmp/jython-install -t
> standalone -j $JAVA_HOME
> {code}
> This will create the standalone jar in {{/tmp/jython-install/jython.jar}}.
> Place this file into {{$PIG_HOME/lib/}}, thereby overwriting the existing
> (non-standalone) version. After that the Python test script above will work
> successfully.
> For completeness I also want to mention that I observed the following WARN
> messages before and after the Pig job was actually executed in the cluster:
> {code}
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
> [...snipp...]
> # before job submission
> #
> 2012-04-24 14:16:58,463 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - jython cachedir skipped,
> jython may not work
> 2012-04-24 14:16:58,467 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
> 2012-04-24 14:16:58,467 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: os.path,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,467 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: posixpath,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,468 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: stat,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
> # after the job finished (and succeeded)
> #
> 2012-04-24 14:16:58,548 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
> 2012-04-24 14:16:58,548 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: os.path,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,548 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: posixpath,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,548 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: stat,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
> {code}
> *Jython 2.5.0 vs. Jython 2.5.2*
> FWIW I also tested whether switching to Jython 2.5.2 (up from 2.5.0 as
> bundled with the Pig 0.10 RC package) changes the results. It did not. That
> is, the Python script fails with non-standalone 2.5.2 jar but works with the
> standalone 2.5.2 jar.
> Best,
> Michael
> PS: Is there a reason Jython version 2.5.0 is bundled instead of the latest
> stable release 2.5.2?
> PPS: The 0.10.0-RC did solve my original PIG-1824 problem. I could run the
> problematic Python/Pig script successfully using the 0.10.0-RC with a
> standalone Jython 2.5.0 jar. Cool!
> [1] http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira