[ https://issues.apache.org/jira/browse/SPARK-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232598#comment-14232598 ]
Yu Ishikawa commented on SPARK-3910:
------------------------------------
I had the same problem as Tomohiko. However, I resolved it by removing all
*.pyc files under the `python/` directory.
{noformat}
cd $SPARK_HOME && find python -name "*.pyc" -delete
{noformat}
If that indeed fixes the problem, then in my opinion there are two ways to
resolve this issue:
1. remove all `*.pyc` files under the `python` directory, at least when running
`python/run-tests` (see the sketch below)
2. resolve the cyclic import
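For option 1, a minimal sketch of such a cleanup step, written in Python so
`python/run-tests` could call it on any platform (the helper name is my own,
not something that exists in the Spark repo); it does the same thing as the
find command above:
{noformat}
# Hypothetical helper: delete stale compiled files under python/ so that an old
# pyspark/mllib/random.pyc cannot shadow the standard-library random module.
import os

def remove_pyc(root="python"):
    """Walk `root` and delete every *.pyc file found."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                os.remove(os.path.join(dirpath, name))

if __name__ == "__main__":
    remove_pyc()  # run from $SPARK_HOME, equivalent to the find command above
{noformat}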
thanks
> ./python/pyspark/mllib/classification.py doctests fails with module name pollution
> ----------------------------------------------------------------------------------
>
> Key: SPARK-3910
> URL: https://issues.apache.org/jira/browse/SPARK-3910
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20,
> Jinja2==2.7.3, MarkupSafe==0.23, Pygments==1.6, Sphinx==1.2.3,
> argparse==1.2.1, docutils==0.12, flake8==2.2.3, mccabe==0.2.1, numpy==1.9.0,
> pep8==1.5.7, psutil==2.1.3, pyflake8==0.1.9, pyflakes==0.8.1,
> unittest2==0.5.1, wsgiref==0.1.2
> Reporter: Tomohiko K.
> Labels: pyspark, testing
>
> The ./python/run-tests script runs the doctests in
> ./pyspark/mllib/classification.py.
> The output is as follows:
> {noformat}
> $ ./python/run-tests
> ...
> Running test: pyspark/mllib/classification.py
> Traceback (most recent call last):
>   File "pyspark/mllib/classification.py", line 20, in <module>
>     import numpy
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/__init__.py", line 170, in <module>
>     from . import add_newdocs
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
>     from numpy.lib import add_newdoc
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/__init__.py", line 8, in <module>
>     from .type_check import *
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/type_check.py", line 11, in <module>
>     import numpy.core.numeric as _nx
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/core/__init__.py", line 46, in <module>
>     from numpy.testing import Tester
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/__init__.py", line 13, in <module>
>     from .utils import *
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/utils.py", line 15, in <module>
>     from tempfile import mkdtemp
>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tempfile.py", line 34, in <module>
>     from random import Random as _Random
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/mllib/random.py", line 24, in <module>
>     from pyspark.rdd import RDD
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/__init__.py", line 51, in <module>
>     from pyspark.context import SparkContext
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/context.py", line 22, in <module>
>     from tempfile import NamedTemporaryFile
> 0.07 real 0.04 user 0.02 sys
> Had test failures; see logs.
> {noformat}
> The problem is a cyclic import involving the tempfile module.
> The cause is that the pyspark.mllib.random module lives in the same directory as the
> pyspark.mllib.classification module.
> The classification module imports numpy, and numpy in turn imports the tempfile module internally.
> Because the first entry of sys.path is the directory "./python/pyspark/mllib" (where the executed
> file "classification.py" lives), tempfile ends up importing the pyspark.mllib.random module
> instead of the standard-library "random" module.
> From there the import chain reaches tempfile again, forming a cyclic import.
> Summary: classification → numpy → tempfile → pyspark.mllib.random → tempfile
> → (cyclic import!!)
> Furthermore, stat is also a standard-library module name, and a pyspark.mllib.stat module exists,
> which may cause the same kind of trouble.
> commit: 0e8203f4fb721158fb27897680da476174d24c4b
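> For illustration, a minimal sketch of the shadowing (not from the original report; it is only
> an assumed demonstration), run with Python 2 from within python/pyspark/mllib:
> {noformat}
> # Hypothetical illustration: because python/pyspark/mllib is first on sys.path (it is the
> # directory of the executed script, or '' for the current directory), the names "random"
> # and "stat" resolve to the local files instead of the standard-library modules.
> import imp
> print(imp.find_module("random")[1])  # prints .../python/pyspark/mllib/random.py, not the stdlib
> print(imp.find_module("stat")[1])    # likewise shadowed, if pyspark/mllib/stat.py is present
> {noformat}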
> A fundamental solution is to avoid module names that clash with the standard library
> (currently "random" and "stat").
> The difficulty with this solution is that renaming pyspark.mllib.random and
> pyspark.mllib.stat would break code that may already use them.