[ https://issues.apache.org/jira/browse/SPARK-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169424#comment-14169424 ]
Sean Owen commented on SPARK-3662:
----------------------------------

[~esamanas] Do you have a suggested change here, beyond just disambiguating the imports in your example? Or a different example that doesn't involve an import collision? It sounds like the modified example is then misunderstood to refer to a pandas "random", not the Python standard library's, and that is simply a matter of namespace collision, and why pandas is dragged in at all. This example seems to fall down before it demonstrates anything else.

> Importing pandas breaks included pi.py example
> ----------------------------------------------
>
>                 Key: SPARK-3662
>                 URL: https://issues.apache.org/jira/browse/SPARK-3662
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, YARN
>    Affects Versions: 1.1.0
>         Environment: Xubuntu 14.04. Yarn cluster running on Ubuntu 12.04.
>            Reporter: Evan Samanas
>
> If I add "import pandas" at the top of the included pi.py example and submit
> it using "spark-submit --master yarn-client", I get this stack trace:
> {code}
> Traceback (most recent call last):
>   File "/home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi.py", line 39, in <module>
>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>   File "/home/evan/pub_src/spark/python/pyspark/rdd.py", line 759, in reduce
>     vals = self.mapPartitions(func).collect()
>   File "/home/evan/pub_src/spark/python/pyspark/rdd.py", line 723, in collect
>     bytesInJava = self._jrdd.collect().iterator()
>   File "/home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError
> 14/09/23 15:51:58 INFO TaskSetManager: Lost task 2.3 in stage 0.0 (TID 10) on executor SERVERNAMEREMOVED:
> org.apache.spark.api.python.PythonException (Traceback (most recent call last):
>   File "/yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/worker.py", line 75, in main
>     command = pickleSer._read_with_length(infile)
>   File "/yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/serializers.py", line 150, in _read_with_length
>     return self.loads(obj)
> ImportError: No module named algos
> {code}
> The example works fine if I move the statement "from random import random"
> from the top of the file into the function (def f(_)) defined in the example.
> As near as I can tell, "random" is getting confused with a function of the
> same name within pandas.algos.
> Submitting the same script using --master local works, but it prints a
> distressing amount of random characters to stdout or stderr and messes up my
> terminal:
> {code}
> ...
> @J@J@J@J@J@J@J@J@J@J@J@J@J@JJ@J@J@J@J
> @J!@J"@J#@J$@J%@J&@J'@J(@J)@J*@J+@J,@J-@J.@J/@J0@J1@J2@J3@J4@J5@J6@J7@J8@J9@J:@J;@J<@J=@J>@J?@J@@JA@JB@JC@JD@JE@JF@JG@JH@JI@JJ@J
> [long runs of similar binary garbage trimmed]
> 14/09/23 15:42:09 INFO SparkContext: Job finished: reduce at
> /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi_sframe.py:38, took 11.276879779 s
> ...
> Pi is roughly 3.146136
> {code}
> No idea if that's related, but thought I'd include it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
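To make the reported workaround concrete, here is a minimal sketch of what the modified pi.py function looks like. This is not the actual Spark example file: the Spark pipeline (`sc.parallelize(xrange(1, n + 1), slices).map(f).reduce(add)`) is replaced with a plain-Python `reduce`/`map` stand-in so the sketch runs without a cluster, and `n` is an arbitrary illustrative value. The key point, as the reporter describes it, is that importing `random` inside `f` makes the worker resolve the name at call time after unpickling, so a top-level `import pandas` can no longer confuse it with the `random` inside `pandas.algos`.

```python
from functools import reduce
from operator import add


def f(_):
    # The workaround from the report: import moved from module scope
    # into the function body, so the name is looked up when f runs.
    from random import random
    x = random() * 2 - 1
    y = random() * 2 - 1
    # 1 if the point falls inside the unit circle, else 0
    return 1 if x ** 2 + y ** 2 < 1 else 0


# Plain-Python stand-in for the example's Spark pipeline,
#   sc.parallelize(xrange(1, n + 1), slices).map(f).reduce(add)
n = 100000  # illustrative sample count, not the example's default
count = reduce(add, map(f, range(1, n + 1)))
print("Pi is roughly %f" % (4.0 * count / n))
```

The Monte Carlo estimate converges slowly, so the printed value only approximates pi; the point of the sketch is the import placement, not the numerics.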