[jira] [Commented] (SPARK-3662) Importing pandas breaks included pi.py example

Sean Owen (JIRA) Wed, 24 Sep 2014 00:38:45 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146017#comment-14146017
 ]


Sean Owen commented on SPARK-3662:
----------------------------------

Maybe I miss something, but, does this just mean you can't "import pandas" 
entirely? If you're modifying the example, you should import only what you need 
from pandas. Or, it may be that you need to modify the "import random", indeed, 
to accommodate other modifications you want to make.

But what is the problem with the included example? it runs fine without 
modifications, no?

> Importing pandas breaks included pi.py example
> ----------------------------------------------
>
>                 Key: SPARK-3662
>                 URL: https://issues.apache.org/jira/browse/SPARK-3662
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, YARN
>    Affects Versions: 1.1.0
>         Environment: Xubuntu 14.04.  Yarn cluster running on Ubuntu 12.04.
>            Reporter: Evan Samanas
>
> If I add "import pandas" at the top of the included pi.py example and submit 
> using "spark-submit --master yarn-client", I get this stack trace:
> {code}
> Traceback (most recent call last):
>   File "/home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi.py", line 
> 39, in <module>
>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>   File "/home/evan/pub_src/spark/python/pyspark/rdd.py", line 759, in reduce
>     vals = self.mapPartitions(func).collect()
>   File "/home/evan/pub_src/spark/python/pyspark/rdd.py", line 723, in collect
>     bytesInJava = self._jrdd.collect().iterator()
>   File 
> "/home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>  line 538, in __call__
>   File 
> "/home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", 
> line 300, in get_return_value
> py4j.protocol.Py4JJavaError14/09/23 15:51:58 INFO TaskSetManager: Lost task 
> 2.3 in stage 0.0 (TID 10) on executor SERVERNAMEREMOVED: 
> org.apache.spark.api.python.PythonException (Traceback (most recent call 
> last):
>   File 
> "/yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/worker.py",
>  line 75, in main
>     command = pickleSer._read_with_length(infile)
>   File 
> "/yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/serializers.py",
>  line 150, in _read_with_length
>     return self.loads(obj)
> ImportError: No module named algos
> {code}
> The example works fine if I move the statement "from random import random" 
> from the top and into the function (def f(_)) defined in the example.  Near 
> as I can tell, "random" is getting confused with a function of the same name 
> within pandas.algos.  
> Submitting the same script using --master local works, but gives a 
> distressing amount of random characters to stdout or stderr and messes up my 
> terminal:
> {code}
> ...
> @J@J@J@J@J@J@J@J@J@J@J@J@J@JJ@J@J@J@J 
> @J!@J"@J#@J$@J%@J&@J'@J(@J)@J*@J+@J,@J-@J.@J/@J0@J1@J2@J3@J4@J5@J6@J7@J8@J9@J:@J;@J<@J=@J>@J?@J@@JA@JB@JC@JD@JE@JF@JG@JH@JI@JJ@JK@JL@JM@JN@JO@JP@JQ@JR@JS@JT@JU@JV@JW@JX@JY@JZ@J[@J\@J]@J^@J_@J`@Ja@Jb@Jc@Jd@Je@Jf@Jg@Jh@Ji@Jj@Jk@Jl@Jm@Jn@Jo@Jp@Jq@Jr@Js@Jt@Ju@Jv@Jw@Jx@Jy@Jz@J{@J|@J}@J~@J@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JJJ�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JAJAJAJAJAJAJAJAAJ
>        AJ
> AJ
>   AJ
> AJAJAJAJAJAJAJAJAJAJAJAJAJAJJAJAJAJAJ 
> AJ!AJ"AJ#AJ$AJ%AJ&AJ'AJ(AJ)AJ*AJ+AJ,AJ-AJ.AJ/AJ0AJ1AJ2AJ3AJ4AJ5AJ6AJ7AJ8AJ9AJ:AJ;AJ<AJ=AJ>AJ?AJ@AJAAJBAJCAJDAJEAJFAJGAJHAJIAJJAJKAJLAJMAJNAJOAJPAJQAJRAJSAJTAJUAJVAJWAJXAJYAJZAJ[AJ\AJ]AJ^AJ_AJ`AJaAJbAJcAJdAJeAJfAJgAJhAJiAJjAJkAJlAJmAJnAJoAJpAJqAJrAJsAJtAJuAJvAJwAJxAJyAJzAJ{AJ|AJ}AJ~AJAJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJJJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�A14/09/23
>  15:42:09 INFO SparkContext: Job finished: reduce at 
> /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi_sframe.py:38, took 
> 11.276879779 s
> J�AJ�AJ�AJ�AJ�AJ�AJ�AJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJBJBJBJBJBJBJBJBBJ
>      BJ
> BJ
>   BJ
> BJBJBJBJBJBJBJBJBJBJBJBJBJBJJBJBJBJBJ 
> BJ!BJ"BJ#BJ$BJ%BJ&BJ'BJ(BJ)BJ*BJ+BJ,BJ-BJ.BJ/BJ0BJ1BJ2BJ3BJ4BJ5BJ6BJ7BJ8BJ9BJ:BJ;BJ<BJ=BJ>BJ?BJ@Be.
> �]qJ#1a.
> �]qJX4a.
> �]qJX4a.
> �]qJ#1a.
> �]qJX4a.
> �]qJX4a.
> �]qJ#1a.
> �]qJX4a.
> �]qJX4a.
> �]qJa.
> Pi is roughly 3.146136
> {code}
> No idea if that's related, but thought I'd include it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-3662) Importing pandas breaks included pi.py example

Reply via email to