[
https://issues.apache.org/jira/browse/SPARK-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen updated SPARK-4851:
------------------------------
Priority: Minor (was: Major)
Description:
*Reproduction:*
{code}
class A:
    @staticmethod
    def foo(self, x):
        return x

sc.parallelize([1]).map(lambda x: A.foo(x)).count()
{code}
This gives
{code}
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 3, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/joshrosen/Documents/Spark/python/pyspark/worker.py", line 107, in main
    process()
  File "/Users/joshrosen/Documents/Spark/python/pyspark/worker.py", line 98, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 2070, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 2070, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 2070, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 247, in func
    return f(iterator)
  File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 818, in <lambda>
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 818, in <genexpr>
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "<stdin>", line 1, in <lambda>
RuntimeError: uninitialized staticmethod object

        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:136)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:173)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:95)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:264)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:231)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745){code}
was:
class A:
    @staticmethod
    def calc_something(self, x):
        return x*x

rdd = sc.parallelize(xrange(1,1000),60)
rdd.count()
rdd.map(lambda x: A.calc_something(x))
error:
RuntimeError: uninitialized staticmethod object
Environment: (was: mac Os )
Affects Version/s: (was: 1.0.0)
1.3.0
1.0.2
1.2.0
Labels: (was: newbie)
Summary: "Uninitialized staticmethod object" error in PySpark
(was: can't pass a static method to rdd object in python )
I've edited this issue to include an updated reproduction and a more complete
stack trace.
I'm pretty sure this is due to limitations in our pickling library. Dill,
another Python pickling extension, also has trouble with this case, and it
doesn't appear that they have a fix either:
https://github.com/uqfoundation/dill/issues/13
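For anyone hitting this in the meantime, one possible workaround (my suggestion, not part of the original report) is to avoid serializing the class at all: unwrap the staticmethod to a plain function and close over that instead. A minimal sketch with no Spark dependency; the names {{plain_foo}} and {{double}}, and the doubling body, are illustrative only:

```python
class A(object):
    @staticmethod
    def foo(x):
        return x * 2

# A.__dict__["foo"] is the raw staticmethod descriptor; its __func__
# attribute (available on Python 2.7+ and 3.x) unwraps it to the
# plain underlying function.
plain_foo = A.__dict__["foo"].__func__

# A closure over plain_foo never references class A, so PySpark's
# pickler only has to serialize an ordinary function when shipping
# the lambda to workers.
double = lambda x: plain_foo(x)

print(double(21))  # prints 42
```

In the original reproduction this would mean mapping with {{lambda x: plain_foo(x)}} (or simply defining {{foo}} at module level) instead of {{lambda x: A.foo(x)}}.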
> "Uninitialized staticmethod object" error in PySpark
> ----------------------------------------------------
>
> Key: SPARK-4851
> URL: https://issues.apache.org/jira/browse/SPARK-4851
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.0.2, 1.2.0, 1.3.0
> Reporter: Nadav Grossug
> Priority: Minor
>
> *Reproduction:*
> {code}
> class A:
>     @staticmethod
>     def foo(self, x):
>         return x
>
> sc.parallelize([1]).map(lambda x: A.foo(x)).count()
> {code}
> This gives
> {code}
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 3, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/worker.py", line 107, in main
>     process()
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/worker.py", line 98, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 2070, in pipeline_func
>     return func(split, prev_func(split, iterator))
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 2070, in pipeline_func
>     return func(split, prev_func(split, iterator))
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 2070, in pipeline_func
>     return func(split, prev_func(split, iterator))
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 247, in func
>     return f(iterator)
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 818, in <lambda>
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/rdd.py", line 818, in <genexpr>
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "<stdin>", line 1, in <lambda>
> RuntimeError: uninitialized staticmethod object
>
>         at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:136)
>         at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:173)
>         at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:95)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:264)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:231)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745){code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]