[
https://issues.apache.org/jira/browse/SPARK-19627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-19627.
-------------------------------
Resolution: Invalid
Fix Version/s: (was: 1.6.1)
Target Version/s: (was: 1.6.1)
Please read http://spark.apache.org/contributing.html first
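This is the behavior described in SPARK-5063, not a bug: SparkContext (and with it sc._jvm) exists only in the driver process, but foo() in the code quoted below captures sc and is shipped to the executors by rdd.map(), which is why cloudpickle fails while serializing the closure. A minimal sketch of a driver-side rewrite, assuming the reporter's own Calculate class (with its sqAdd method) is on the driver's classpath:

    from pyspark import SparkContext
    from py4j.java_gateway import java_import

    if __name__ == "__main__":
        sc = SparkContext(appName="Py4jTesting")

        # The Py4J gateway (sc._jvm) exists only in the driver process, so the
        # user-defined JVM class must be imported, instantiated, and called
        # here, never inside a closure passed to map()/filter()/etc.
        # "Calculate" and "sqAdd" are the reporter's own class and method.
        java_import(sc._jvm, "Calculate")
        calc = sc._jvm.Calculate()

        rdd = sc.parallelize([1, 2, 3])
        # Bring the (small) data back to the driver first, then apply the
        # JVM method on the driver side.
        result = [calc.sqAdd(x) for x in rdd.collect()]

        print(result)
        sc.stop()

If the function really has to run per record on the executors, re-implement it in Python, or perform the transformation on the JVM side (e.g. in Scala/Java) and expose the resulting RDD to PySpark; making Py4J calls from inside a closure is not supported.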
> pyspark call jvm function defined by ourselves
> ----------------------------------------------
>
> Key: SPARK-19627
> URL: https://issues.apache.org/jira/browse/SPARK-19627
> Project: Spark
> Issue Type: Bug
> Components: Deploy
> Affects Versions: 1.6.1
> Reporter: kehao
>
> Hi, I have a question: PySpark fails when I call a JVM function that I defined
> myself. Please see the code below:
> from pyspark import SparkConf, SparkContext
> from py4j.java_gateway import java_import
>
> if __name__ == "__main__":
>     # conf = SparkConf().setAppName("testing")
>     # sc = SparkContext(conf=conf)
>     sc = SparkContext(appName="Py4jTesting")
>
>     def foo(x):
>         java_import(sc._jvm, "Calculate")
>         func = sc._jvm.Calculate()
>         func.sqAdd(x)
>
>     rdd = sc.parallelize([1, 2, 3])
>     result = rdd.map(foo).collect()
>     print("$$$$$$$$$$$$$$$$$$$$$$")
>     print(result)
> The run fails with the traceback below. Can anyone help me?
> Traceback (most recent call last):
>   File "/home/manager/data/software/mytest/kehao/driver.py", line 19, in <module>
>     result = rdd.map(foo).collect()
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 2379, in _jrdd
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 2299, in _prepare_for_python_RDD
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 428, in dumps
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 646, in dumps
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 107, in dump
>   File "/usr/lib/python3.4/pickle.py", line 412, in dump
>     self.save(obj)
>   File "/usr/lib/python3.4/pickle.py", line 479, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.4/pickle.py", line 744, in save_tuple
>     save(element)
>   File "/usr/lib/python3.4/pickle.py", line 479, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 199, in save_function
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 236, in save_function_tuple
>   File "/usr/lib/python3.4/pickle.py", line 479, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.4/pickle.py", line 729, in save_tuple
>     save(element)
>   File "/usr/lib/python3.4/pickle.py", line 479, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.4/pickle.py", line 774, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.4/pickle.py", line 801, in _batch_appends
>     save(tmp[0])
>   File "/usr/lib/python3.4/pickle.py", line 479, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 193, in save_function
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 241, in save_function_tuple
>   File "/usr/lib/python3.4/pickle.py", line 479, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.4/pickle.py", line 814, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.4/pickle.py", line 840, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.4/pickle.py", line 499, in save
>     rv = reduce(self.proto)
>   File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/context.py", line 268, in __getnewargs__
> Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063