Hi Andrew,

Thanks for your note. Yes, I see a stack trace now. It seems to be an issue with Python interpreting a function I wish to apply to an RDD. The stack trace is below. The function is a simple factorial:
    def f(n):
        if n == 1:
            return 1
        return n * f(n-1)

and I'm trying to use it like this:

    tf = sc.textFile(...)
    tf.map(lambda line: line and len(line)).map(f).collect()

I get the following error, which does not occur if I use a built-in function like math.sqrt:

    TypeError: __import__() argument 1 must be string, not X#

The stack trace follows:

    WARN TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/worker.py", line 77, in main
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 191, in dump_stream
        self.serializer.dump_stream(self._batched(iterator), stream)
      File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 123, in dump_stream
        for obj in iterator:
      File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 180, in _batched
        for item in iterator:
      File "<ipython-input-39-0f0dafaf1ed4>", line 2, in f
    TypeError: __import__() argument 1 must be string, not X#

            at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
            at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
            at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)

On Wed, Jul 23, 2014 at 11:07 AM, Andrew Or <and...@databricks.com> wrote:

> Hi Eric,
>
> Have you checked the executor logs? It is possible they died because of
> some exception, and the message you see is just a side effect.
>
> Andrew
>
>
> 2014-07-23 8:27 GMT-07:00 Eric Friedman <eric.d.fried...@gmail.com>:
>
>> I'm using Spark 1.0.1 on a quite large cluster, with gobs of memory, etc.
>> Cluster resources are available to me via YARN and I am seeing these
>> errors quite often:
>>
>> ERROR YarnClientClusterScheduler: Lost executor 63 on <host>: remote Akka
>> client disassociated
>>
>> This is in an interactive shell session. I don't know a lot about YARN
>> plumbing and am wondering if there's some constraint in play -- executors
>> can't be idle for too long or they get cleared out.
>>
>> Any insights here?
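P.S. Since the traceback points at the IPython cell (<ipython-input-39-...>), one workaround I plan to try is moving the function out of the shell and into a module shipped to the executors with sc.addPyFile, so the workers import it by name instead of unpickling an interactively defined recursive function. A rough sketch, where mymath.py is just a placeholder file name:

    # mymath.py -- placeholder module holding the factorial
    def f(n):
        if n <= 1:  # base case; also covers n == 0
            return 1
        return n * f(n - 1)

and then in the shell:

    sc.addPyFile('mymath.py')  # ship the module to every executor
    from mymath import f       # import it on the driver as well
    tf = sc.textFile(...)      # same input path as above
    tf.map(lambda line: line and len(line)).map(f).collect()

I don't know yet whether that sidesteps the __import__ error, but it at least removes the recursive reference to a shell-defined function.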