[ 
https://issues.apache.org/jira/browse/SPARK-32673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181526#comment-17181526
 ] 

Sandy Su commented on SPARK-32673:
----------------------------------

df_signals = df_record_names.repartition('record_name').select(
 df_record_names.record_id,
 extract_signals_udf(df_record_names.record_name).alias('signal_info'))

df_signals = df_signals.select(df_signals.record_id,
 df_signals.signal_info.patient_id.alias('patient_id'),
 df_signals.signal_info.comments.alias('comments'),
 df_signals.signal_info.signals.alias('signals'))

display(df_signals.drop('signals'))

> Pyspark/cloudpickle.py - no module named 'wfdb'
> -----------------------------------------------
>
>                 Key: SPARK-32673
>                 URL: https://issues.apache.org/jira/browse/SPARK-32673
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Sandy Su
>            Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Running Spark in a Databricks notebook.
>  
> Ran into this issue when executing a cell:
> (1) Spark Jobs
> SparkException: Job aborted due to stage failure: Task 0 in stage 17.0 failed
> 4 times, most recent failure: Lost task 0.3 in stage 17.0 (TID 68,
> 10.139.64.5, executor 0): org.apache.spark.api.python.PythonException:
> Traceback (most recent call last):
>   File "/databricks/spark/python/pyspark/serializers.py", line 177, in _read_with_length
>     return self.loads(obj)
>   File "/databricks/spark/python/pyspark/serializers.py", line 466, in loads
>     return pickle.loads(obj, encoding=encoding)
>   File "/databricks/spark/python/pyspark/cloudpickle.py", line 1110, in subimport
>     __import__(name)
> ModuleNotFoundError: No module named 'wfdb'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/databricks/spark/python/pyspark/worker.py", line 644, in main
>     func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
>   File "/databricks/spark/python/pyspark/worker.py", line 463, in read_udfs
>     udfs.append(read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index=i))
>   File "/databricks/spark/python/pyspark/worker.py", line 254, in read_single_udf
>     f, return_type = read_command(pickleSer, infile)
>   File "/databricks/spark/python/pyspark/worker.py", line 74, in read_command
>     command = serializer._read_with_length(file)
>   File "/databricks/spark/python/pyspark/serializers.py", line 180, in _read_with_length
>     raise SerializationError("Caused by " + traceback.format_exc())
> pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
>   File "/databricks/spark/python/pyspark/serializers.py", line 177, in _read_with_length
>     return self.loads(obj)
>   File "/databricks/spark/python/pyspark/serializers.py", line 466, in loads
>     return pickle.loads(obj, encoding=encoding)
>   File "/databricks/spark/python/pyspark/cloudpickle.py", line 1110, in subimport
>     __import__(name)
> ModuleNotFoundError: No module named 'wfdb'
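
The traceback shows the import failing inside the Python worker on executor 0 while the UDF is being unpickled, which suggests 'wfdb' may be importable on the driver but not on the executors. The following is only a diagnostic sketch under that assumption, not a fix: module_available and check_partition are hypothetical helper names, and the commented usage assumes an active SparkContext bound to `sc`, as is typical in a notebook.

```python
import importlib.util
import socket


def module_available(name: str) -> bool:
    """Return True if the current interpreter can locate module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises for a dotted name whose parent package is missing,
        # or for an already-loaded module that has no __spec__.
        return False


def check_partition(_rows):
    """Report (hostname, wfdb importable?) for the process handling a partition.

    Under Spark this runs in the Python worker on whichever node processes
    the partition, so the result reflects that node's environment.
    """
    yield (socket.gethostname(), module_available("wfdb"))


# Hypothetical usage in a notebook where `sc` is the active SparkContext:
#   sc.parallelize(range(8), 8).mapPartitions(check_partition).distinct().collect()
# Any (host, False) pair would point at a node where 'wfdb' is not installed.
```

Locally, module_available("wfdb") only describes the driver's environment; the per-host pairs from the commented call are what would confirm or rule out a missing package on the workers.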


