[ 
https://issues.apache.org/jira/browse/SPARK-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oksana Romankova updated SPARK-8697:
------------------------------------
    Comment: was deleted

(was: Spark 1.4.1

It seems the issue happens when a DataFrame is created from an existing RDD 
using toDF() and RegexTokenizer is used to extract matches with 
setGaps(false). If you load the file with sqlContext.read.load, this doesn't 
happen.

The exception is:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0.0 in stage 2.0 (TID 2) had a not serializable result: 
scala.util.matching.Regex$MatchIterator
Serialization stack:

        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
)
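
The exception above names scala.util.matching.Regex$MatchIterator as the 
non-serializable result. As a minimal sketch in plain Scala (no Spark 
involved, and not the RegexTokenizer source), the check below tries Java 
serialization on the lazy iterator returned by Regex.findAllIn and on the 
same matches materialized into a List. The object name MatchIteratorCheck, 
the helper isSerializable, and the sample pattern and text are hypothetical 
and exist only for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object MatchIteratorCheck {
  // Hypothetical helper: returns true if Java serialization accepts the value,
  // false if it throws NotSerializableException.
  def isSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val pattern = "\\w+".r               // sample pattern, chosen for illustration
    val text = "spark regex tokenizer"   // sample input, chosen for illustration

    // findAllIn returns a lazy Regex.MatchIterator, which is not Serializable.
    val lazyMatches = pattern.findAllIn(text)
    println(s"MatchIterator serializable: ${isSerializable(lazyMatches)}")

    // Materializing the matches into a strict List of Strings leaves no
    // reference to the iterator behind.
    val tokens = pattern.findAllIn(text).toList
    println(s"List of tokens serializable: ${isSerializable(tokens)}")
  }
}
```

If the iterator is what ends up in a task result, forcing the matches into a 
strict collection before they leave the function is one way to avoid shipping 
it. Whether that is the right fix for RegexTokenizer itself is not 
established here.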

> MatchIterator not serializable exception in RegexTokenizer
> ----------------------------------------------------------
>
>                 Key: SPARK-8697
>                 URL: https://issues.apache.org/jira/browse/SPARK-8697
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 1.4.0
>            Reporter: Xiangrui Meng
>            Priority: Minor
>
> I'm not sure whether this is a real bug or not. In the REPL, I saw a 
> MatchIterator not-serializable exception in RegexTokenizer during some 
> ad-hoc testing. However, I couldn't reproduce this issue. Maybe it is a 
> REPL bug.


