[ https://issues.apache.org/jira/browse/SPARK-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Oksana Romankova updated SPARK-8697:
------------------------------------
Comment: was deleted
(was: Spark 1.4.1
It seems the issue happens when a DataFrame is created from an existing RDD
using toDF() and RegexTokenizer is used to extract matches with
setGaps(false). If the file is loaded with sqlContext.read.load instead, this
doesn't happen.
The exception is:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 2) had a not serializable result: scala.util.matching.Regex$MatchIterator
Serialization stack:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
)
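The stack trace in the deleted comment names scala.util.matching.Regex$MatchIterator as the non-serializable task result. The plain Scala sketch below (no Spark needed) checks that class against Java serialization, which is Spark's default spark.serializer. It rests on an assumption not stated in the report: that with setGaps(false) the tokenizer's output ends up holding the lazy iterator returned by Regex.findAllIn, while the gaps path returns the Array[String] from Regex.split. The object MatchIteratorSerialization and the helper isJavaSerializable are hypothetical names for illustration only.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper object for illustration; not part of Spark.
object MatchIteratorSerialization {
  // Tries to write `obj` with plain Java serialization (Spark's default
  // spark.serializer is JavaSerializer). Returns false on NotSerializableException.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val text = "Spark RegexTokenizer ad-hoc test"

    // gaps = false style (assumed): collect the matches of a token pattern.
    // findAllIn returns a lazy Regex.MatchIterator, which is not Serializable.
    val lazyMatches = "\\w+".r.findAllIn(text)
    println(isJavaSerializable(lazyMatches)) // false

    // Materializing the matches into a strict List leaves only plain Strings.
    val strictMatches = "\\w+".r.findAllIn(text).toList
    println(isJavaSerializable(strictMatches)) // true

    // gaps = true style (assumed): split on a delimiter pattern.
    val splitTokens = "\\s+".r.split(text)
    println(isJavaSerializable(splitTokens)) // true
  }
}
```

If the assumption holds, materializing the matches into a strict collection before they leave the task would avoid the exception; this sketch does not confirm how the issue was actually resolved upstream.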
> MatchIterator not serializable exception in RegexTokenizer
> ----------------------------------------------------------
>
> Key: SPARK-8697
> URL: https://issues.apache.org/jira/browse/SPARK-8697
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 1.4.0
> Reporter: Xiangrui Meng
> Priority: Minor
>
> I'm not sure whether this is a real bug. In the REPL, I saw a MatchIterator
> not serializable exception in RegexTokenizer during some ad-hoc testing.
> However, I couldn't reproduce the issue. Maybe it is a REPL bug.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)