[
https://issues.apache.org/jira/browse/SPARK-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Halit Olali updated SPARK-5146:
-------------------------------
Description:
Hello folks,
Consider the following simple app for word counting via network socket:
{code:title=WordCount.scala|borderStyle=solid}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream operations

val conf = new SparkConf().setAppName("Sample Application")
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(5))
ssc.checkpoint("target/checkpointDir")

// Feed input with: nc -lk 9999
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.saveAsHadoopFiles("target/prefix", "suffix")

ssc.start()
ssc.awaitTermination(60000) // argument is in milliseconds
{code}
When this is packaged and executed on Spark, the following exception is thrown:
{noformat}
java.io.NotSerializableException: org.apache.hadoop.mapred.JobConf
{noformat}
The use of JobConf inside the saveAsHadoopFiles methods seems to be the cause: checkpointing serializes the DStream graph, and the captured JobConf is not serializable.
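One possible workaround until this is fixed (an untested sketch, reusing the names from the snippet above): construct the JobConf inside a foreachRDD closure on each batch and call saveAsHadoopFile on the per-batch RDD, so that no JobConf instance is captured in the checkpointed DStream graph.
{code:title=Workaround.scala|borderStyle=solid}
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapred.{JobConf, TextOutputFormat}

// wordCounts is the DStream[(String, Int)] from the snippet above.
wordCounts.foreachRDD { (rdd, time) =>
  // Created per batch, inside the closure, so the JobConf is never
  // serialized as part of the DStream checkpoint.
  val jobConf = new JobConf(rdd.sparkContext.hadoopConfiguration)
  rdd.map { case (word, count) => (new Text(word), new IntWritable(count)) }
     .saveAsHadoopFile(
       s"target/prefix-${time.milliseconds}.suffix",
       classOf[Text],
       classOf[IntWritable],
       classOf[TextOutputFormat[Text, IntWritable]],
       jobConf)
}
{code}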
Thanks
> saveAsHadoopFiles does not work with checkpointing
> --------------------------------------------------
>
> Key: SPARK-5146
> URL: https://issues.apache.org/jira/browse/SPARK-5146
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Halit Olali
> Priority: Minor
> Original Estimate: 1h
> Remaining Estimate: 1h
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)