[ 
https://issues.apache.org/jira/browse/SPARK-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Halit Olali updated SPARK-5146:
-------------------------------
    Description: 
Hello folks,
Consider the following simple app for word counting via network socket:
{code:title=WordCount.scala|borderStyle=solid}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    val conf = new SparkConf().setAppName("Sample Application")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(5))
    ssc.checkpoint("target/checkpointDir")
    // feed input with: nc -lk 9999
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    wordCounts.saveAsHadoopFiles("target/prefix", "suffix")
    ssc.start()
    ssc.awaitTermination(60)  // timeout in milliseconds
{code}
When this is packaged and executed on Spark, the following exception is thrown:
{noformat}
java.io.NotSerializableException: org.apache.hadoop.mapred.JobConf
{noformat}

The use of JobConf inside the saveAsHadoopFiles methods appears to be the cause: with checkpointing enabled, Spark Streaming serializes the DStream graph, and a JobConf captured there is not serializable.
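A possible workaround (a sketch only, not verified against this issue; the output format and per-batch path below are illustrative choices, not part of the original report) is to perform the save inside foreachRDD, so that the JobConf is created per batch instead of being captured in the checkpointed DStream graph:

{code:title=Workaround sketch|borderStyle=solid}
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapred.TextOutputFormat

// Sketch: save each batch explicitly instead of via saveAsHadoopFiles,
// so the Hadoop JobConf is built at call time inside the batch rather
// than stored in the (checkpointed) DStream graph. TextOutputFormat and
// the path pattern are hypothetical, for illustration only.
wordCounts.foreachRDD { (rdd, time) =>
  rdd.map { case (word, count) => (new Text(word), new IntWritable(count)) }
     .saveAsHadoopFile[TextOutputFormat[Text, IntWritable]](
       s"target/prefix-${time.milliseconds}.suffix")
}
{code}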

Thanks


> saveAsHadoopFiles does not work with checkpointing
> --------------------------------------------------
>
>                 Key: SPARK-5146
>                 URL: https://issues.apache.org/jira/browse/SPARK-5146
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Halit Olali
>            Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
