Re: Task not serializable: java.io.NotSerializableException: org.json4s.Serialization$$anon$1

2016-07-19 Thread joshuata
It looks like the problem is that the parse function is not serializable. This is most likely because the formats variable is local to the ParseJson object, and therefore not globally accessible to the cluster. Generally this problem can be solved by moving the variable inside the closure so that it
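A minimal sketch of the fix described above, assuming the data is an `RDD[String]` of JSON lines (the names `lines` and the extracted type are assumptions, not from the thread):

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Defining `formats` inside the closure means each task creates its own
// Formats instance on the worker, so nothing non-serializable is shipped
// from the driver.
val parsed = lines.map { line =>
  implicit val formats: Formats = DefaultFormats // created per record/task, never serialized
  parse(line).extract[Map[String, String]]      // assumed target type for illustration
}
```

The alternative is marking the field `@transient lazy val` on the driver side, which also re-creates it lazily on each executor instead of serializing it.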

Re: how to setup the development environment of spark with IntelliJ on ubuntu

2016-07-19 Thread joshuata
I have found the easiest way to set up a development platform is to use the Databricks sbt-spark-package plugin (assuming you are using Scala + sbt). You simply add the plugin to your project/plugins.sbt file and add the sparkVersion to your
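A sketch of what that setup looks like; the plugin version and Spark version below are assumptions, so check the plugin's README for current values:

```scala
// project/plugins.sbt
addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6") // version is an assumption

// build.sbt
sparkVersion := "2.0.0"            // version is an assumption
sparkComponents ++= Seq("sql")     // pulls in spark-sql as "provided" alongside spark-core
```

The plugin adds the matching Spark artifacts as `provided` dependencies, so the assembled jar stays small and the cluster's own Spark distribution is used at runtime.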

Execute function once on each node

2016-07-18 Thread joshuata
I am working on a spark application that requires the ability to run a function on each node in the cluster. This is used to read data from a directory that is not globally accessible to the cluster. I have tried creating an RDD with n elements and n partitions so that it is evenly distributed