Just testing out rc1. I created a dependent project (using Maven) and copied the HdfsTest.scala example, adding a single line to save the file back to disk:
package org.apache.spark.examples

import org.apache.spark._

object HdfsTest {
  def main(args: Array[String]) {
    val sc = new SparkContext(args(0), "HdfsTest",
      System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))
    val file = sc.textFile(args(1))
    val mapped = file.map(s => s.length).cache()
    for (iter <- 1 to 10) {
      val start = System.currentTimeMillis()
      for (x <- mapped) { x + 2 } // println("Processing: " + x)
      val end = System.currentTimeMillis()
      println("Iteration " + iter + " took " + (end - start) + " ms")
      mapped.saveAsTextFile("out")
    }
    System.exit(0)
  }
}
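One aside on my change, in case it matters for reproducing: since saveAsTextFile("out") sits inside the loop, I'd expect iteration 2 to fail anyway with an "output directory already exists" error from the Hadoop output format, but as the log below shows, the crash happens on the very first save. A stripped-down repro would be just textFile + saveAsTextFile, something like the following sketch (SaveOnlyTest is a made-up name, not part of the examples):

import org.apache.spark.SparkContext

// Minimal sketch isolating the failing code path: nothing but
// textFile + saveAsTextFile, run against the local master.
object SaveOnlyTest {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "SaveOnlyTest")
    sc.textFile("pom.xml").map(_.length).saveAsTextFile("out")
    sc.stop()
  }
}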
And this is my POM file:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>my.examples</groupId>
  <artifactId>spark-samples</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <inceptionYear>2014</inceptionYear>

  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.tools.version>2.10</scala.tools.version>
    <scala.version>2.10.0</scala.version>
  </properties>

  <repositories>
    <repository>
      <id>spark staging</id>
      <url>https://repository.apache.org/content/repositories/orgapachespark-1001</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.tools.version}</artifactId>
      <version>0.9.0-incubating</version>
    </dependency>
    <!-- Test -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2_${scala.tools.version}</artifactId>
      <version>1.13</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.tools.version}</artifactId>
      <version>2.0.M6-SNAP8</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.1.6</version>
        <configuration>
          <scalaCompatVersion>2.10</scalaCompatVersion>
          <jvmArgs>
            <jvmArg>-Xms128m</jvmArg>
            <jvmArg>-Xmx2048m</jvmArg>
          </jvmArgs>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-make:transitive</arg>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.13</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <!-- If you have classpath issues like NoClassDefFoundError, ... -->
          <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>1.2.1</version>
        <executions>
          <execution>
            <goals>
              <goal>exec</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <mainClass>org.apache.spark.examples.HdfsTest</mainClass>
          <arguments>
            <argument>local</argument>
            <argument>pom.xml</argument>
          </arguments>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
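For the record, this is how I invoke it from the command line; the exec.mainClass / exec.args properties shown in the second form are the standard exec-maven-plugin overrides, included here only as an alternative to the <configuration> block above:

# run with the main class and arguments configured in the pom
mvn compile exec:java

# or override them from the command line
mvn compile exec:java -Dexec.mainClass=org.apache.spark.examples.HdfsTest -Dexec.args="local pom.xml"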
Now, when I run it, either in Eclipse or using "mvn exec:java", I get the following error:

[INFO]
[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ spark-samples ---
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/acozzi/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/acozzi/.m2/repository/org/slf4j/slf4j-simple/1.6.1/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/01/15 23:37:57 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/01/15 23:37:57 INFO Remoting: Starting remoting
14/01/15 23:37:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@10.0.1.10:53682]
14/01/15 23:37:57 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@10.0.1.10:53682]
14/01/15 23:37:57 INFO spark.SparkEnv: Registering BlockManagerMaster
14/01/15 23:37:57 INFO storage.DiskBlockManager: Created local directory at /var/folders/mm/4qxz27w91p96v2zp5f9ncmqm38ychm/T/spark-local-20140115233757-7a41
14/01/15 23:37:57 INFO storage.MemoryStore: MemoryStore started with capacity 1218.8 MB.
14/01/15 23:37:57 INFO network.ConnectionManager: Bound socket to port 53683 with id = ConnectionManagerId(10.0.1.10,53683)
14/01/15 23:37:57 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/01/15 23:37:57 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager 10.0.1.10:53683 with 1218.8 MB RAM
14/01/15 23:37:57 INFO storage.BlockManagerMaster: Registered BlockManager
14/01/15 23:37:57 INFO spark.HttpServer: Starting HTTP Server
14/01/15 23:37:57 INFO server.Server: jetty-7.6.8.v20121106
14/01/15 23:37:57 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:53684
14/01/15 23:37:57 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.0.1.10:53684
14/01/15 23:37:57 INFO spark.SparkEnv: Registering MapOutputTracker
14/01/15 23:37:57 INFO spark.HttpFileServer: HTTP File server directory is /var/folders/mm/4qxz27w91p96v2zp5f9ncmqm38ychm/T/spark-e9304513-3714-430f-aa14-1a430a915d98
14/01/15 23:37:57 INFO spark.HttpServer: Starting HTTP Server
14/01/15 23:37:57 INFO server.Server: jetty-7.6.8.v20121106
14/01/15 23:37:57 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:53685
14/01/15 23:37:57 INFO server.Server: jetty-7.6.8.v20121106
14/01/15 23:37:57 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
14/01/15 23:37:57 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null}
14/01/15 23:37:57 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null}
14/01/15 23:37:57 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null}
14/01/15 23:37:57 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null}
14/01/15 23:37:57 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null}
14/01/15 23:37:57 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null}
14/01/15 23:37:57 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null}
14/01/15 23:37:57 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null}
14/01/15 23:37:57 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
14/01/15 23:37:57 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/01/15 23:37:57 INFO ui.SparkUI: Started Spark Web UI at http://10.0.1.10:4040
2014-01-15 23:37:57.929 java[34819:1020b] Unable to load realm mapping info from SCDynamicStore
14/01/15 23:37:58 INFO storage.MemoryStore: ensureFreeSpace(35456) called with curMem=0, maxMem=1278030643
14/01/15 23:37:58 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 34.6 KB, free 1218.8 MB)
14/01/15 23:37:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/15 23:37:58 WARN snappy.LoadSnappy: Snappy native library not loaded
14/01/15 23:37:58 INFO mapred.FileInputFormat: Total input paths to process : 1
14/01/15 23:37:58 INFO spark.SparkContext: Starting job: foreach at HdfsTest.scala:30
14/01/15 23:37:58 INFO scheduler.DAGScheduler: Got job 0 (foreach at HdfsTest.scala:30) with 1 output partitions (allowLocal=false)
14/01/15 23:37:58 INFO scheduler.DAGScheduler: Final stage: Stage 0 (foreach at HdfsTest.scala:30)
14/01/15 23:37:58 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/01/15 23:37:58 INFO scheduler.DAGScheduler: Missing parents: List()
14/01/15 23:37:58 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[2] at map at HdfsTest.scala:27), which has no missing parents
14/01/15 23:37:58 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[2] at map at HdfsTest.scala:27)
14/01/15 23:37:58 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/01/15 23:37:58 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL)
14/01/15 23:37:58 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 1778 bytes in 5 ms
14/01/15 23:37:58 INFO executor.Executor: Running task ID 0
14/01/15 23:37:58 INFO storage.BlockManager: Found block broadcast_0 locally
14/01/15 23:37:58 INFO spark.CacheManager: Partition rdd_2_0 not found, computing it
14/01/15 23:37:58 INFO rdd.HadoopRDD: Input split: file:/Users/acozzi/Documents/workspace/spark-samples/pom.xml:0+4092
14/01/15 23:37:58 INFO storage.MemoryStore: ensureFreeSpace(2853) called with curMem=35456, maxMem=1278030643
14/01/15 23:37:58 INFO storage.MemoryStore: Block rdd_2_0 stored as values to memory (estimated size 2.8 KB, free 1218.8 MB)
14/01/15 23:37:58 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Added rdd_2_0 in memory on 10.0.1.10:53683 (size: 2.8 KB, free: 1218.8 MB)
14/01/15 23:37:58 INFO storage.BlockManagerMaster: Updated info of block rdd_2_0
14/01/15 23:37:58 INFO executor.Executor: Serialized size of result for 0 is 525
14/01/15 23:37:58 INFO executor.Executor: Sending result for 0 directly to driver
14/01/15 23:37:58 INFO executor.Executor: Finished task ID 0
14/01/15 23:37:58 INFO scheduler.TaskSetManager: Finished TID 0 in 61 ms on localhost (progress: 0/1)
14/01/15 23:37:58 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
14/01/15 23:37:58 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 0.0 from pool
14/01/15 23:37:58 INFO scheduler.DAGScheduler: Stage 0 (foreach at HdfsTest.scala:30) finished in 0.071 s
14/01/15 23:37:58 INFO spark.SparkContext: Job finished: foreach at HdfsTest.scala:30, took 0.151199 s
Iteration 1 took 189 ms
[WARNING]
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
    at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.IncompatibleClassChangeError: Implementing class
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:171)
    at org.apache.hadoop.mapred.SparkHadoopMapRedUtil$class.firstAvailableClass(SparkHadoopMapRedUtil.scala:48)
    at org.apache.hadoop.mapred.SparkHadoopMapRedUtil$class.newJobContext(SparkHadoopMapRedUtil.scala:23)
    at org.apache.hadoop.mapred.SparkHadoopWriter.newJobContext(SparkHadoopWriter.scala:40)
    at org.apache.hadoop.mapred.SparkHadoopWriter.getJobContext(SparkHadoopWriter.scala:149)
    at org.apache.hadoop.mapred.SparkHadoopWriter.preSetup(SparkHadoopWriter.scala:64)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:713)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:686)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:572)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:894)
    at org.apache.spark.examples.HdfsTest$$anonfun$main$1.apply$mcVI$sp(HdfsTest.scala:34)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
    at org.apache.spark.examples.HdfsTest$.main(HdfsTest.scala:28)
    at org.apache.spark.examples.HdfsTest.main(HdfsTest.scala)
    ... 6 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.224s
[INFO] Finished at: Wed Jan 15 23:37:58 PST 2014
[INFO] Final Memory: 12M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (default-cli) on project spark-samples: An exception occured while executing the Java class. null: InvocationTargetException: Implementing class -> [Help 1]
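A hedged guess at what is going on, in case it helps (I have not verified this against the rc): an "IncompatibleClassChangeError: Implementing class" thrown while SparkHadoopMapRedUtil probes for a JobContext implementation usually means Hadoop 1.x and Hadoop 2.x classes are mixed on the classpath (org.apache.hadoop.mapreduce.JobContext is a concrete class in Hadoop 1 but became an interface in Hadoop 2). If that is what is happening here, pinning an explicit hadoop-client in the POM that matches whatever Hadoop the staged spark-core artifact was built against might work around it. A sketch, with 1.0.4 purely as a placeholder version:

<!-- Hedged sketch: force one consistent Hadoop client version.
     1.0.4 is a placeholder; it must match the Hadoop version the
     spark-core_2.10 artifact was built against. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.0.4</version>
</dependency>

(The multiple-SLF4J-bindings warning at the top is probably unrelated; excluding either slf4j-simple or slf4j-log4j12 from whichever dependency drags it in should silence it.)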
Alex Cozzi
alexco...@gmail.com

On Jan 15, 2014, at 5:48 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> Please vote on releasing the following candidate as Apache Spark
> (incubating) version 0.9.0.
>
> A draft of the release notes along with the changes file is attached
> to this e-mail.
>
> The tag to be voted on is v0.9.0-incubating (commit 7348893):
> https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=7348893f0edd96dacce2f00970db1976266f7008
>
> The release files, including signatures, digests, etc can be found at:
> http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc1/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1001/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-0.9.0-incubating-rc1-docs/
>
> Please vote on releasing this package as Apache Spark 0.9.0-incubating!
>
> The vote is open until Sunday, January 19, at 02:00 UTC
> and passes if a majority of at least 3 +1 PPMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 0.9.0-incubating
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.incubator.apache.org/