We need to move to Spark 1.3 ASAP and set the stage for going beyond 1.3. The primary reason is that the big distros are there already or will be very soon. Many people using Mahout have their environment dictated by support orgs in their companies, so the fact that Mahout currently runs only on Spark 1.1.1 means many potential users are out of luck.
Here are the problems I know of in moving Mahout ahead on Spark:

1) Guava in any backend code (executor closures) relies on being serialized with Spark's JavaSerializer, which is broken and hasn't been fixed in 1.2+. There is a workaround, which involves copying a Guava jar to all Spark workers, but that is unacceptable in many cases. Guava has been removed from the Scala code in the Spark 1.2 PR, which will probably be pushed to master this week. That leaves a number of uses of Guava in the java math and hdfs modules. Andrew has (I think) removed the Preconditions calls and replaced them with asserts, but some uses of Map and AbstractIterator from Guava remain. I'm not sure how many; if anyone can help, please check here: https://issues.apache.org/jira/browse/MAHOUT-1708

2) The Mahout shell relies on APIs not available in Spark 1.3.

3) The API for writing to sequence files now requires implicit values that are not available in the current code. I think Andy did a temporary fix that writes to object files instead, but that is probably not what we want to release.

I for one would dearly love to see Mahout 0.10.1 support Spark 1.3+, and soon. This is a call for help in cleaning these things up. Even with no new features, the fixes above would make Mahout much more usable in current environments.
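To make the Guava-removal work in item 1 concrete, here is a minimal sketch of the kind of change involved: replacing Guava's Preconditions.checkArgument with a tiny JDK-only helper so executor closures no longer drag Guava onto the Spark workers' classpath. The class name PreconditionFree and the newVector method are illustrative only, not actual Mahout code.

```java
// Sketch: removing a Guava dependency from backend (executor-closure) code.
// Instead of com.google.common.base.Preconditions, use a small JDK-only helper.
public class PreconditionFree {

  // Drop-in replacement for Guava's Preconditions.checkArgument:
  // throws IllegalArgumentException when the condition is false.
  public static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  // Example use: validating a vector cardinality without any Guava import.
  public static double[] newVector(int cardinality) {
    checkArgument(cardinality > 0, "cardinality must be positive");
    return new double[cardinality];
  }

  public static void main(String[] args) {
    double[] v = newVector(3);
    System.out.println("created vector of length " + v.length);
  }
}
```

A plain `assert`, as mentioned above, is another option for internal invariants, but note that asserts are disabled by default (enabled only with `-ea`), so argument validation on public APIs is safer with an explicit exception.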
