We need to move to Spark 1.3 ASAP and set the stage for going beyond 1.3. The primary reason is that the big distros are there already or will be very soon. Many people using Mahout have their environment dictated by support orgs in their companies, so the fact that Mahout currently runs only on Spark 1.1.1 means many potential users are out of luck.
Here are the problems I know of in moving Mahout ahead on Spark:

1) Guava in any backend code (executor closures) relies on being serialized with Spark's JavaSerializer, which is broken and hasn't been fixed in 1.2+. There is a workaround, which involves copying a Guava jar to all Spark workers, but that is unacceptable in many cases. Guava has been removed from the Scala code in the Spark 1.2 PR, which will probably be pushed to master this week. That leaves a number of uses of Guava in the java math and hdfs modules. Andrew has (I think) removed the Preconditions calls and replaced them with asserts, but some uses of Map and AbstractIterator from Guava remain. I'm not sure how many; if anyone can help, please check here: https://issues.apache.org/jira/browse/MAHOUT-1708

2) The Mahout shell relies on APIs not available in Spark 1.3.

3) The API for writing to sequence files now requires implicit values that are not available in the current code. I think Andy did a temporary fix that writes to object files instead, but that is probably not what we want to release.

I for one would dearly love to see Mahout 0.10.1 support Spark 1.3+, and soon. This is a call for help in cleaning these things up. Even with no new features, the fixes above would make Mahout much more usable in current environments.
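To make the Guava-removal work in item 1 concrete, here is a minimal sketch of the kind of change involved: replacing Guava's Preconditions.checkArgument with a tiny JDK-only helper so executor closures no longer drag Guava onto the Spark workers' classpath. The class name PreconditionFree and the newVector method are illustrative only, not actual Mahout code.

```java
// Sketch: removing a Guava dependency from backend (executor-closure) code.
// Instead of com.google.common.base.Preconditions, use a small JDK-only helper.
public class PreconditionFree {

  // Drop-in replacement for Guava's Preconditions.checkArgument:
  // throws IllegalArgumentException when the condition is false.
  public static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  // Example use: validating a vector cardinality without any Guava import.
  public static double[] newVector(int cardinality) {
    checkArgument(cardinality > 0, "cardinality must be positive");
    return new double[cardinality];
  }

  public static void main(String[] args) {
    double[] v = newVector(3);
    System.out.println("created vector of length " + v.length);
  }
}
```

A plain `assert`, as mentioned above, is another option for internal invariants, but note that asserts are disabled by default (enabled only with `-ea`), so argument validation on public APIs is safer with an explicit exception.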
