I only looked at replacing Preconditions with asserts and found a bunch of other stuff from the Google common (Guava) package, so I held off.
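For reference, the kind of swap I mean looks roughly like this. This is a hypothetical sketch, not actual Mahout code (`factorial` is made up for illustration); one caveat is that plain asserts are skipped unless the JVM runs with `-ea`, so callers relying on the old `IllegalArgumentException` would see a behavior change:

```java
// Hypothetical sketch (not actual Mahout code) of the Preconditions-to-assert swap.
public class PreconditionsSwap {

  static long factorial(long n) {
    // Before (Guava):
    //   Preconditions.checkArgument(n >= 0, "n must be non-negative: %s", n);
    // After (plain Java). Asserts are only evaluated when the JVM runs with -ea,
    // unlike checkArgument, which always throws IllegalArgumentException:
    assert n >= 0 : "n must be non-negative: " + n;
    long result = 1;
    for (long i = 2; i <= n; i++) {
      result *= i;
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(factorial(5)); // prints 120
  }
}
```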
On Tuesday, May 19, 2015, Suneel Marthi <[email protected]> wrote:

> I had tried minimizing the Guava dependency to a large extent in the run up
> to 0.10.0. It's not as trivial as it seems: there are parts of the code
> (Collocations, lucene2seq, Lucene TokenStream processing and tokenization
> code) that are heavily reliant on AbstractIterator, and there are sections
> of the code that assign a HashSet to a List (again, one has to use Guava for
> that if one wants to avoid writing boilerplate for doing the same).
>
> Moreover, things that return something like Iterable<?> and need to be
> converted into a regular collection can easily be done using Guava without
> writing our own boilerplate again.
>
> Are we replacing all Preconditions by straight asserts now?
>
>
> On Tue, May 19, 2015 at 11:21 AM, Pat Ferrel <[email protected]> wrote:
>
> > We need to move to Spark 1.3 ASAP and set the stage for beyond 1.3. The
> > primary reason is that the big distros are there already or will be very
> > soon. Many people using Mahout will have the environment they must use
> > dictated by support orgs in their companies, so our current position of
> > running only on Spark 1.1.1 means many potential users are out of luck.
> >
> > Here are the problems I know of in moving Mahout ahead on Spark:
> > 1) Guava in any backend code (executor closures) relies on being
> > serialized with JavaSerializer, which is broken and hasn't been fixed in
> > 1.2+. There is a workaround, which involves moving a Guava jar to all
> > Spark workers, but that is unacceptable in many cases. Guava in the
> > Spark-1.2 PR has been removed from Scala code and will be pushed to the
> > master probably this week. That leaves a bunch of uses of Guava in java
> > math and hdfs. Andrew has (I think) removed the Preconditions and
> > replaced them with asserts, but there remain some uses of Map and
> > AbstractIterator from Guava. Not sure how many of these remain, but if
> > anyone can help please check here:
> > https://issues.apache.org/jira/browse/MAHOUT-1708
> > 2) The Mahout shell relies on APIs not available in Spark 1.3.
> > 3) The API for writing to sequence files now requires implicit values
> > that are not available in the current code. I think Andy did a temp fix
> > to write to object files, but this is probably not what we want to
> > release.
> >
> > I for one would dearly love to see Mahout 0.10.1 support Spark 1.3+, and
> > soon. This is a call for help in cleaning these things up. Even with no
> > new features, the above things would make Mahout much more usable in
> > current environments.
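For what it's worth, the two Guava pain points Suneel mentions can be written in plain Java. This is a hypothetical sketch (`evensBelow` and `toList` are made-up names, not Mahout code): a hand-rolled `java.util.Iterator` standing in for `AbstractIterator`, and the copy loop that `Lists.newArrayList(Iterable)` otherwise hides:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of two Guava-removal patterns:
// (1) a plain java.util.Iterator in place of Guava's AbstractIterator, and
// (2) copying an Iterable<?> into a List without Lists.newArrayList(iterable).
public class GuavaRemovalSketch {

  // (1) A hand-rolled iterator over the even numbers below a limit,
  // standing in for code that previously extended AbstractIterator.
  static Iterator<Integer> evensBelow(final int limit) {
    return new Iterator<Integer>() {
      private int next = 0;

      @Override
      public boolean hasNext() {
        return next < limit;
      }

      @Override
      public Integer next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        int value = next;
        next += 2;
        return value;
      }
    };
  }

  // (2) The boilerplate that Guava's Lists.newArrayList(Iterable) hides:
  // a simple copy loop into an ArrayList.
  static <T> List<T> toList(Iterable<T> iterable) {
    List<T> result = new ArrayList<T>();
    for (T item : iterable) {
      result.add(item);
    }
    return result;
  }

  public static void main(String[] args) {
    Iterable<Integer> evens = new Iterable<Integer>() {
      @Override
      public Iterator<Integer> iterator() {
        return evensBelow(7);
      }
    };
    System.out.println(toList(evens)); // prints [0, 2, 4, 6]
  }
}
```

The tradeoff is a few extra lines per call site, but no shaded jar or serialization surprises in executor closures.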
