Might not be terrible. I didn't look too hard, but there are 97 instances of "com.google.common" in mahout-math and 4 in mahout-hdfs.
On Tue, May 19, 2015 at 11:17 AM, Dmitriy Lyubimov <[email protected]> wrote:

> PS assuming we clean mahout-math and the scala modules -- this should be
> fairly easy. Maybe there's some stuff in the colt classes, but there
> shouldn't be a lot?
>
> On Tue, May 19, 2015 at 11:16 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> Can't we just declare its own guava for mahout-mr? Or inherit it from
>> wherever it is declared in the hadoop we depend on there?
>>
>> On Tue, May 19, 2015 at 9:24 AM, Pat Ferrel <[email protected]> wrote:
>>
>>> I was hoping someone knew the differences. Andrew and I are feeling our
>>> way along since we haven't used either to any extent.
>>>
>>> On May 19, 2015, at 9:17 AM, Suneel Marthi <[email protected]> wrote:
>>>
>>> Ok, I see your point if it's only for mahout-math and mahout-hdfs. Not
>>> sure if it's just a straight replacement of Preconditions -> asserts,
>>> though. Preconditions throw an exception if some condition is not
>>> satisfied; Java asserts are never meant to be used in production code.
>>>
>>> So the right fix would be to replace all references to Preconditions
>>> with some exception-handling boilerplate.
>>>
>>> On Tue, May 19, 2015 at 11:58 AM, Pat Ferrel <[email protected]> wrote:
>>>
>>>> We only have to worry about mahout-math and mahout-hdfs.
>>>>
>>>> Yes, Andrew was working on those; they were replaced with plain Java
>>>> asserts.
>>>>
>>>> There still remain the uses you mention in those two modules, but I see
>>>> no good alternative to hacking them out. Maybe we can move some code
>>>> out to mahout-mr if it's easier.
>>>>
>>>> On May 19, 2015, at 8:48 AM, Suneel Marthi <[email protected]> wrote:
>>>>
>>>> I had tried minimizing the Guava dependency to a large extent in the
>>>> run-up to 0.10.0. It's not as trivial as it seems: there are parts of
>>>> the code (Collocations, lucene2seq, Lucene TokenStream processing and
>>>> tokenization code) that are heavily reliant on AbstractIterator, and
>>>> there are sections of the code that assign a HashSet to a List (again,
>>>> you have to use Guava for that if you want to avoid writing boilerplate
>>>> to do the same).
>>>>
>>>> Moreover, things that return something like Iterable<?> and need to be
>>>> converted into a regular collection can easily be done using Guava
>>>> without writing your own boilerplate again.
>>>>
>>>> Are we replacing all Preconditions with straight asserts now?
>>>>
>>>> On Tue, May 19, 2015 at 11:21 AM, Pat Ferrel <[email protected]> wrote:
>>>>
>>>>> We need to move to Spark 1.3 asap and set the stage for beyond 1.3.
>>>>> The primary reason is that the big distros are there already or will
>>>>> be very soon. Many people using Mahout will have the environment they
>>>>> must use dictated by support orgs in their companies, so our current
>>>>> position of running only on Spark 1.1.1 means many potential users are
>>>>> out of luck.
>>>>>
>>>>> Here are the problems I know of in moving Mahout ahead on Spark:
>>>>>
>>>>> 1) Guava in any backend code (executor closures) relies on being
>>>>> serialized with JavaSerializer, which is broken and hasn't been fixed
>>>>> in 1.2+. There is a workaround, which involves moving a Guava jar to
>>>>> all Spark workers, but that is unacceptable in many cases. Guava in
>>>>> the Spark-1.2 PR has been removed from Scala code and will be pushed
>>>>> to master probably this week. That leaves a bunch of uses of Guava in
>>>>> the Java math and hdfs modules. Andrew has (I think) removed the
>>>>> Preconditions and replaced them with asserts, but there remain some
>>>>> uses of Map and AbstractIterator from Guava. Not sure how many of
>>>>> these remain, but if anyone can help please check here:
>>>>> https://issues.apache.org/jira/browse/MAHOUT-1708
>>>>>
>>>>> 2) The Mahout shell relies on APIs not available in Spark 1.3.
>>>>>
>>>>> 3) The API for writing to sequence files now requires implicit values
>>>>> that are not available in the current code. I think Andy did a temp
>>>>> fix to write to object files, but this is probably not what we want to
>>>>> release.
>>>>>
>>>>> I for one would dearly love to see Mahout 0.10.1 support Spark 1.3+,
>>>>> and soon. This is a call for help in cleaning these things up. Even
>>>>> with no new features, the above fixes would make Mahout much more
>>>>> usable in current environments.
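For reference, the exception-handling boilerplate Suneel suggests as a replacement for Guava's Preconditions might look like the sketch below. This is a hypothetical helper, not actual Mahout code; the class name `Checks` is invented. The point is that, unlike a Java `assert` (which is a no-op unless the JVM runs with `-ea`), these checks always fire in production:

```java
// Sketch: plain-Java replacements for Guava's Preconditions.
// Before (Guava):
//   Preconditions.checkArgument(rows > 0, "rows must be positive");
//   Preconditions.checkNotNull(data, "data may not be null");
// After (no Guava dependency):
final class Checks {

  private Checks() {}

  // Mirrors Preconditions.checkArgument: throws IllegalArgumentException.
  // Unlike a Java assert, this runs even when -ea is not set.
  static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  // Mirrors Preconditions.checkNotNull: throws NullPointerException
  // and returns the reference so calls can be inlined in assignments.
  static <T> T checkNotNull(T reference, String message) {
    if (reference == null) {
      throw new NullPointerException(message);
    }
    return reference;
  }
}
```

A plain `assert rows > 0;` would silently do nothing in a normally launched JVM, which is exactly Suneel's objection to a straight Preconditions -> asserts swap.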
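On the AbstractIterator point: dropping that Guava class means hand-rolling the look-ahead bookkeeping that `AbstractIterator.computeNext()` normally hides. A minimal illustrative sketch against plain `java.util.Iterator` (the null-skipping logic here is invented for the example, not Mahout's actual code):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch: a look-ahead iterator that skips nulls, written against plain
// java.util.Iterator instead of Guava's AbstractIterator.
final class NonNullIterator<T> implements Iterator<T> {

  private final Iterator<T> delegate;
  private T next;          // buffered look-ahead element
  private boolean hasNext; // whether 'next' currently holds a value

  NonNullIterator(Iterator<T> delegate) {
    this.delegate = delegate;
    advance();
  }

  // Pull from the delegate until a non-null element is found; this is
  // the state machine AbstractIterator.computeNext() would manage for us.
  private void advance() {
    hasNext = false;
    while (delegate.hasNext()) {
      T candidate = delegate.next();
      if (candidate != null) {
        next = candidate;
        hasNext = true;
        return;
      }
    }
  }

  @Override public boolean hasNext() { return hasNext; }

  @Override public T next() {
    if (!hasNext) {
      throw new NoSuchElementException();
    }
    T result = next;
    advance();
    return result;
  }

  @Override public void remove() {
    throw new UnsupportedOperationException();
  }
}
```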
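The Iterable<?>-to-collection conversion Suneel mentions (what Guava's `Lists.newArrayList(Iterable)` does) needs only a few lines of plain Java. Boilerplate, as he says, but not much of it; again a hypothetical helper, with an invented name:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: replacing Guava's Lists.newArrayList(Iterable) with plain Java.
final class Iterables2 {

  private Iterables2() {}

  // Copy every element of the iterable into a fresh ArrayList.
  static <T> List<T> toList(Iterable<T> iterable) {
    List<T> result = new ArrayList<>();
    for (T element : iterable) {
      result.add(element);
    }
    return result;
  }
}
```

This also covers the HashSet-to-List case: `Iterables2.toList(someSet)` yields a `List` copy without any Guava import.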
