Might not be terrible. I didn't look too hard, but there are 97 instances of "com.google.common" in mahout-math and 4 in mahout-hdfs.
On Tue, May 19, 2015 at 11:17 AM, Dmitriy Lyubimov <[email protected]> wrote:

> PS assuming we clean mahout-math and the scala modules -- this should be
> fairly easy. Maybe there's some stuff in the colt classes, but there
> shouldn't be a lot?
>
> On Tue, May 19, 2015 at 11:16 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> Can't we just declare its own guava for mahout-mr? Or inherit it from
>> wherever it is declared in the hadoop we depend on there?
>>
>> On Tue, May 19, 2015 at 9:24 AM, Pat Ferrel <[email protected]> wrote:
>>
>>> I was hoping someone knew the differences. Andrew and I are feeling our
>>> way along since we haven't used either to any extent.
>>>
>>> On May 19, 2015, at 9:17 AM, Suneel Marthi <[email protected]> wrote:
>>>
>>> Ok, I see your point if it's only for mahout-math and mahout-hdfs. Not
>>> sure if it's just a straight replacement of Preconditions -> asserts,
>>> though. Preconditions throw an exception if some condition is not
>>> satisfied; Java asserts are never meant to be used in production code.
>>>
>>> So the right fix would be to replace all references to Preconditions
>>> with some exception-handling boilerplate.
>>>
>>> On Tue, May 19, 2015 at 11:58 AM, Pat Ferrel <[email protected]> wrote:
>>>
>>>> We only have to worry about mahout-math and mahout-hdfs.
>>>>
>>>> Yes, Andrew was working on those; they were replaced with plain Java
>>>> asserts.
>>>>
>>>> There still remain the uses you mention in those two modules, but I see
>>>> no good alternative to hacking them out. Maybe we can move some code
>>>> out to mahout-mr if it's easier.
>>>>
>>>> On May 19, 2015, at 8:48 AM, Suneel Marthi <[email protected]> wrote:
>>>>
>>>> I had tried minimizing the Guava dependency to a large extent in the
>>>> run-up to 0.10.0. It's not as trivial as it seems: there are parts of
>>>> the code (Collocations, lucene2seq, Lucene TokenStream processing and
>>>> tokenization code) that are heavily reliant on AbstractIterator, and
>>>> there are sections of the code that assign a HashSet to a List (again,
>>>> you have to use Guava for that if you want to avoid writing boilerplate
>>>> to do the same).
>>>>
>>>> Moreover, things that return something like Iterable<?> and need to be
>>>> converted into a regular collection can easily be done using Guava
>>>> without writing your own boilerplate again.
>>>>
>>>> Are we replacing all Preconditions with straight asserts now?
>>>>
>>>> On Tue, May 19, 2015 at 11:21 AM, Pat Ferrel <[email protected]> wrote:
>>>>
>>>>> We need to move to Spark 1.3 asap and set the stage for beyond 1.3.
>>>>> The primary reason is that the big distros are there already or will
>>>>> be very soon. Many people using Mahout will have the environment they
>>>>> must use dictated by support orgs in their companies, so our current
>>>>> position of running only on Spark 1.1.1 means many potential users are
>>>>> out of luck.
>>>>>
>>>>> Here are the problems I know of in moving Mahout ahead on Spark:
>>>>>
>>>>> 1) Guava in any backend code (executor closures) relies on being
>>>>> serialized with JavaSerializer, which is broken and hasn't been fixed
>>>>> in 1.2+. There is a workaround, which involves moving a Guava jar to
>>>>> all Spark workers, but that is unacceptable in many cases. Guava in
>>>>> the Spark-1.2 PR has been removed from Scala code and will be pushed
>>>>> to master probably this week. That leaves a bunch of uses of Guava in
>>>>> the Java math and hdfs modules. Andrew has (I think) removed the
>>>>> Preconditions and replaced them with asserts, but there remain some
>>>>> uses of Map and AbstractIterator from Guava. Not sure how many of
>>>>> these remain, but if anyone can help please check here:
>>>>> https://issues.apache.org/jira/browse/MAHOUT-1708
>>>>>
>>>>> 2) The Mahout shell relies on APIs not available in Spark 1.3.
>>>>>
>>>>> 3) The API for writing to sequence files now requires implicit values
>>>>> that are not available in the current code. I think Andy did a temp
>>>>> fix to write to object files, but this is probably not what we want to
>>>>> release.
>>>>>
>>>>> I for one would dearly love to see Mahout 0.10.1 support Spark 1.3+,
>>>>> and soon. This is a call for help in cleaning these things up. Even
>>>>> with no new features, the above fixes would make Mahout much more
>>>>> usable in current environments.
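For reference, the exception-handling boilerplate Suneel suggests as a replacement for Guava's Preconditions might look like the sketch below. This is a hypothetical helper, not actual Mahout code; the class name `Checks` is invented. The point is that, unlike a Java `assert` (which is a no-op unless the JVM runs with `-ea`), these checks always fire in production:

```java
// Sketch: plain-Java replacements for Guava's Preconditions.
// Before (Guava):
//   Preconditions.checkArgument(rows > 0, "rows must be positive");
//   Preconditions.checkNotNull(data, "data may not be null");
// After (no Guava dependency):
final class Checks {

  private Checks() {}

  // Mirrors Preconditions.checkArgument: throws IllegalArgumentException.
  // Unlike a Java assert, this runs even when -ea is not set.
  static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  // Mirrors Preconditions.checkNotNull: throws NullPointerException
  // and returns the reference so calls can be inlined in assignments.
  static <T> T checkNotNull(T reference, String message) {
    if (reference == null) {
      throw new NullPointerException(message);
    }
    return reference;
  }
}
```

A plain `assert rows > 0;` would silently do nothing in a normally launched JVM, which is exactly Suneel's objection to a straight Preconditions -> asserts swap.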
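On the AbstractIterator point: dropping that Guava class means hand-rolling the look-ahead bookkeeping that `AbstractIterator.computeNext()` normally hides. A minimal illustrative sketch against plain `java.util.Iterator` (the null-skipping logic here is invented for the example, not Mahout's actual code):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch: a look-ahead iterator that skips nulls, written against plain
// java.util.Iterator instead of Guava's AbstractIterator.
final class NonNullIterator<T> implements Iterator<T> {

  private final Iterator<T> delegate;
  private T next;          // buffered look-ahead element
  private boolean hasNext; // whether 'next' currently holds a value

  NonNullIterator(Iterator<T> delegate) {
    this.delegate = delegate;
    advance();
  }

  // Pull from the delegate until a non-null element is found; this is
  // the state machine AbstractIterator.computeNext() would manage for us.
  private void advance() {
    hasNext = false;
    while (delegate.hasNext()) {
      T candidate = delegate.next();
      if (candidate != null) {
        next = candidate;
        hasNext = true;
        return;
      }
    }
  }

  @Override public boolean hasNext() { return hasNext; }

  @Override public T next() {
    if (!hasNext) {
      throw new NoSuchElementException();
    }
    T result = next;
    advance();
    return result;
  }

  @Override public void remove() {
    throw new UnsupportedOperationException();
  }
}
```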
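The Iterable<?>-to-collection conversion Suneel mentions (what Guava's `Lists.newArrayList(Iterable)` does) needs only a few lines of plain Java. Boilerplate, as he says, but not much of it; again a hypothetical helper, with an invented name:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: replacing Guava's Lists.newArrayList(Iterable) with plain Java.
final class Iterables2 {

  private Iterables2() {}

  // Copy every element of the iterable into a fresh ArrayList.
  static <T> List<T> toList(Iterable<T> iterable) {
    List<T> result = new ArrayList<>();
    for (T element : iterable) {
      result.add(element);
    }
    return result;
  }
}
```

This also covers the HashSet-to-List case: `Iterables2.toList(someSet)` yields a `List` copy without any Guava import.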
