> These rules do not apply to the java modules, of course.

So you are correct, but we do use the Scala construct _in Scala_.

On May 20, 2015, at 8:20 AM, Suneel Marthi <[email protected]> wrote:

Ok, I was talking about Java asserts. Fine, then go with it.

On Wed, May 20, 2015 at 11:18 AM, Pat Ferrel <[email protected]> wrote:

> BTW Scala assert, require, etc. are quite a different thing from Java
> assert. They do not use the Java framework and _are_ indeed useful in
> production code for many of the reasons Preconditions were used. Scala
> provides several methods to check invariants and API contracts. They throw
> different exceptions and _can_ be disabled at runtime, though this is
> controversial. They are peppered throughout the DSL code and afaik are not
> meant to be disabled at runtime. Think of them as a replacement for
> Preconditions.
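
The distinction Pat draws can be sketched in plain Java (class and method names here are ours, purely for illustration, not Mahout code): a bare Java `assert` is elided at runtime unless the JVM is started with `-ea`, while an explicit check in the style of Guava's Preconditions (or Scala's `require`) always fires.

```java
// Illustrative sketch: why a Java assert is not a drop-in replacement
// for a Preconditions-style check.
public class PreconditionDemo {

    // Java assert: compiled in, but a no-op unless the JVM runs with -ea.
    static int viaAssert(int n) {
        assert n > 0 : "n must be positive";
        return n;
    }

    // Explicit check: always enforced, regardless of JVM flags, which is
    // the behavior Guava Preconditions and Scala require give you.
    static int viaCheck(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        return n;
    }

    public static void main(String[] args) {
        try {
            viaCheck(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```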
> 
> These rules do not apply to the java modules, of course.
> 
> On May 19, 2015, at 9:17 AM, Suneel Marthi <[email protected]> wrote:
> 
> Ok, I see your point if it's only for mahout-math and mahout-hdfs. Not
> sure if it's just a straight replacement of Preconditions -> asserts
> though. Preconditions throw an exception if some condition is not
> satisfied; Java asserts are never meant to be used in production code.
> 
> So the right fix would be to replace all references to Preconditions with
> some exception handling boilerplate.
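
The "exception handling boilerplate" Suneel suggests might look something like the following minimal sketch (the `Checks` class and its method names are ours, not an existing Mahout utility); it throws the same exception types the corresponding Guava calls would.

```java
// Sketch of a straight Preconditions replacement in plain Java.
public final class Checks {

    // Before (Guava): Preconditions.checkArgument(index >= 0, "negative index");
    // After: same contract, no Guava dependency.
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Before (Guava): Preconditions.checkNotNull(vector, "null vector");
    // Guava throws NullPointerException here, so we do the same.
    static <T> T checkNotNull(T reference, String message) {
        if (reference == null) {
            throw new NullPointerException(message);
        }
        return reference;
    }
}
```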
> 
> On Tue, May 19, 2015 at 11:58 AM, Pat Ferrel <[email protected]>
> wrote:
> 
>> We only have to worry about mahout-math and mahout-hdfs.
>> 
>> Yes, Andrew was working on those; they were replaced with plain Java
>> asserts.
>> 
>> There still remain the uses you mention in those two modules but I see no
>> good alternative to hacking them out. Maybe we can move some code out to
>> mahout-mr if it’s easier.
>> 
>> On May 19, 2015, at 8:48 AM, Suneel Marthi <[email protected]> wrote:
>> 
>> I had tried minimizing the Guava dependency to a large extent in the
>> run-up to 0.10.0. It's not as trivial as it seems: there are parts of the
>> code (Collocations, lucene2seq, Lucene TokenStream processing and
>> tokenization code) that are heavily reliant on AbstractIterator, and
>> there are sections of the code that assign a HashSet to a List (again,
>> one has to use Guava for that to avoid writing boilerplate for doing the
>> same).
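
For context, Guava's AbstractIterator mainly saves the buffering boilerplate of a filtering or transforming iterator. A plain-Java replacement is doable but verbose, which is Suneel's point; here is a minimal sketch (the class name and the empty-token filtering are illustrative, not actual Mahout tokenization code).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// What replacing Guava's AbstractIterator with a hand-rolled Iterator
// looks like: buffer the next element yourself and advance past misses.
public class NonEmptyTokenIterator implements Iterator<String> {
    private final Iterator<String> delegate;
    private String next; // buffered next element, null when exhausted

    public NonEmptyTokenIterator(Iterator<String> delegate) {
        this.delegate = delegate;
        advance();
    }

    // Scan the delegate for the next non-empty token.
    private void advance() {
        next = null;
        while (delegate.hasNext()) {
            String candidate = delegate.next();
            if (!candidate.isEmpty()) {
                next = candidate;
                return;
            }
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public String next() {
        if (next == null) throw new NoSuchElementException();
        String result = next;
        advance();
        return result;
    }
}
```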
>> 
>> Moreover, things that return something like Iterable<?> and need to be
>> converted into a regular collection can easily be done using Guava
>> without writing our own boilerplate again.
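
The boilerplate in question is small: Guava's `Lists.newArrayList(iterable)` reduces to one loop. A possible helper, if the project chose to drop the dependency, might look like this (the `CollectionUtil` name is ours, not an existing Mahout class):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for Guava's Lists.newArrayList(Iterable).
public final class CollectionUtil {
    static <T> List<T> toList(Iterable<T> iterable) {
        List<T> result = new ArrayList<>();
        for (T item : iterable) {
            result.add(item);
        }
        return result;
    }
}
```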
>> 
>> Are we replacing all Preconditions with straight asserts now?
>> 
>> 
>> On Tue, May 19, 2015 at 11:21 AM, Pat Ferrel <[email protected]>
>> wrote:
>> 
>>> We need to move to Spark 1.3 asap and set the stage for beyond 1.3. The
>>> primary reason is that the big distros are there already or will be very
>>> soon. Many people using Mahout will have the environment they must use
>>> dictated by support orgs in their companies, so our current position of
>>> running only on Spark 1.1.1 means many potential users are out of luck.
>>> 
>>> Here are the problems I know of in moving Mahout ahead on Spark:
>>> 1) Guava in any backend code (executor closures) relies on being
>>> serialized with JavaSerializer, which is broken and hasn't been fixed in
>>> 1.2+. There is a workaround, which involves moving a Guava jar to all
>>> Spark workers, but that is unacceptable in many cases. Guava in the
>>> Spark-1.2 PR has been removed from Scala code and will be pushed to the
>>> master probably this week. That leaves a bunch of uses of Guava in the
>>> Java math and hdfs modules. Andrew has (I think) removed the
>>> Preconditions and replaced them with asserts. But there remain some uses
>>> of Map and AbstractIterator from Guava. Not sure how many of these
>>> remain, but if anyone can help please check here:
>>> https://issues.apache.org/jira/browse/MAHOUT-1708
>>> 2) The Mahout shell relies on APIs not available in Spark 1.3.
>>> 3) The API for writing to sequence files now requires implicit values
>>> that are not available in the current code. I think Andy did a temp fix
>>> to write to object files, but this is probably not what we want to
>>> release.
>>> 
>>> I for one would dearly love to see Mahout 0.10.1 support Spark 1.3+, and
>>> soon. This is a call for help in cleaning these things up. Even with no
>>> new features, the above things would make Mahout much more usable in
>>> current environments.
>> 
>> 
> 
> 
