Pat 

I would like to see the co- and cross-occurrence code separated out a bit so
that they take DRM args.

Sent from my iPhone

> On May 29, 2014, at 17:58, Pat Ferrel <[email protected]> wrote:
> 
> Regarding recommenders, drivers, and import/export:
> 
> I’ve got Sebastian’s cooccurrence code wrapped with a driver that reads text 
> delimited files into a drm for use with cooccurrence. Then it writes the 
> indicator matrices as text-delimited files with user-specified IDs. It also
> has a proposed Driver base class, Scala based option parser and 
> ReadStore/WriteStore traits. The CLI will be mostly a superset of the 
> itemsimilarity in legacy MR. The read/write stuff is meant to be pretty
> generic so I was planning to do a DB and maybe JSON example (some day). There 
> is still a bit of functional programming refactoring and the docs are not up 
> to date.
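[For concreteness, the ReadStore/WriteStore shape might look roughly like the sketch below. All names here are hypothetical, chosen for illustration only, and are not the actual traits in Pat's patch.]

```scala
// Illustrative only: hypothetical shapes for the ReadStore/WriteStore
// traits mentioned above, not the actual proposed Mahout API.
trait ReadStore[T] {
  /** Read a dataset (e.g. a text-delimited file parsed into a DRM). */
  def read(source: String): T
}

trait WriteStore[T] {
  /** Write a dataset (e.g. an indicator matrix as text-delimited rows). */
  def write(sink: String, data: T): Unit
}

/** A trivial in-memory text-delimited store, just to exercise the traits. */
class InMemoryDelimitedStore(delimiter: String = "\t")
    extends ReadStore[Seq[Seq[String]]] with WriteStore[Seq[Seq[String]]] {
  private val files = scala.collection.mutable.Map.empty[String, String]

  def write(sink: String, data: Seq[Seq[String]]): Unit =
    files(sink) = data.map(_.mkString(delimiter)).mkString("\n")

  def read(source: String): Seq[Seq[String]] =
    files(source).split("\n").toSeq.map(_.split(delimiter).toSeq)
}
```

A DB- or JSON-backed example would just be another implementation of the same two traits.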
> 
> With cooccurrence working we could do something that replaces all the 
> cooccurrence recommenders (in-memory and MR) with one codebase. Add Solr and 
> you have a single machine server based recommender that we can supply with an 
> API similar to the legacy in-memory recommender. The cool thing is that it 
> will scale out to a cluster with Solr and HDFS, requiring only config 
> changes. The downside is that it requires at least a standalone local version 
> of Spark to do the cooccurrence. BTW this would give us something people have 
> been asking for—a recommender service.
> 
> Is anyone else interested in CLI, drivers, read/write in the import/export 
> sense? Or a new architecture for the recommenders? If so, maybe a separate 
> thread?
> 
> On May 29, 2014, at 7:03 AM, Ted Dunning <[email protected]> wrote:
> 
> Andrew,
> 
> Sebastian and I were talking yesterday and guessing that you would be
> interested in this soon.  Glad to know the world is as expected.
> 
> Yes. This needs to happen at least at a very conceptual level.  For
> instance, for classifiers, I think that we need to have something like:
> 
>   - progressively train against a batch of data
>        Questions: should this do multiple epochs?  Throw an exception if
> on-line training is not supported?  Throw an exception if too little data
> is provided?
> 
>   - classify a batch of data
> 
>   - serialize a model
> 
>   - de-serialize a model
> 
> Note that a batch listed above should be either a bunch of observations or
> just one.
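[A rough Scala sketch of that contract is below. It is purely illustrative: `Classifier`, `Observation`, and the method names are hypothetical placeholders, not existing Mahout APIs, and the majority-class model exists only to show the contract in use.]

```scala
// Hypothetical sketch of the classifier contract described above.
// None of these names exist in Mahout; they only illustrate the shape.
case class Observation(features: Seq[Double], label: Option[Int])

trait Classifier {
  /** Progressively train on a batch (which may hold a single observation).
    * Implementations without online training may throw
    * UnsupportedOperationException. */
  def trainBatch(batch: Seq[Observation]): Unit

  /** Classify a batch, returning one predicted label per observation. */
  def classifyBatch(batch: Seq[Observation]): Seq[Int]

  /** Serialize the model. */
  def serialize(): Array[Byte]
}

/** A toy majority-class model, just to show the contract in use. */
class MajorityClassifier extends Classifier {
  private var counts = Map.empty[Int, Int]

  def trainBatch(batch: Seq[Observation]): Unit =
    batch.flatMap(_.label).foreach { l =>
      counts = counts + (l -> (counts.getOrElse(l, 0) + 1))
    }

  def classifyBatch(batch: Seq[Observation]): Seq[Int] =
    batch.map(_ => counts.maxBy(_._2)._1)

  def serialize(): Array[Byte] =
    counts.map { case (k, v) => s"$k:$v" }.mkString(",").getBytes("UTF-8")
}
```

Whether naive Bayes, SGD, batch-trained, and downpour-style training all fit behind one such trait is exactly the open question above.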
> 
> Question: does this handle the following cases:
> 
> - naive bayes
> - SGD trained on continuous data
> - batch trained <mumble> classifiers
> - downpour type classifier training
> 
> ?
> 
> 
> 
>> On Wed, May 28, 2014 at 6:25 PM, Andrew Palumbo <[email protected]> wrote:
>> 
>> This may be somewhat tangential to this thread, but would now be a good
>> time to start laying out some scala traits for
>> Classifiers/Clusterers/Recommenders?  I am totally scala-naive, but have
>> been trying to keep up with the discussions.
>> 
>> I don't know if this is premature but it seems that now that the DSL data
>> structures have been at least sketched out if not fully implemented,  it
>> would be useful to have these in place before people start porting too much
>> over.  It might be helpful in bringing in new contributions as well.
>> 
>> It could also help regarding people's questions of integrating a future
>> wrapper layer.
>> 
>> 
>> 
>>> From: [email protected]
>>> Date: Wed, 28 May 2014 17:10:43 -0700
>>> Subject: Re: do we really need scala still
>>> To: [email protected]
>>> 
>>> +1
>>> 
>>> Let's use a successful Scala model as a suggestion about where to go.  It
>>> seems plausible that Java could emulate the building of a lazy DSL
>>> logical plan and then poke it in plausible ways with the addition of a
>>> wrapper layer.  But that only helps if the Scala layer succeeds.
>>> 
>>> 
>>> 
>>> On Tue, May 27, 2014 at 10:56 AM, Dmitriy Lyubimov <[email protected]
>>> wrote:
>>> 
>>>> Also, I think that this is leaning towards a false-dilemma fallacy.
>>>> Scala and Java models could happily coexist, with hopefully minimal
>>>> fragmentation of the project, if done with precision and care.
>>>> 
>>>> 
>>>> On Tue, May 27, 2014 at 10:46 AM, Dmitriy Lyubimov <[email protected]
>>>>> wrote:
>>>> 
>>>>> 
>>>>> Not sure there's much sense in taking a user survey if we can't act on
>>>>> this. In our situation, unfortunately, we don't have that many ideas to
>>>>> choose from, so there's not much wiggle room IMO. It is more like
>>>>> reinforcement learning -- stuff that doesn't get used or supported just
>>>>> dies. That's it.
>>>>> Scala bindings, though thumbed-up internally, are yet to earn this
>>>>> status externally. In that sense we have always been watching for
>>>>> use/support; that's why we culled tons of stuff. Nothing changes going
>>>>> forward (at least at this point). If we have tons of new
>>>>> ideas/contributions, then it may be different. What is weak dies on its
>>>>> own pretty evidently, without much extra effort.
>>>>> 
>>>>> 
>>>>>> On Tue, May 27, 2014 at 10:32 AM, Pat Ferrel <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> We are asking anyone using Mahout as a lib or in the DSL shell to
>>>>>> learn Scala. While I still think it’s the right idea, users may
>>>>>> disagree. We should probably either solicit comments or at least keep
>>>>>> an eye on reactions to this. Spark took this route when the question
>>>>>> was even more in doubt, and so is at least partially supporting
>>>>>> multiple bindings.
>>>>>> 
>>>>>> Not sure how far we want to carry this, but we could supply Java
>>>>>> bindings to the CLI-type things pretty easily.
>>>>>> 
>>>>>> 
>>>>>> On May 26, 2014, at 2:43 PM, Dmitriy Lyubimov <[email protected]>
>>>> wrote:
>>>>>> 
>>>>>> Well, first, functional programming in Java 8 is about 2-3 years late
>>>>>> to the scene. So reasoning along the lines of "hey, we are already
>>>>>> using tool A, and now tool B is available which is almost as good as
>>>>>> A, so let's migrate to B" is fallacious. Tool B must demonstrate not
>>>>>> just matching capabilities but far superior ones to justify the cost
>>>>>> of such a migration.
>>>>>> 
>>>>>> Second, as others pointed out, Java 8 doesn't really match Scala, not
>>>>>> yet anyway. One important feature of the Scala bindings work is proper
>>>>>> operator overloading (the R-like DSL). That would not be possible to
>>>>>> do in Java 8 as it stands. Yes, as others pointed out, it makes things
>>>>>> concise, but most importantly it also makes things operation-centric
>>>>>> and eliminates the pile-up of nested calls.
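[To make the operator-overloading point concrete: Scala allows a method named `%*%`, so a linear-algebra library can offer R-like syntax such as `A %*% B`; Java 8 has no counterpart. Below is a toy sketch of the mechanism only, with a hypothetical `Mat` class, not the actual Mahout DSL.]

```scala
// Toy illustration of R-like operator overloading in Scala.
// Not the actual Mahout DSL -- just the language mechanism it relies on.
case class Mat(rows: Vector[Vector[Double]]) {
  // %*% is a legal method name in Scala, so `a %*% b` reads like R.
  def %*%(that: Mat): Mat = Mat(
    rows.map { r =>
      that.rows.transpose.map(c => r.zip(c).map { case (x, y) => x * y }.sum)
    })

  // Plain + can be overloaded too, here as element-wise addition.
  def +(that: Mat): Mat = Mat(
    rows.zip(that.rows).map { case (r1, r2) =>
      r1.zip(r2).map { case (x, y) => x + y }
    })
}
```

An expression like `(a + a) %*% b` then stays operation-centric instead of turning into `multiply(add(a, a), b)`-style nested calls.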
>>>>>> 
>>>>>> Third, as it stands today, it would also present a problem from the
>>>>>> Spark integration point of view. Spark does have Java bindings, but
>>>>>> first, they are underdefined (you can check the Spark list for tons of
>>>>>> postings about missing equivalent capability), and they are certainly
>>>>>> not Java-8-vetted. So the Java API in Spark, for Java 8 purposes, as
>>>>>> it stands, is a moot point.
>>>>>> 
>>>>>> There are also a number of other goodies and clashes that exist -- use
>>>>>> of Scala collections vs. Java collections, clean functional type
>>>>>> syntax, magic methods, partially defined functions, case class
>>>>>> matchers, implicits, view and context bounds, etc., etc. -- all that
>>>>>> sh$tload of acrobatics that actually comes in very handy in existing
>>>>>> implementations and has no substitute in Java 8.
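[Two items from that list, shown illustratively below: case-class matchers and implicits. The `Expr`/`ExprSyntax` names are invented for this sketch and are not from any Mahout code.]

```scala
// Illustrative only: case-class pattern matching and an implicit class,
// two Scala features with no direct Java 8 analogue.
object ScalaGoodies {
  sealed trait Expr
  case class Num(v: Double)         extends Expr
  case class Plus(l: Expr, r: Expr) extends Expr

  // Pattern matching over case classes comes for free.
  def eval(e: Expr): Double = e match {
    case Num(v)     => v
    case Plus(l, r) => eval(l) + eval(r)
  }

  // An implicit class grafts new syntax onto an existing type (Double).
  implicit class ExprSyntax(val d: Double) extends AnyVal {
    def plus(other: Double): Expr = Plus(Num(d), Num(other))
  }
}
```

With the implicit in scope, `1.5 plus 2.5` builds an `Expr` directly from plain doubles; Java 8 would need an explicit wrapper call at every site.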
>>>>>> On May 25, 2014 12:48 PM, "bandi shankar" <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I was just thinking: do we still need Scala? Java 8 now provides
>>>>>>> (probably) all the kinds of features offered by Scala.
>>>>>>> Since I am new to the group, I am just wondering why not move Mahout
>>>>>>> away from Scala. Is there any specific reason to adopt Scala?
>>>>>>> 
>>>>>>> Bandi
> 
