Itemsimilarity

Pat Ferrel Thu, 29 May 2014 15:35:54 -0700

Agreed, and in progress. Sebastian’s cooccurrence code optionally takes two DRMs.


The current CLI for itemsimilarity filters a single input stream, optionally 
splitting it into two DRMs, so it already supports cross-similarity. The CLI will 
soon accept two input streams. The CLI for RSJ will (if I do it) take one or two DRMs.
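
To make the one-DRM versus two-DRM distinction concrete, here is a minimal Python sketch. It is not Mahout code: the function names and the sample data are made up for illustration, and it only computes raw counts (A'A for cooccurrence, A'B for cross-cooccurrence), leaving out whatever scoring or downsampling the real cooccurrence code applies.

```python
from collections import defaultdict


def to_rows(pairs):
    """Group (user, item) pairs into {user: set(items)}, i.e. a sparse binary matrix by row."""
    rows = defaultdict(set)
    for user, item in pairs:
        rows[user].add(item)
    return rows


def cooccurrence(a_pairs, b_pairs=None):
    """Count, for each item i in A and item j in B, the users who have both.

    With only A given this is A'A with the diagonal dropped (cooccurrence).
    With A and B given this is A'B (cross-cooccurrence).
    """
    a = to_rows(a_pairs)
    b = a if b_pairs is None else to_rows(b_pairs)
    counts = defaultdict(lambda: defaultdict(int))
    for user, a_items in a.items():
        for i in a_items:
            for j in b.get(user, ()):
                if b_pairs is None and i == j:
                    continue  # an item trivially cooccurs with itself
                counts[i][j] += 1
    return {i: dict(row) for i, row in counts.items()}


# Hypothetical interaction data: a primary action and a secondary action.
purchases = [("u1", "ipad"), ("u1", "iphone"), ("u2", "ipad"),
             ("u2", "iphone"), ("u3", "nexus")]
views = [("u1", "galaxy"), ("u2", "galaxy"), ("u2", "nexus"), ("u3", "nexus")]

self_sim = cooccurrence(purchases)          # one input: A'A
cross_sim = cooccurrence(purchases, views)  # two inputs: A'B
print(self_sim)
print(cross_sim)
```

The single-argument call corresponds to the one-DRM case and the two-argument call to the cross-similarity case; both inputs must share the same user ID space for the counts to line up.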

Please feel free to comment on the Jiras MAHOUT-1464 (cooccurrence) and 
MAHOUT-1541 (itemsimilarity CLI).

They are maybe 80% ready, which is why a dialog over file readers/writers, 
drivers, and CLI might be good. If we can move on those, there are a bunch of 
other jobs that can be packaged up pretty quickly from Dmitriy’s SSVD PCA, 
Transpose, multiply, etc.

On May 29, 2014, at 2:32 PM, Ted Dunning <[email protected]> wrote:

Pat 

I would like to see the cooccurrence and cross-occurrence code separated out a 
bit so that they take DRM args.


> On May 29, 2014, at 17:58, Pat Ferrel <[email protected]> wrote:
> 
> Regarding recommenders, drivers, and import/export:
> 
> I’ve got Sebastian’s cooccurrence code wrapped with a driver that reads 
> text-delimited files into a DRM for use with cooccurrence. It then writes the 
> indicator matrix(es) as text-delimited files with user-specified IDs. It also 
> has a proposed Driver base class, a Scala-based option parser, and 
> ReadStore/WriteStore traits. The CLI will be mostly a superset of the 
> itemsimilarity CLI in legacy MR. The read/write stuff is meant to be pretty 
> generic, so I was planning to do a DB and maybe a JSON example (some day). 
> There is still a bit of functional programming refactoring to do, and the 
> docs are not up to date.
> 
> With cooccurrence working we could do something that replaces all the 
> cooccurrence recommenders (in-memory and MR) with one codebase. Add Solr and 
> you have a single-machine, server-based recommender that we can supply with an 
> API similar to the legacy in-memory recommender. The cool thing is that it 
> will scale out to a cluster with Solr and HDFS, requiring only config 
> changes. The downside is that it requires at least a standalone local version 
> of Spark to do the cooccurrence. BTW this would give us something people have 
> been asking for: a recommender service.
> 
> Is anyone else interested in CLI, drivers, read/write in the import/export 
> sense? Or a new architecture for the recommenders? If so, maybe a separate 
> thread?
> 
> On May 29, 2014, at 7:03 AM, Ted Dunning <[email protected]> wrote:
> 
> Andrew,
> 
> Sebastian and I were talking yesterday and guessing that you would be
> interested in this soon.  Glad to know the world is as expected.
> 
> Yes. This needs to happen at least at a very conceptual level.  For
> instance, for classifiers, I think that we need to have something like:
> 
>  - progressively train against a batch of data
>       questions: Should this do multiple epochs?  Throw an exception if
>       on-line training is not supported?  Throw an exception if too little
>       data is provided?
> 
>  - classify a batch of data
> 
>  - serialize a model
> 
>  - de-serialize a model
> 
> Note that a batch listed above should be either a bunch of observations or
> just one.
> 
> Question: does this handle the following cases:
> 
> - naive bayes
> - SGD trained on continuous data
> - batch trained <mumble> classifiers
> - downpour type classifier training
> 
> ?
> 
> 
> 
>> On Wed, May 28, 2014 at 6:25 PM, Andrew Palumbo <[email protected]> wrote:
>> 
>> This may be somewhat tangential to this thread, but would now be a good
>> time to start laying out some Scala traits for
>> Classifiers/Clusterers/Recommenders?  I am totally Scala-naive, but have
>> been trying to keep up with the discussions.
>> 
>> I don't know if this is premature, but it seems that now that the DSL data
>> structures have been at least sketched out, if not fully implemented, it
>> would be useful to have these in place before people start porting too much
>> over.  It might be helpful in bringing in new contributions as well.
>> 
>> It could also help regarding people's questions of integrating a future
>> wrapper layer.
>> 
>> 
>> 
>>> From: [email protected]
>>> Date: Wed, 28 May 2014 17:10:43 -0700
>>> Subject: Re: do we really need scala still
>>> To: [email protected]
>>> 
>>> +1
>>> 
>>> Let's use a successful scala model as a suggestion about where to go.  It
>>> seems plausible that Java could emulate the building of a lazy DSL logical
>>> plan and then poke it in plausible ways with the addition of a wrapper
>>> layer.  But that only helps if the Scala layer succeeds.
>>> 
>>> 
>>> 
>>> On Tue, May 27, 2014 at 10:56 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>> 
>>>> Also, I think this is leaning towards a false dilemma fallacy. Scala and
>>>> Java models could happily coexist, hopefully with minimal fragmentation
>>>> of the project, if done with precision and care.
>>>> 
>>>> 
>>>> On Tue, May 27, 2014 at 10:46 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>> 
>>>>> 
>>>>> Not sure there's much sense in taking a user survey if we can't act on
>>>>> this. In our situation, unfortunately, we don't have that many ideas to
>>>>> choose from, so there's not much wiggle room IMO. It is more like
>>>>> reinforcement learning: stuff that doesn't get used or supported just
>>>>> dies, that's it. Scala bindings, though thumbed up internally, have yet
>>>>> to earn this status externally. In that sense we have always been
>>>>> watching for use/support; that's why we culled tons of stuff. Nothing
>>>>> changes going forward (at least at this point). If we have tons of new
>>>>> ideas/contributions, then it may be different. What is weak dies on its
>>>>> own pretty evidently, without much extra effort.
>>>>> 
>>>>> 
>>>>>> On Tue, May 27, 2014 at 10:32 AM, Pat Ferrel <[email protected]> wrote:
>>>>> 
>>>>>> We are asking anyone using Mahout as a lib or in the DSL shell to
>>>>>> learn Scala. While I still think it’s the right idea, users may
>>>>>> disagree. We should probably either solicit comments or at least keep
>>>>>> an eye on reactions to this. Spark took this route when the question
>>>>>> was even more in doubt and so is at least partially supporting
>>>>>> multiple bindings.
>>>>>> 
>>>>>> Not sure how far we want to carry this but we could supply Java
>>>>>> bindings to the CLI-type things pretty easily.
>>>>>> 
>>>>>> 
>>>>>> On May 26, 2014, at 2:43 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>> 
>>>>>> Well, first, functional programming in Java 8 is about 2-3 years late
>>>>>> to the scene. So reasoning along the lines of "hey, we are already
>>>>>> using tool A, and now tool B is available which is almost as good as A,
>>>>>> so let's migrate to B" is flawed. Tool B must demonstrate not just
>>>>>> matching capabilities but far superior ones to justify the cost of
>>>>>> such a migration.
>>>>>> 
>>>>>> Second, as others pointed out, Java 8 doesn't really match Scala, not
>>>>>> yet anyway. One important feature of the Scala bindings work is proper
>>>>>> operator overloading (R-like DSL). That would not be possible to do in
>>>>>> Java 8 as it stands. Yes, as others pointed out, it makes things
>>>>>> concise, but most importantly, it also makes things operation-centric
>>>>>> and eliminates the pile-up of nested calls.
>>>>>> 
>>>>>> Third, as it stands today, it would also present a problem from the
>>>>>> Spark integration point of view. Spark does have Java bindings, but
>>>>>> first, they are underdefined (you can check the Spark list for tons of
>>>>>> postings about missing equivalent capability), and they are certainly
>>>>>> not Java-8-vetted. So the Java API in Spark, for Java 8 purposes, as it
>>>>>> stands, is a moot point.
>>>>>> 
>>>>>> There are also a number of other goodies and clashes that exist: use
>>>>>> of Scala collections vs. Java collections, clean functional type
>>>>>> syntax, magic methods, partially defined functions, case class
>>>>>> matchers, implicits, view and context bounds, etc. All that sh$tload of
>>>>>> acrobatics actually comes in very handy in existing implementations and
>>>>>> has no substitute in Java 8.
>>>>>> On May 25, 2014 12:48 PM, "bandi shankar" <[email protected]> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I was just thinking, do we still need Scala? Java 8 provides
>>>>>>> (probably) all of the kinds of features that Scala does.
>>>>>>> Since I am new to the group, I am just wondering why not move Mahout
>>>>>>> away from Scala. Is there any specific reason to adopt Scala?
>>>>>>> 
>>>>>>> Bandi
>
