Re: [xwiki-devs] apache lucene mahout : for advanced xwiki "search" ?

Paul Libbrecht Thu, 07 Jan 2010 14:40:06 -0800

Niels,

Do you have any successful uses of it already?
Mahout is certainly promising, but it still seems to be at a very early stage.


One thing that seemed almost implementable is "preference matching"
with Mahout Taste:
        http://lucene.apache.org/mahout/taste.html
But I have not found the time to try it.
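
To make "preference matching" a bit more concrete, here is a minimal sketch of the user-based idea in plain Python. It is not Mahout Taste's API and I have not run anything like it against XWiki: the function names, the page names and the ratings are all made up for illustration. It scores the pages a user has not rated yet by a similarity-weighted average of other users' ratings.

```python
from math import sqrt

# Hypothetical ratings: user -> {page: score}. Made-up data for illustration.
ratings = {
    "alice": {"PageA": 5.0, "PageB": 3.0, "PageC": 4.0},
    "bob":   {"PageA": 4.0, "PageB": 3.0, "PageD": 5.0},
    "carol": {"PageB": 1.0, "PageC": 2.0, "PageD": 1.0},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users rated; 0.0 if none are shared."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted average score."""
    own = ratings[user]
    totals, weights = {}, {}
    for other, prefs in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(own, prefs)
        if sim <= 0.0:
            continue  # ignore users with nothing in common
        for item, score in prefs.items():
            if item in own:
                continue  # only suggest items the user has not rated
            totals[item] = totals.get(item, 0.0) + sim * score
            weights[item] = weights.get(item, 0.0) + sim
    scored = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend("alice", ratings))
```

With this toy data the only page alice has not rated is PageD, so that is the single suggestion she gets. A real setup would presumably take its preferences from wiki activity (views, edits, ratings) rather than from a hand-written dictionary.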

paul


On 07 Jan 2010, at 23:10, Niels Mayer wrote:

> http://lucene.apache.org/mahout/
> Mahout's goal is to build scalable machine learning libraries. With
> scalable we mean:
>
>   - Scalable to reasonably large data sets. Our core algorithms for
>     clustering, classification and batch based collaborative filtering
>     are implemented on top of Apache Hadoop using the map/reduce
>     paradigm. However we do not restrict contributions to Hadoop based
>     implementations: Contributions that run on a single node or on a
>     non-Hadoop cluster are welcome as well. The core libraries are
>     highly optimized to allow for good performance also for
>     non-distributed algorithms.
>
>
> http://www.manning.com/owen/
>
>> Mahout is a machine learning library. The algorithms it implements fall
>> under the broad umbrella of “machine learning,” or “collective
>> intelligence.” This can mean many things, but at the moment for Mahout it
>> means primarily recommender engines, clustering, and classification.
>>
>> It is scalable. It attempts to provide implementations that use modern
>> frameworks for splitting huge computations efficiently across many
>> machines. Mahout aims to be the machine learning tool of choice when the
>> data to be processed is far too big for a single machine. In its current
>> incarnation, these scalable implementations are written in Java and built
>> upon Apache's Hadoop project.
>>
>> It is a Java library. It does not provide a user interface, a
>> pre-packaged server, or installer. It is a framework of tools intended to
>> be used and adapted by developers. Mahout can be deployed to solve
>> problems if you are developing modern, intelligent applications or if you
>> are leading a product team or startup that will leverage machine learning
>> to create a competitive advantage.
>>
>> If you are a researcher in artificial intelligence, machine learning and
>> related areas your biggest obstacle is probably translating new
>> algorithms into practice. Mahout provides a fertile framework for testing
>> and deploying new large-scale algorithms.
>
> ...
> some example usage:
> ...
>
>> Recommender Engines
>>
>> Recommender engines are perhaps the most immediately recognizable machine
>> learning technique in use today. We've all seen services or sites that
>> attempt to recommend books or movies or articles based on our past
>> actions. They try to infer tastes and preferences and identify unknown
>> items that are of interest:
>>
>>    • Amazon.com is perhaps the most famous commerce site to deploy
>>      recommendations. Based on purchases and site activity, Amazon
>>      recommends books and other items likely to be of interest. See
>>      figure 1.1.
>>    • Netflix similarly recommends DVDs that may be of interest, and
>>      famously offered a $1,000,000 prize to researchers that could
>>      improve the quality of their recommendations.
>>    • Social networking sites like Facebook use variants on recommender
>>      techniques to identify people most likely to be an
>>      as-yet-unconnected friend.
>
>
>
> ....
>
>> Clustering
>>
>> Clustering turns up in less obvious but equally well-known contexts. As
>> its name implies, clustering techniques attempt to group a large number
>> of things together into clusters that share some similarity. It is a way
>> to discover hierarchy and order in a large or hard-to-understand data
>> set, and in that way reveal interesting patterns or make the data set
>> easier to comprehend.
>>
>>    • Google News groups news articles according to their topic using
>>      clustering techniques in order to present news grouped by logical
>>      story, rather than a raw listing of all articles. Figure 1.2 below
>>      illustrates this.
>>    • Search engines like Clusty group search results for similar reasons.
>
> ...
>
>> Classification
>>
>> Classification techniques decide how much a thing is or isn't part of
>> some type or category, or, does or doesn't have some attribute.
>> Classification is likewise ubiquitous though even more
>> behind-the-scenes. Often these systems “learn” by reviewing many
>> instances of items of the categories in question in order to deduce
>> classification rules. This general idea finds many applications:
>>
>>    • Yahoo! Mail decides whether incoming messages are spam, or not,
>>      based on prior emails and spam reports from users, as well as
>>      characteristics of the e-mail itself. A few messages classified as
>>      spam are shown in figure 1.3.
>>    • Picasa (http://picasa.google.com/) and other photo management
>>      applications can decide when a region of an image contains a human
>>      face.
>>    • Optical character recognition software classifies small regions of
>>      scanned text into individual characters by classifying the small
>>      areas as individual characters.
>
>
>
> Niels
> http://nielsmayer.com
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs

