Niels,
do you already have any successful uses of it?
Mahout is certainly promising, but it still seems to be in its early days.
One thing that seemed almost ready to implement is "preference matching"
with Mahout Taste:
http://lucene.apache.org/mahout/taste.html
But I have not found the time to try it.
paul
On 07-Jan-10, at 23:10, Niels Mayer wrote:
> http://lucene.apache.org/mahout/
>
> Mahout's goal is to build scalable machine learning libraries. By
> scalable we mean:
>
> - Scalable to reasonably large data sets. Our core algorithms for
>   clustering, classification and batch-based collaborative filtering
>   are implemented on top of Apache Hadoop using the map/reduce paradigm.
>   However, we do not restrict contributions to Hadoop-based
>   implementations: contributions that run on a single node or on a
>   non-Hadoop cluster are welcome as well. The core libraries are highly
>   optimized to allow good performance for non-distributed algorithms as
>   well.
>
>
> http://www.manning.com/owen/
>
> Mahout is a machine learning library. The algorithms it implements fall
> under the broad umbrella of “machine learning,” or “collective
> intelligence.” This can mean many things, but at the moment for Mahout it
> means primarily recommender engines, clustering, and classification.
>
> It is scalable. It attempts to provide implementations that use modern
> frameworks for splitting huge computations efficiently across many
> machines. Mahout aims to be the machine learning tool of choice when the
> data to be processed is far too big for a single machine. In its current
> incarnation, these scalable implementations are written in Java and built
> upon Apache's Hadoop project.
>
> It is a Java library. It does not provide a user interface, a
> pre-packaged server, or an installer. It is a framework of tools intended
> to be used and adapted by developers. Mahout can be deployed to solve
> problems if you are developing modern, intelligent applications or if you
> are leading a product team or startup that will leverage machine learning
> to create a competitive advantage.
>
> If you are a researcher in artificial intelligence, machine learning and
> related areas, your biggest obstacle is probably translating new
> algorithms into practice. Mahout provides a fertile framework for testing
> and deploying new large-scale algorithms.
>
> ...
> Some example uses:
> ...
>
> Recommender Engines
>
> Recommender engines are perhaps the most immediately recognizable machine
> learning technique in use today. We've all seen services or sites that
> attempt to recommend books or movies or articles based on our past
> actions. They try to infer tastes and preferences and identify unknown
> items that are of interest:
>
> • Amazon.com is perhaps the most famous commerce site to deploy
>   recommendations. Based on purchases and site activity, Amazon recommends
>   books and other items likely to be of interest. See figure 1.1.
>
> • Netflix similarly recommends DVDs that may be of interest, and famously
>   offered a $1,000,000 prize to researchers who could improve the quality
>   of their recommendations.
>
> • Social networking sites like Facebook use variants on recommender
>   techniques to identify people most likely to be an as-yet-unconnected
>   friend.
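To make "preference matching" concrete, here is a minimal sketch of user-based collaborative filtering in plain Java. It does not use Mahout Taste's API. The class PreferenceMatcher, its methods, and the choice of cosine similarity over co-rated items are illustrative assumptions. Each other user is compared with the target user, and every item the target has not rated is scored by a similarity-weighted average of the other users' ratings.

```java
import java.util.*;

/** Hypothetical user-based "preference matching" sketch; plain Java, not Mahout Taste's API. */
public class PreferenceMatcher {

    /** Cosine similarity over the items both users have rated; 0 if they share none. */
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) continue;                        // only co-rated items count
            dot += e.getValue() * other;
            normA += e.getValue() * e.getValue();
            normB += other * other;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Top-n items the user has not rated, ranked by similarity-weighted average rating. */
    static List<String> recommend(Map<String, Map<String, Double>> prefs, String user, int n) {
        Map<String, Double> mine = prefs.get(user);
        Map<String, Double> weightedSum = new HashMap<>();      // item -> sum(sim * rating)
        Map<String, Double> weightTotal = new HashMap<>();      // item -> sum(sim)
        for (Map.Entry<String, Map<String, Double>> other : prefs.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double sim = similarity(mine, other.getValue());
            if (sim <= 0) continue;                             // skip users with zero or negative similarity
            for (Map.Entry<String, Double> item : other.getValue().entrySet()) {
                if (mine.containsKey(item.getKey())) continue;  // the user already rated it
                weightedSum.merge(item.getKey(), sim * item.getValue(), Double::sum);
                weightTotal.merge(item.getKey(), sim, Double::sum);
            }
        }
        List<String> items = new ArrayList<>(weightedSum.keySet());
        // Highest predicted rating first.
        items.sort(Comparator.comparingDouble(
                (String it) -> weightedSum.get(it) / weightTotal.get(it)).reversed());
        return new ArrayList<>(items.subList(0, Math.min(n, items.size())));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> prefs = new HashMap<>();
        prefs.put("alice", Map.of("book1", 5.0, "book2", 3.0));
        prefs.put("bob", Map.of("book1", 4.0, "book2", 3.0, "book3", 5.0));
        prefs.put("carol", Map.of("book1", 1.0, "book4", 2.0));
        System.out.println(recommend(prefs, "alice", 2));      // [book3, book4]
    }
}
```

With this sample data, book3 (predicted 5.0 from bob) ranks ahead of book4 (predicted 2.0 from carol). Note that with positive ratings, cosine over a single co-rated item is always 1.0, as it is for carol here. A real recommender would likely use a correlation measure or require a minimum overlap.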
>
>
>
> ....
>
> Clustering
>
> Clustering turns up in less obvious but equally well-known contexts. As
> its name implies, clustering techniques attempt to group a large number of
> things together into clusters that share some similarity. It is a way to
> discover hierarchy and order in a large or hard-to-understand data set,
> and in that way reveal interesting patterns or make the data set easier to
> comprehend.
>
> • Google News groups news articles according to their topic using
>   clustering techniques in order to present news grouped by logical story,
>   rather than a raw listing of all articles. Figure 1.2 below illustrates
>   this.
>
> • Search engines like Clusty group search results for similar reasons.
>
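Clustering can be illustrated with k-means, a classic algorithm that Mahout also implements on Hadoop. The code below is a single-machine toy on 2-D points and does not use Mahout's API. KMeansSketch, the fixed seed centroids, and the iteration cap are assumptions made for illustration. The algorithm alternates two steps until no point changes cluster: an assignment step, where each point joins its nearest centroid, and an update step, where each centroid moves to the mean of its members.

```java
import java.util.Arrays;

/** Toy k-means on 2-D points; a hypothetical sketch, not Mahout's clustering API. */
public class KMeansSketch {

    /** Squared Euclidean distance between two 2-D points. */
    static double dist2(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    /** Returns each point's cluster index, starting from the given seeds (which are not modified). */
    static int[] cluster(double[][] points, double[][] seeds, int maxIter) {
        int k = seeds.length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = seeds[c].clone();  // copy so the caller's seeds stay intact
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);                                   // -1 = not assigned yet
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each point joins its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (dist2(points[i], centroids[c]) < dist2(points[i], centroids[best])) best = c;
                }
                if (best != assignment[i]) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) break;                                       // converged
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][2];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                sums[assignment[i]][0] += points[i][0];
                sums[assignment[i]][1] += points[i][1];
                counts[assignment[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {                                   // an empty cluster keeps its old centroid
                    centroids[c][0] = sums[c][0] / counts[c];
                    centroids[c][1] = sums[c][1] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {1, 1.5}, {8, 8}, {9, 8.5}, {8.5, 9}};
        double[][] seeds = {{0, 0}, {10, 10}};
        System.out.println(Arrays.toString(cluster(points, seeds, 10)));  // [0, 0, 0, 1, 1, 1]
    }
}
```

The three points near the origin form one cluster and the three near (8, 8) form the other. Clustering documents would instead work on term vectors, often with a cosine distance, but the same two alternating steps apply.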
> ...
>
> Classification
>
> Classification techniques decide how much a thing is or isn't part of
> some type or category, or does or doesn't have some attribute.
> Classification is likewise ubiquitous, though even more
> behind-the-scenes. Often these systems “learn” by reviewing many instances
> of items of the categories in question in order to deduce classification
> rules. This general idea finds many applications:
>
> • Yahoo! Mail decides whether incoming messages are spam or not, based on
>   prior emails and spam reports from users, as well as characteristics of
>   the e-mail itself. A few messages classified as spam are shown in
>   figure 1.3.
>
> • Picasa (http://picasa.google.com/) and other photo management
>   applications can decide when a region of an image contains a human face.
>
> • Optical character recognition software classifies small regions of
>   scanned text into individual characters.
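The spam example maps onto a simple probabilistic classifier. Below is a toy multinomial naive Bayes filter in plain Java. Mahout's classifiers include a naive Bayes implementation, but this sketch is not its API and makes no claim about how Yahoo! Mail works. The class name, the regex tokenizer, and add-one smoothing are all assumptions. Training counts words per label. Classification picks the label with the highest log prior plus summed log word likelihoods.

```java
import java.util.*;

/** Toy multinomial naive Bayes text classifier; a hypothetical sketch, not Mahout's classifier API. */
public class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();               // label -> words seen
    private final Map<String, Integer> docCounts = new HashMap<>();                // label -> documents seen
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Lower-cases the text and splits it on runs of non-word characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);   // split can yield a leading empty string
        }
        return tokens;
    }

    void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, l -> new HashMap<>());
        for (String w : tokenize(text)) {
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** Picks the label maximizing log P(label) + sum of log P(word | label). */
    String classify(String text) {
        List<String> tokens = tokenize(text);
        String bestLabel = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs);  // log prior
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String w : tokens) {
                // Add-one smoothing: unseen words get a small, non-zero probability.
                score += Math.log((counts.getOrDefault(w, 0) + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                bestLabel = label;
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("spam", "win money now, claim your free prize");
        nb.train("spam", "free money offer, click now");
        nb.train("ham", "meeting notes for the xwiki release");
        nb.train("ham", "can we review the release plan tomorrow");
        System.out.println(nb.classify("claim your free money"));     // spam
        System.out.println(nb.classify("release meeting tomorrow"));  // ham
    }
}
```

Log probabilities are summed rather than multiplying raw probabilities, which avoids floating-point underflow on long messages.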
>
>
>
> Niels
> http://nielsmayer.com
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs