[
https://issues.apache.org/jira/browse/MAHOUT-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969112#comment-13969112
]
Nick Martin commented on MAHOUT-1445:
-------------------------------------
Item Based Recommender
Introduction
Mahout’s item-based recommender is a flexible and easily implemented algorithm
with a diverse range of applications. Its primary input file has a minimal
structure, and ancillary filtering controls are available, so sourcing the
required data and shaping the desired output are both efficient and
straightforward.
Typical use cases include:
• Recommend products to customers via an eCommerce platform (think: Amazon,
Netflix, Overstock)
• Identify organic sales opportunities
• Segment users/customers based on similar item preferences
Broadly speaking, Mahout's item-based recommendation algorithm takes as input
customer preferences by item and generates an output recommending similar items
with a score indicating the likelihood a customer will "like" the recommended
item.
One of the strengths of the item-based recommender is its adaptability to your
business conditions or research interests. For example, there are many ways to
express product preference. One method is to count the total orders of a given
product for each customer (e.g. Acme Corp has ordered Widget-A 5,678 times);
others rely on explicit user preferences captured via the web (e.g. Jane Doe
rated a movie five stars, or gave a product two thumbs up).
Additionally, a variety of methodologies can be implemented to narrow the focus
of Mahout's recommendations, such as:
• Exclude low volume or low profitability products from consideration
• Group customers by segment or market rather than using user/customer level
data
• Exclude zero-dollar transactions, returns or other order types
• Map product substitutions into the Mahout input (e.g. if WidgetA is a
recommended item, replace it with WidgetX)
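As a rough illustration, several of these narrowing steps can be applied to the
raw order data before it is handed to Mahout. The sketch below is plain Python,
not part of Mahout; the record layout, the order-type labels, the volume
threshold and the substitution map are all hypothetical and would need to be
adapted to your own data.

```python
from collections import Counter

# Hypothetical order records: (customer_id, product_id, order_type, amount).
orders = [
    (1, 101, "sale", 25.00),
    (1, 102, "sale", 40.00),
    (2, 101, "sale", 25.00),
    (2, 103, "return", -15.00),
    (3, 101, "sale", 0.00),
    (3, 102, "sale", 40.00),
    (3, 104, "sale", 12.50),
]

MIN_ORDERS_PER_PRODUCT = 2   # assumed cut-off for "low volume"
SUBSTITUTIONS = {102: 202}   # hypothetical: replace product 102 with 202


def filter_orders(orders, min_orders, substitutions):
    """Apply the exclusion and substitution rules to raw order records."""
    # Drop returns and zero-dollar transactions.
    kept = [o for o in orders if o[2] == "sale" and o[3] > 0]
    # Count the remaining orders per product and drop low-volume products.
    volume = Counter(product for _, product, _, _ in kept)
    kept = [o for o in kept if volume[o[1]] >= min_orders]
    # Map substituted products onto their replacements.
    return [(c, substitutions.get(p, p), t, a) for c, p, t, a in kept]


filtered = filter_orders(orders, MIN_ORDERS_PER_PRODUCT, SUBSTITUTIONS)
for customer_id, product_id, _, _ in filtered:
    print(customer_id, product_id)
```

Doing this kind of filtering up front keeps the Mahout input small and means
the recommendations never mention products you have already ruled out.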
The output of the item-based recommender can be easily consumed by downstream
applications (e.g. websites, ERP systems or sales force automation tools), and
the number of item recommendations the algorithm generates is configurable.
Example
Testing the item-based recommender can be simple and potentially quite
rewarding. The typical sample use case for collaborative filtering focuses on
integration with eCommerce platforms; here we instead look at a use case
applicable to most businesses, even those without a web presence: how a company
might use Mahout’s item-based recommender to identify new sales opportunities
for an existing customer base. First, you’ll need to get Mahout up and running
by following the quickstart instructions
(https://mahout.apache.org/users/basics/quickstart.html). Once you’ve confirmed
that Mahout is properly installed, you’re ready to run a quick example.
Step 1: Gather some test data
Mahout’s item-based recommender relies on three key pieces of data: userID,
itemID and preference. The “users” could be website visitors or simply
customers that purchase products from your business. Similarly, items could be
products, product groups or even pages on your website – really anything you
would want to recommend to a group of users or customers. For our example,
let’s use customer orders as a proxy for preference: a simple count of distinct
orders per customer, per product. As you explore ways to manipulate the
item-based recommender, you’ll find the preference value can be many things
(page clicks, explicit ratings, order counts, etc.). Once your test data is
gathered, put it in a comma-separated .txt file with one userID, itemID,
preference record per line and no column headers.
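One way to produce such a file is sketched below in plain Python. The order
lines and their field layout are hypothetical. Numeric IDs are used on the
assumption that the job expects numeric user and item IDs; check this against
your Mahout version before relying on it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical order lines: (order_id, customer_id, product_id).
order_lines = [
    ("A-1", 1, 101),
    ("A-1", 1, 102),
    ("A-2", 1, 101),
    ("B-1", 2, 101),
    ("B-1", 2, 101),  # same order and product again: counted once
    ("B-2", 2, 103),
]


def preference_rows(order_lines):
    """Count distinct orders per (customer, product) pair."""
    orders_seen = defaultdict(set)
    for order_id, customer_id, product_id in order_lines:
        orders_seen[(customer_id, product_id)].add(order_id)
    return [(c, p, len(ids)) for (c, p), ids in sorted(orders_seen.items())]


def write_input(rows, handle):
    """Write userID,itemID,preference lines with no header row."""
    csv.writer(handle, lineterminator="\n").writerows(rows)


rows = preference_rows(order_lines)
buffer = io.StringIO()  # swap for open("input.txt", "w", newline="") to write a file
write_input(rows, buffer)
print(buffer.getvalue(), end="")
```

Each printed line is one userID,itemID,preference record, which is the layout
described above.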
Step 2: Pick a similarity measure
Choosing a similarity measure for use in a production environment requires
careful testing, evaluation and research. For the purposes of our example,
we’ll simply use the Mahout similarity classname
“SIMILARITY_LOGLIKELIHOOD”.
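For intuition only, the sketch below computes a generic log-likelihood ratio
for a 2x2 co-occurrence table of two items. It is plain Python written for
illustration, not Mahout's implementation, and how Mahout converts the statistic
into a similarity score is not shown. The count names (k11 = users with both
items, k12 and k21 = users with only one of them, k22 = users with neither) are
assumptions made for this example.

```python
import math


def x_log_x(x):
    """Return x * ln(x), treating 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Log-likelihood ratio statistic for a 2x2 contingency table."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb tiny negative values from rounding error.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


independent = log_likelihood_ratio(10, 10, 10, 10)  # no association
associated = log_likelihood_ratio(10, 0, 0, 10)     # items always co-occur
print(independent, associated)
```

A value near zero means the two items co-occur about as often as chance would
predict; larger values mean the co-occurrence is less likely to be accidental.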
Step 3: Configure the Mahout command
Assuming your JAVA_HOME is appropriately set and Mahout was installed properly,
we’re ready to build the command. Enter the following:
$ mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i /path/to/input/file \
    -o /path/to/desired/output --numRecommendations 25
Running the command executes a series of jobs, the final product of which is
an output file written to the directory specified in the command. The output
file contains two columns: the userID and an array of itemIDs and scores.
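A minimal Python sketch for reading that output follows. It assumes each line
has the form userID, a tab, then a bracketed list of itemID:score pairs
separated by commas; confirm this against your own output file. The sample line
is made up for illustration and is not real Mahout output.

```python
def parse_recommendations(line):
    """Split one output line into a user ID and a list of (item, score) pairs."""
    user_part, items_part = line.rstrip("\n").split("\t", 1)
    items = []
    body = items_part.strip().strip("[]")
    if body:  # an empty list "[]" yields no recommendations
        for pair in body.split(","):
            item_id, score = pair.split(":")
            items.append((int(item_id), float(score)))
    return int(user_part), items


# Made-up sample line in the assumed layout.
sample = "1\t[104:4.5,106:3.25]"
user_id, recommendations = parse_recommendations(sample)
print(user_id, recommendations)
```

From here the pairs can be loaded into a database table or joined back to
product and customer names for a downstream application.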
Step 4: Making use of the output and doing more with Mahout
The output file generated in our simple example can be transformed using your
tool of choice and consumed by downstream applications. Mahout’s item-based
recommender offers a variety of configuration options to accommodate custom
business requirements; exploring and testing various configurations to suit
your needs will doubtless lead to additional questions. Our user community is
accessible via our mailing list
(https://mahout.apache.org/general/mailing-lists,-irc-and-archives.html), and
Mahout in Action is a fantastic starting point.
[~ssc] Let me know if you think this needs more tweaks. Aim was concision and
enough context to get first time users thinking about what they could be doing
with recommenders and get them running quickly.
Also, the "Do's and Don'ts" link on
https://mahout.apache.org/users/basics/quickstart.html goes nowhere. I'll spin
a JIRA for the fix.
> Create an intro for item based recommender
> ------------------------------------------
>
> Key: MAHOUT-1445
> URL: https://issues.apache.org/jira/browse/MAHOUT-1445
> Project: Mahout
> Issue Type: New Feature
> Components: Documentation
> Affects Versions: 1.0
> Reporter: Maciej Mazur
> Labels: documentation, recommender
> Fix For: 1.0
>
>