Added: websites/staging/mahout/trunk/content/users/mapreduce/classification/hidden-markov-models.html ============================================================================== --- websites/staging/mahout/trunk/content/users/mapreduce/classification/hidden-markov-models.html (added) +++ websites/staging/mahout/trunk/content/users/mapreduce/classification/hidden-markov-models.html Thu Mar 19 21:21:45 2015 @@ -0,0 +1,366 @@ +<!DOCTYPE html> +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> + +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> + <title>Apache Mahout: Scalable machine learning and data mining</title> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> + <meta name="Distribution" content="Global"> + <meta name="Robots" content="index,follow"> + <meta name="keywords" content="apache, apache hadoop, apache lucene, + business data mining, cluster analysis, + collaborative filtering, data extraction, data filtering, data framework, data integration, + data matching, data mining, data mining algorithms, data mining analysis, data mining data, + data mining introduction, data mining software, + data mining techniques, data representation, data set, datamining, + feature extraction, fuzzy k means, genetic algorithm, hadoop, + hierarchical clustering, high dimensional, introduction to data mining, kmeans, + knowledge discovery, learning approach, learning approaches, learning methods, + learning techniques, lucene, machine learning, machine translation, mahout apache, + mahout taste, map reduce hadoop, mining data, mining methods, naive bayes, + natural language processing, + supervised, text mining, time series data, unsupervised, web data mining"> + <link rel="shortcut icon" type="image/x-icon" href="http://mahout.apache.org/images/favicon.ico"> + <script type="text/javascript" src="/js/prototype.js"></script> + <script type="text/javascript" src="/js/effects.js"></script> + <script type="text/javascript" src="/js/search.js"></script> + <script type="text/javascript" src="/js/slides.js"></script> + + <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen"> + <link href="/css/bootstrap-responsive.css" rel="stylesheet"> + <link rel="stylesheet" href="/css/global.css" type="text/css"> + + <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown --> + <script type="text/x-mathjax-config"> + MathJax.Hub.Config({ 
+ tex2jax: { + skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] + } + }); + MathJax.Hub.Queue(function() { + var all = MathJax.Hub.getAllJax(), i; + for(i = 0; i < all.length; i += 1) { + all[i].SourceElement().parentNode.className += ' has-jax'; + } + }); + </script> + <script type="text/javascript"> + var mathjax = document.createElement('script'); + mathjax.type = 'text/javascript'; + mathjax.async = true; + + mathjax.src = ('https:' == document.location.protocol) ? + 'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : + 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; + + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(mathjax, s); + </script> +</head> + +<body id="home" data-twttr-rendered="true"> + <div id="wrap"> + <div id="header"> + <div id="logo"><a href="/overview.html"></a></div> + <div id="search"> + <form id="search-form" action="http://www.google.com/search" method="get" class="navbar-search pull-right"> + <input value="http://mahout.apache.org" name="sitesearch" type="hidden"> + <input class="search-query" name="q" id="query" type="text"> + <input id="submission" type="image" src="/images/mahout-lupe.png" alt="Search" /> + </form> + </div> + + <div class="navbar navbar-inverse" style="position:absolute;top:133px;padding-right:0px;padding-left:0px;"> + <div class="navbar-inner" style="border: none; background: #999; border: none; border-radius: 0px;"> + <div class="container"> + <button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <!-- <a class="brand" href="#">Apache Community Development Project</a> --> + <div class="nav-collapse collapse"> + <ul class="nav"> + <li><a href="/">Home</a></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/general/downloads.html">Downloads</a> + <li><a href="/general/who-we-are.html">Who we are</a> + <li><a href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a> + <li><a href="/general/release-notes.html">Release Notes</a> + <li><a href="/general/books-tutorials-and-talks.html">Books, Tutorials, Talks</a></li> + <li><a href="/general/powered-by-mahout.html">Powered By Mahout</a> + <li><a href="/general/professional-support.html">Professional Support</a> + <li class="divider"></li> + <li class="nav-header">Resources</li> + <li><a href="/general/reference-reading.html">Reference Reading</a> + <li><a href="/general/faq.html">FAQ</a> + <li class="divider"></li> + <li class="nav-header">Legal</li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + <li><a href="/general/privacy-policy.html">Privacy Policy</a> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developers<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/developers/developer-resources.html">Developer resources</a></li> + <li><a href="/developers/version-control.html">Version control</a></li> + <li><a href="/developers/buildingmahout.html">Build from source</a></li> + <li><a href="/developers/issue-tracker.html">Issue tracker</a></li> + <li><a href="https://builds.apache.org/job/Mahout-Quality/" target="_blank">Code quality reports</a></li> + <li class="divider"></li> + <li class="nav-header">Contributions</li> + <li><a href="/developers/how-to-contribute.html">How to contribute</a></li> + <li><a href="/developers/how-to-become-a-committer.html">How to become a committer</a></li> + <li><a href="/developers/gsoc.html">GSoC</a></li> + <li class="divider"></li> + <li class="nav-header">For committers</li> + <li><a 
href="/developers/how-to-update-the-website.html">How to update the website</a></li> + <li><a href="/developers/patch-check-list.html">Patch check list</a></li> + <li><a href="/developers/github.html">Handling Github PRs</a></li> + <li><a href="/developers/how-to-release.html">How to release</a></li> + <li><a href="/developers/thirdparty-dependencies.html">Third party dependencies</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Basics<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/basics/algorithms.html">List of algorithms</a> + <li><a href="/users/basics/quickstart.html">Quickstart</a> + <li class="divider"></li> + <li class="nav-header">Working with text</li> + <li><a href="/users/basics/creating-vectors-from-text.html">Creating vectors from text</a> + <li><a href="/users/basics/collocations.html">Collocations</a> + <li class="divider"></li> + <li class="nav-header">Dimensionality reduction</li> + <li><a href="/users/dim-reduction/dimensional-reduction.html">Singular Value Decomposition</a></li> + <li><a href="/users/dim-reduction/ssvd.html">Stochastic SVD</a></li> + <li class="divider"></li> + <li class="nav-header">Topic Models</li> + <li><a href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Spark<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/sparkbindings/home.html">Scala & Spark Bindings Overview</a></li> + <li><a href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark Shell</a></li> + <li class="divider"></li> + <li><a href="/users/sparkbindings/faq.html">FAQ</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Classification<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li> + <li><a href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li> + <li><a href="/users/mapreduce/classification/logistic-regression.html">Logistic Regression</a></li> + <li><a href="/users/mapreduce/classification/partial-implementation.html">Random Forest</a></li> + + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/classification/breiman-example.html">Breiman example</a></li> + <li><a href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups example</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Clustering<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li> + <li><a href="/users/mapreduce/clustering/streaming-k-means.html">Streaming KMeans</a></li> + <li><a href="/users/mapreduce/clustering/spectral-clustering.html">Spectral Clustering</a></li> + <li class="divider"></li> + <li class="nav-header">Commandline usage</li> + <li><a href="/users/mapreduce/clustering/k-means-commandline.html">Options for k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-commandline.html">Options for Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy k-Means</a></li> + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic data</a></li> + <li class="divider"></li> + <li class="nav-header">Post processing</li> + <li><a href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper tool</a></li> + <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster visualisation</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Recommendations<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li> + <li><a href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First Timer FAQ</a></li> + <li><a href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based recommender <br/>in 5 minutes</a></li> + <li><a href="/users/mapreduce/recommender/matrix-factorization.html">Matrix factorization-based<br/> recommenders</a></li> + <li><a href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li> + <li class="divider"></li> + <li class="nav-header">Hadoop</li> + <li><a href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to item-based recommendations<br/> with Hadoop</a></li> + <li><a href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS recommendations<br/> with Hadoop</a></li> + <li class="nav-header">Spark</li> + <li><a href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li> + </ul> + </li> + </ul> + </div><!--/.nav-collapse --> + </div> + </div> + </div> + +</div> + + <div id="sidebar"> + <div id="sidebar-wrap"> + <h2>Twitter</h2> + <ul class="sidemenu"> + <li> +<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" data-widget-id="422861673444028416">Tweets by @ApacheMahout</a> +<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script> +</li> + </ul> + <h2>Apache Software Foundation</h2> + <ul class="sidemenu"> + <li><a 
href="http://www.apache.org/foundation/how-it-works.html">How the ASF works</a></li> + <li><a href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a></li> + <li><a href="http://www.apache.org/dev/">Developer Resources</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + </ul> + <h2>Related Projects</h2> + <ul class="sidemenu"> + <li><a href="http://lucene.apache.org/">Lucene</a></li> + <li><a href="http://hadoop.apache.org/">Hadoop</a></li> + </ul> + </div> +</div> + + <div id="content-wrap" class="clearfix"> + <div id="main"> + <h1 id="hidden-markov-models">Hidden Markov Models</h1> +<p><a name="HiddenMarkovModels-IntroductionandUsage"></a></p> +<h2 id="introduction-and-usage">Introduction and Usage</h2> +<p>Hidden Markov Models are used in multiple areas of Machine Learning, such +as speech recognition, handwritten letter recognition or natural language +processing. </p> +<p><a name="HiddenMarkovModels-FormalDefinition"></a></p> +<h2 id="formal-definition">Formal Definition</h2> +<p>A Hidden Markov Model (HMM) is a statistical model of a process consisting +of two (in our case discrete) random variables O and Y, which change their +state sequentially. The variable Y with states {y_1, ... , y_n} is called +the "hidden variable", since its state is not directly observable. The +state of Y changes sequentially with a so-called (in our case first-order) +Markov Property. This means that the state change probability of Y depends +only on its current state and does not change over time. Formally, we +write: P(Y(t+1)=y_i|Y(0)...Y(t)) = P(Y(t+1)=y_i|Y(t)) = P(Y(2)=y_i|Y(1)). +The variable O with states {o_1, ... , o_m} is called the "observable +variable", since its state can be directly observed. 
O does not have a +Markov Property, but its state probability depends statically on the +current state of Y.</p> +<p>Formally, an HMM is defined as a tuple M=(n,m,P,A,B), where n is the number of hidden states, m is the number of observable states, P is an n-dimensional vector containing initial hidden state probabilities, A is the nxn-dimensional "transition matrix" containing the transition probabilities such that A[i,j]= +P(Y(t)=y_i|Y(t-1)=y_j) and B is the mxn-dimensional "emission matrix" +containing the observation probabilities such that B[i,j]= +P(O=o_i|Y=y_j).</p> +<p><a name="HiddenMarkovModels-Problems"></a></p> +<h2 id="problems">Problems</h2> +<p>Rabiner [1] +defined three main problems for HMMs:</p> +<ol> +<li>Evaluation: Given a sequence O of observations and a model M, what is +the probability P(O|M) that sequence O was generated by model M? The +Evaluation problem can be efficiently solved using the Forward algorithm.</li> +<li>Decoding: Given a sequence O of observations and a model M, what is +the most likely sequence Y*=argmax(Y) P(Y|O,M) of hidden states to +have generated this sequence? The Decoding problem can be efficiently solved +using the Viterbi algorithm.</li> +<li>Learning: Given a sequence O of observations, what is the most likely +model M*=argmax(M)P(O|M) to have generated this sequence? The Learning problem +can be efficiently solved using the Baum-Welch algorithm.</li> +</ol> +<p><a name="HiddenMarkovModels-Example"></a></p> +<h2 id="example">Example</h2> +<p>To build a Hidden Markov Model and use it to generate some predictions, try a simple example like this:</p> +<p>Create an input file to train the model. 
Here we have a sequence drawn from the set of states 0, 1, 2, and 3, separated by space characters.</p> +<div class="codehilite"><pre>$ <span class="n">echo</span> "0 1 2 2 2 1 1 0 0 3 3 3 2 1 2 1 1 1 1 2 2 2 0 0 0 0 0 0 2 2 2 0 0 0 0 0 0 2 2 2 3 3 3 3 3 3 2 3 2 3 2 3 2 1 3 0 0 0 1 0 1 0 2 1 2 1 2 1 2 3 3 3 3 2 2 3 2 1 1 0" <span class="o">></span> <span class="n">hmm</span><span class="o">-</span><span class="n">input</span> +</pre></div> + + +<p>Now run the baumwelch job to train your model, after first setting MAHOUT_LOCAL to true, to use your local file system.</p> +<div class="codehilite"><pre>$ <span class="n">export</span> <span class="n">MAHOUT_LOCAL</span><span class="p">=</span><span class="n">true</span> +$ $<span class="n">MAHOUT_HOME</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">mahout</span> <span class="n">baumwelch</span> <span class="o">-</span><span class="nb">i</span> <span class="n">hmm</span><span class="o">-</span><span class="n">input</span> <span class="o">-</span><span class="n">o</span> <span class="n">hmm</span><span class="o">-</span><span class="n">model</span> <span class="o">-</span><span class="n">nh</span> 3 <span class="o">-</span><span class="n">no</span> 4 <span class="o">-</span><span class="n">e</span> <span class="p">.</span>0001 <span class="o">-</span><span class="n">m</span> 1000 +</pre></div> + + +<p>Output like the following should appear in the console.</p> +<div class="codehilite"><pre><span class="n">Initial</span> <span class="n">probabilities</span><span class="p">:</span> +0 1 2 +1<span class="p">.</span>0 0<span class="p">.</span>0 3<span class="p">.</span>5659361683006626<span class="n">E</span><span class="o">-</span>251 +<span class="n">Transition</span> <span class="n">matrix</span><span class="p">:</span> + 0 1 2 +0 6<span class="p">.</span>098919959130616<span class="n">E</span><span class="o">-</span>5 0<span class="p">.</span>9997275322964165 2<span 
class="p">.</span>1147850399214744<span class="n">E</span><span class="o">-</span>4 +1 7<span class="p">.</span>404648706054873<span class="n">E</span><span class="o">-</span>37 0<span class="p">.</span>9086408633885092 0<span class="p">.</span>09135913661149081 +2 0<span class="p">.</span>2284374545687356 7<span class="p">.</span>01786289571088<span class="n">E</span><span class="o">-</span>11 0<span class="p">.</span>7715625453610858 +<span class="n">Emission</span> <span class="n">matrix</span><span class="p">:</span> + 0 1 2 3 +0 0<span class="p">.</span>9999997858591223 2<span class="p">.</span>0536163836449762<span class="n">E</span><span class="o">-</span>39 2<span class="p">.</span>1414087769942127<span class="n">E</span><span class="o">-</span>7 1<span class="p">.</span>052441093535389<span class="n">E</span><span class="o">-</span>27 +1 7<span class="p">.</span>495656581383351<span class="n">E</span><span class="o">-</span>34 0<span class="p">.</span>2241269055449904 0<span class="p">.</span>4510889999455847 0<span class="p">.</span>32478409450942497 +2 0<span class="p">.</span>815051477991782 0<span class="p">.</span>18494852200821799 8<span class="p">.</span>465660634827592<span class="n">E</span><span class="o">-</span>33 2<span class="p">.</span>8603899591778015<span class="n">E</span><span class="o">-</span>36 +14<span class="o">/</span>03<span class="o">/</span>22 09<span class="p">:</span>52<span class="p">:</span>21 <span class="n">INFO</span> <span class="n">driver</span><span class="p">.</span><span class="n">MahoutDriver</span><span class="p">:</span> <span class="n">Program</span> <span class="n">took</span> 180 <span class="n">ms</span> <span class="p">(</span><span class="n">Minutes</span><span class="p">:</span> 0<span class="p">.</span>003<span class="p">)</span> +</pre></div> + + +<p>The model trained with the input set now is in the file 'hmm-model', which we can use to build a predicted sequence.</p> +<div class="codehilite"><pre>$ 
$<span class="n">MAHOUT_HOME</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">mahout</span> <span class="n">hmmpredict</span> <span class="o">-</span><span class="n">m</span> <span class="n">hmm</span><span class="o">-</span><span class="n">model</span> <span class="o">-</span><span class="n">o</span> <span class="n">hmm</span><span class="o">-</span><span class="n">predictions</span> <span class="o">-</span><span class="n">l</span> 10 +</pre></div> + + +<p>To see the predictions:</p> +<div class="codehilite"><pre>$ <span class="nb">cat</span> <span class="n">hmm</span><span class="o">-</span><span class="n">predictions</span> +0 1 3 3 2 2 2 2 1 2 +</pre></div> + + +<p><a name="HiddenMarkovModels-Resources"></a></p> +<h2 id="resources">Resources</h2> +<p>[1] + Lawrence R. Rabiner (February 1989). "A tutorial on Hidden Markov Models +and selected applications in speech recognition". Proceedings of the IEEE +77 (2): 257-286. doi:10.1109/5.18626.</p> + </div> + </div> +</div> + <footer class="footer" align="center"> + <div class="container"> + <p> + Copyright © 2014 The Apache Software Foundation, Licensed under + the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br /> + Apache and the Apache feather logos are trademarks of The Apache Software Foundation. + </p> + </div> + </footer> + + <script src="/js/jquery-1.9.1.min.js"></script> + <script src="/js/bootstrap.min.js"></script> + <script> + (function() { + var cx = '012254517474945470291:vhsfv7eokdc'; + var gcse = document.createElement('script'); + gcse.type = 'text/javascript'; + gcse.async = true; + gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + + '//www.google.com/cse/cse.js?cx=' + cx; + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(gcse, s); + })(); + </script> +</body> +</html>
Added: websites/staging/mahout/trunk/content/users/mapreduce/classification/locally-weighted-linear-regression.html ============================================================================== --- websites/staging/mahout/trunk/content/users/mapreduce/classification/locally-weighted-linear-regression.html (added) +++ websites/staging/mahout/trunk/content/users/mapreduce/classification/locally-weighted-linear-regression.html Thu Mar 19 21:21:45 2015 @@ -0,0 +1,292 @@ +<!DOCTYPE html> +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> + +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> + <title>Apache Mahout: Scalable machine learning and data mining</title> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> + <meta name="Distribution" content="Global"> + <meta name="Robots" content="index,follow"> + <meta name="keywords" content="apache, apache hadoop, apache lucene, + business data mining, cluster analysis, + collaborative filtering, data extraction, data filtering, data framework, data integration, + data matching, data mining, data mining algorithms, data mining analysis, data mining data, + data mining introduction, data mining software, + data mining techniques, data representation, data set, datamining, + feature extraction, fuzzy k means, genetic algorithm, hadoop, + hierarchical clustering, high dimensional, introduction to data mining, kmeans, + knowledge discovery, learning approach, learning approaches, learning methods, + learning techniques, lucene, machine learning, machine translation, mahout apache, + mahout taste, map reduce hadoop, mining data, mining methods, naive bayes, + natural language processing, + supervised, text mining, time series data, unsupervised, web data mining"> + <link rel="shortcut icon" type="image/x-icon" href="http://mahout.apache.org/images/favicon.ico"> + <script type="text/javascript" src="/js/prototype.js"></script> + <script type="text/javascript" src="/js/effects.js"></script> + <script type="text/javascript" src="/js/search.js"></script> + <script type="text/javascript" src="/js/slides.js"></script> + + <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen"> + <link href="/css/bootstrap-responsive.css" rel="stylesheet"> + <link rel="stylesheet" href="/css/global.css" type="text/css"> + + <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown --> + <script type="text/x-mathjax-config"> + MathJax.Hub.Config({ 
+ tex2jax: { + skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] + } + }); + MathJax.Hub.Queue(function() { + var all = MathJax.Hub.getAllJax(), i; + for(i = 0; i < all.length; i += 1) { + all[i].SourceElement().parentNode.className += ' has-jax'; + } + }); + </script> + <script type="text/javascript"> + var mathjax = document.createElement('script'); + mathjax.type = 'text/javascript'; + mathjax.async = true; + + mathjax.src = ('https:' == document.location.protocol) ? + 'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : + 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; + + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(mathjax, s); + </script> +</head> + +<body id="home" data-twttr-rendered="true"> + <div id="wrap"> + <div id="header"> + <div id="logo"><a href="/overview.html"></a></div> + <div id="search"> + <form id="search-form" action="http://www.google.com/search" method="get" class="navbar-search pull-right"> + <input value="http://mahout.apache.org" name="sitesearch" type="hidden"> + <input class="search-query" name="q" id="query" type="text"> + <input id="submission" type="image" src="/images/mahout-lupe.png" alt="Search" /> + </form> + </div> + + <div class="navbar navbar-inverse" style="position:absolute;top:133px;padding-right:0px;padding-left:0px;"> + <div class="navbar-inner" style="border: none; background: #999; border: none; border-radius: 0px;"> + <div class="container"> + <button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <!-- <a class="brand" href="#">Apache Community Development Project</a> --> + <div class="nav-collapse collapse"> + <ul class="nav"> + <li><a href="/">Home</a></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/general/downloads.html">Downloads</a> + <li><a href="/general/who-we-are.html">Who we are</a> + <li><a href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a> + <li><a href="/general/release-notes.html">Release Notes</a> + <li><a href="/general/books-tutorials-and-talks.html">Books, Tutorials, Talks</a></li> + <li><a href="/general/powered-by-mahout.html">Powered By Mahout</a> + <li><a href="/general/professional-support.html">Professional Support</a> + <li class="divider"></li> + <li class="nav-header">Resources</li> + <li><a href="/general/reference-reading.html">Reference Reading</a> + <li><a href="/general/faq.html">FAQ</a> + <li class="divider"></li> + <li class="nav-header">Legal</li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + <li><a href="/general/privacy-policy.html">Privacy Policy</a> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developers<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/developers/developer-resources.html">Developer resources</a></li> + <li><a href="/developers/version-control.html">Version control</a></li> + <li><a href="/developers/buildingmahout.html">Build from source</a></li> + <li><a href="/developers/issue-tracker.html">Issue tracker</a></li> + <li><a href="https://builds.apache.org/job/Mahout-Quality/" target="_blank">Code quality reports</a></li> + <li class="divider"></li> + <li class="nav-header">Contributions</li> + <li><a href="/developers/how-to-contribute.html">How to contribute</a></li> + <li><a href="/developers/how-to-become-a-committer.html">How to become a committer</a></li> + <li><a href="/developers/gsoc.html">GSoC</a></li> + <li class="divider"></li> + <li class="nav-header">For committers</li> + <li><a 
href="/developers/how-to-update-the-website.html">How to update the website</a></li> + <li><a href="/developers/patch-check-list.html">Patch check list</a></li> + <li><a href="/developers/github.html">Handling Github PRs</a></li> + <li><a href="/developers/how-to-release.html">How to release</a></li> + <li><a href="/developers/thirdparty-dependencies.html">Third party dependencies</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Basics<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/basics/algorithms.html">List of algorithms</a> + <li><a href="/users/basics/quickstart.html">Quickstart</a> + <li class="divider"></li> + <li class="nav-header">Working with text</li> + <li><a href="/users/basics/creating-vectors-from-text.html">Creating vectors from text</a> + <li><a href="/users/basics/collocations.html">Collocations</a> + <li class="divider"></li> + <li class="nav-header">Dimensionality reduction</li> + <li><a href="/users/dim-reduction/dimensional-reduction.html">Singular Value Decomposition</a></li> + <li><a href="/users/dim-reduction/ssvd.html">Stochastic SVD</a></li> + <li class="divider"></li> + <li class="nav-header">Topic Models</li> + <li><a href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Spark<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/sparkbindings/home.html">Scala & Spark Bindings Overview</a></li> + <li><a href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark Shell</a></li> + <li class="divider"></li> + <li><a href="/users/sparkbindings/faq.html">FAQ</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Classification<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li> + <li><a href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li> + <li><a href="/users/mapreduce/classification/logistic-regression.html">Logistic Regression</a></li> + <li><a href="/users/mapreduce/classification/partial-implementation.html">Random Forest</a></li> + + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/classification/breiman-example.html">Breiman example</a></li> + <li><a href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups example</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Clustering<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li> + <li><a href="/users/mapreduce/clustering/streaming-k-means.html">Streaming KMeans</a></li> + <li><a href="/users/mapreduce/clustering/spectral-clustering.html">Spectral Clustering</a></li> + <li class="divider"></li> + <li class="nav-header">Commandline usage</li> + <li><a href="/users/mapreduce/clustering/k-means-commandline.html">Options for k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-commandline.html">Options for Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy k-Means</a></li> + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic data</a></li> + <li class="divider"></li> + <li class="nav-header">Post processing</li> + <li><a href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper tool</a></li> + <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster visualisation</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Recommendations<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li> + <li><a href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First Timer FAQ</a></li> + <li><a href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based recommender <br/>in 5 minutes</a></li> + <li><a href="/users/mapreduce/recommender/matrix-factorization.html">Matrix factorization-based<br/> recommenders</a></li> + <li><a href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li> + <li class="divider"></li> + <li class="nav-header">Hadoop</li> + <li><a href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to item-based recommendations<br/> with Hadoop</a></li> + <li><a href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS recommendations<br/> with Hadoop</a></li> + <li class="nav-header">Spark</li> + <li><a href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li> + </ul> + </li> + </ul> + </div><!--/.nav-collapse --> + </div> + </div> + </div> + +</div> + + <div id="sidebar"> + <div id="sidebar-wrap"> + <h2>Twitter</h2> + <ul class="sidemenu"> + <li> +<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" data-widget-id="422861673444028416">Tweets by @ApacheMahout</a> +<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script> +</li> + </ul> + <h2>Apache Software Foundation</h2> + <ul class="sidemenu"> + <li><a 
href="http://www.apache.org/foundation/how-it-works.html">How the ASF works</a></li> + <li><a href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a></li> + <li><a href="http://www.apache.org/dev/">Developer Resources</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + </ul> + <h2>Related Projects</h2> + <ul class="sidemenu"> + <li><a href="http://lucene.apache.org/">Lucene</a></li> + <li><a href="http://hadoop.apache.org/">Hadoop</a></li> + </ul> + </div> +</div> + + <div id="content-wrap" class="clearfix"> + <div id="main"> + <p><a name="LocallyWeightedLinearRegression-LocallyWeightedLinearRegression"></a></p> +<h1 id="locally-weighted-linear-regression">Locally Weighted Linear Regression</h1> +<p>Model-based methods, such as SVM, Naive Bayes and the mixture of Gaussians, +use the data to build a parameterized model. After training, the model is +used for predictions and the data are generally discarded. In contrast, +"memory-based" methods are non-parametric approaches that explicitly retain +the training data, and use it each time a prediction needs to be made. +Locally weighted regression (LWR) is a memory-based method that performs a +regression around a point of interest using only training data that are +"local" to that point. 
Source: +http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/cohn96a-html/node7.html</p> +<p><a name="LocallyWeightedLinearRegression-Strategyforparallelregression"></a></p> +<h2 id="strategy-for-parallel-regression">Strategy for parallel regression</h2> +<p><a name="LocallyWeightedLinearRegression-Designofpackages"></a></p> +<h2 id="design-of-packages">Design of packages</h2> + </div> + </div> +</div> + <footer class="footer" align="center"> + <div class="container"> + <p> + Copyright © 2014 The Apache Software Foundation, Licensed under + the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br /> + Apache and the Apache feather logos are trademarks of The Apache Software Foundation. + </p> + </div> + </footer> + + <script src="/js/jquery-1.9.1.min.js"></script> + <script src="/js/bootstrap.min.js"></script> + <script> + (function() { + var cx = '012254517474945470291:vhsfv7eokdc'; + var gcse = document.createElement('script'); + gcse.type = 'text/javascript'; + gcse.async = true; + gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + + '//www.google.com/cse/cse.js?cx=' + cx; + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(gcse, s); + })(); + </script> +</body> +</html> Added: websites/staging/mahout/trunk/content/users/mapreduce/classification/logistic-regression.html ============================================================================== --- websites/staging/mahout/trunk/content/users/mapreduce/classification/logistic-regression.html (added) +++ websites/staging/mahout/trunk/content/users/mapreduce/classification/logistic-regression.html Thu Mar 19 21:21:45 2015 @@ -0,0 +1,364 @@ +<!DOCTYPE html> +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. 
+ The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> + <title>Apache Mahout: Scalable machine learning and data mining</title> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> + <meta name="Distribution" content="Global"> + <meta name="Robots" content="index,follow"> + <meta name="keywords" content="apache, apache hadoop, apache lucene, + business data mining, cluster analysis, + collaborative filtering, data extraction, data filtering, data framework, data integration, + data matching, data mining, data mining algorithms, data mining analysis, data mining data, + data mining introduction, data mining software, + data mining techniques, data representation, data set, datamining, + feature extraction, fuzzy k means, genetic algorithm, hadoop, + hierarchical clustering, high dimensional, introduction to data mining, kmeans, + knowledge discovery, learning approach, learning approaches, learning methods, + learning techniques, lucene, machine learning, machine translation, mahout apache, + mahout taste, map reduce hadoop, mining data, mining methods, naive bayes, + natural language processing, + supervised, text mining, time series data, unsupervised, web data mining"> + <link rel="shortcut icon" type="image/x-icon" href="http://mahout.apache.org/images/favicon.ico"> + <script type="text/javascript" 
src="/js/prototype.js"></script> + <script type="text/javascript" src="/js/effects.js"></script> + <script type="text/javascript" src="/js/search.js"></script> + <script type="text/javascript" src="/js/slides.js"></script> + + <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen"> + <link href="/css/bootstrap-responsive.css" rel="stylesheet"> + <link rel="stylesheet" href="/css/global.css" type="text/css"> + + <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown --> + <script type="text/x-mathjax-config"> + MathJax.Hub.Config({ + tex2jax: { + skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] + } + }); + MathJax.Hub.Queue(function() { + var all = MathJax.Hub.getAllJax(), i; + for(i = 0; i < all.length; i += 1) { + all[i].SourceElement().parentNode.className += ' has-jax'; + } + }); + </script> + <script type="text/javascript"> + var mathjax = document.createElement('script'); + mathjax.type = 'text/javascript'; + mathjax.async = true; + + mathjax.src = ('https:' == document.location.protocol) ? 
+ 'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : + 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; + + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(mathjax, s); + </script> +</head> + +<body id="home" data-twttr-rendered="true"> + <div id="wrap"> + <div id="header"> + <div id="logo"><a href="/overview.html"></a></div> + <div id="search"> + <form id="search-form" action="http://www.google.com/search" method="get" class="navbar-search pull-right"> + <input value="http://mahout.apache.org" name="sitesearch" type="hidden"> + <input class="search-query" name="q" id="query" type="text"> + <input id="submission" type="image" src="/images/mahout-lupe.png" alt="Search" /> + </form> + </div> + + <div class="navbar navbar-inverse" style="position:absolute;top:133px;padding-right:0px;padding-left:0px;"> + <div class="navbar-inner" style="border: none; background: #999; border: none; border-radius: 0px;"> + <div class="container"> + <button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <!-- <a class="brand" href="#">Apache Community Development Project</a> --> + <div class="nav-collapse collapse"> + <ul class="nav"> + <li><a href="/">Home</a></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">General<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/general/downloads.html">Downloads</a> + <li><a href="/general/who-we-are.html">Who we are</a> + <li><a href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a> + <li><a href="/general/release-notes.html">Release Notes</a> + <li><a href="/general/books-tutorials-and-talks.html">Books, Tutorials, Talks</a></li> + <li><a href="/general/powered-by-mahout.html">Powered By Mahout</a> + <li><a 
href="/general/professional-support.html">Professional Support</a> + <li class="divider"></li> + <li class="nav-header">Resources</li> + <li><a href="/general/reference-reading.html">Reference Reading</a> + <li><a href="/general/faq.html">FAQ</a> + <li class="divider"></li> + <li class="nav-header">Legal</li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + <li><a href="/general/privacy-policy.html">Privacy Policy</a> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developers<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/developers/developer-resources.html">Developer resources</a></li> + <li><a href="/developers/version-control.html">Version control</a></li> + <li><a href="/developers/buildingmahout.html">Build from source</a></li> + <li><a href="/developers/issue-tracker.html">Issue tracker</a></li> + <li><a href="https://builds.apache.org/job/Mahout-Quality/" target="_blank">Code quality reports</a></li> + <li class="divider"></li> + <li class="nav-header">Contributions</li> + <li><a href="/developers/how-to-contribute.html">How to contribute</a></li> + <li><a href="/developers/how-to-become-a-committer.html">How to become a committer</a></li> + <li><a href="/developers/gsoc.html">GSoC</a></li> + <li class="divider"></li> + <li class="nav-header">For committers</li> + <li><a href="/developers/how-to-update-the-website.html">How to update the website</a></li> + <li><a href="/developers/patch-check-list.html">Patch check list</a></li> + <li><a href="/developers/github.html">Handling Github PRs</a></li> + <li><a href="/developers/how-to-release.html">How to release</a></li> + <li><a href="/developers/thirdparty-dependencies.html">Third party dependencies</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Basics<b class="caret"></b></a> + <ul 
class="dropdown-menu"> + <li><a href="/users/basics/algorithms.html">List of algorithms</a> + <li><a href="/users/basics/quickstart.html">Quickstart</a> + <li class="divider"></li> + <li class="nav-header">Working with text</li> + <li><a href="/users/basics/creating-vectors-from-text.html">Creating vectors from text</a> + <li><a href="/users/basics/collocations.html">Collocations</a> + <li class="divider"></li> + <li class="nav-header">Dimensionality reduction</li> + <li><a href="/users/dim-reduction/dimensional-reduction.html">Singular Value Decomposition</a></li> + <li><a href="/users/dim-reduction/ssvd.html">Stochastic SVD</a></li> + <li class="divider"></li> + <li class="nav-header">Topic Models</li> + <li><a href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Spark<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/sparkbindings/home.html">Scala & Spark Bindings Overview</a></li> + <li><a href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark Shell</a></li> + <li class="divider"></li> + <li><a href="/users/sparkbindings/faq.html">FAQ</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Classification<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li> + <li><a href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li> + <li><a href="/users/mapreduce/classification/logistic-regression.html">Logistic Regression</a></li> + <li><a href="/users/mapreduce/classification/partial-implementation.html">Random Forest</a></li> + + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/classification/breiman-example.html">Breiman example</a></li> + <li><a 
href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups example</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Clustering<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li> + <li><a href="/users/mapreduce/clustering/streaming-k-means.html">Streaming KMeans</a></li> + <li><a href="/users/mapreduce/clustering/spectral-clustering.html">Spectral Clustering</a></li> + <li class="divider"></li> + <li class="nav-header">Commandline usage</li> + <li><a href="/users/mapreduce/clustering/k-means-commandline.html">Options for k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-commandline.html">Options for Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy k-Means</a></li> + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic data</a></li> + <li class="divider"></li> + <li class="nav-header">Post processing</li> + <li><a href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper tool</a></li> + <li><a href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster visualisation</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Recommendations<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li> + <li><a href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First Timer FAQ</a></li> + <li><a href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based recommender <br/>in 5 
minutes</a></li> + <li><a href="/users/mapreduce/recommender/matrix-factorization.html">Matrix factorization-based<br/> recommenders</a></li> + <li><a href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li> + <li class="divider"></li> + <li class="nav-header">Hadoop</li> + <li><a href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to item-based recommendations<br/> with Hadoop</a></li> + <li><a href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS recommendations<br/> with Hadoop</a></li> + <li class="nav-header">Spark</li> + <li><a href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li> + </ul> + </li> + </ul> + </div><!--/.nav-collapse --> + </div> + </div> + </div> + +</div> + + <div id="sidebar"> + <div id="sidebar-wrap"> + <h2>Twitter</h2> + <ul class="sidemenu"> + <li> +<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" data-widget-id="422861673444028416">Tweets by @ApacheMahout</a> +<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script> +</li> + </ul> + <h2>Apache Software Foundation</h2> + <ul class="sidemenu"> + <li><a href="http://www.apache.org/foundation/how-it-works.html">How the ASF works</a></li> + <li><a href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a></li> + <li><a href="http://www.apache.org/dev/">Developer Resources</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + </ul> + <h2>Related Projects</h2> + <ul class="sidemenu"> + <li><a href="http://lucene.apache.org/">Lucene</a></li> + <li><a 
href="http://hadoop.apache.org/">Hadoop</a></li> + </ul> + </div> +</div> + + <div id="content-wrap" class="clearfix"> + <div id="main"> + <p><a name="LogisticRegression-LogisticRegression(SGD)"></a></p> +<h1 id="logistic-regression-sgd">Logistic Regression (SGD)</h1> +<p>Logistic regression is a model used to predict the probability that an +event occurs. It makes use of several predictor variables that +may be either numerical or categorical.</p> +<p>Logistic regression is the standard industry workhorse that underlies many +production fraud detection and advertising quality and targeting products. +The Mahout implementation uses Stochastic Gradient Descent (SGD) to allow +large training sets to be used.</p> +<p>For a more detailed analysis of the approach, have a look at the <a href="http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en">thesis of +Paul Komarek</a>.</p> +<p>See MAHOUT-228 for the main JIRA issue for SGD.</p> +<p><a name="LogisticRegression-Parallelizationstrategy"></a></p> +<h2 id="parallelization-strategy">Parallelization strategy</h2> +<p>The bad news is that SGD is an inherently sequential algorithm. The good +news is that it is blazingly fast and thus it is not a problem for Mahout's +implementation to handle training sets of tens of millions of examples. +With the down-sampling typical in many data sets, this is equivalent to a +data set with billions of raw training examples.</p> +<p>The SGD system in Mahout is an online learning algorithm, which means that +you can learn models in an incremental fashion and that you can do +performance testing as your system runs. Often this means that you can +stop training when a model reaches a target level of performance. The SGD +framework includes classes to do online evaluation using cross validation +(the CrossFoldLearner) and an evolutionary system to do learning +hyper-parameter optimization on the fly (the AdaptiveLogisticRegression). 
+The AdaptiveLogisticRegression system makes heavy use of threads to +increase machine utilization. The way it works is that it runs 20 +CrossFoldLearners in separate threads, each with slightly different +learning parameters. As better settings are found, these new settings are +propagated to the other learners.</p> +<p><a name="LogisticRegression-Designofpackages"></a></p> +<h2 id="design-of-packages">Design of packages</h2> +<p>There are three packages that are used in Mahout's SGD system. They +are:</p> +<ul> +<li> +<p>The vector encoding package (found in org.apache.mahout.vectorizer.encoders)</p> +</li> +<li> +<p>The SGD learning package (found in org.apache.mahout.classifier.sgd)</p> +</li> +<li> +<p>The evolutionary optimization system (found in org.apache.mahout.ep)</p> +</li> +</ul> +<p><a name="LogisticRegression-Featurevectorencoding"></a></p> +<h3 id="feature-vector-encoding">Feature vector encoding</h3> +<p>Because the SGD algorithms need to have fixed-length feature vectors and +because it is a pain to build a dictionary ahead of time, most SGD +applications use the hashed feature vector encoding system that is rooted +at FeatureVectorEncoder.</p> +<p>The basic idea is that you create a vector, typically a +RandomAccessSparseVector, and then you use various feature encoders to +progressively add features to that vector. The size of the vector should +be large enough that collisions between hashed features are rare.</p> +<p>There are specialized encoders for a variety of data types. You can +normally encode either a string representation of the value you want to +encode or a byte-level representation to avoid string +conversion. In the case of ContinuousValueEncoder and +ConstantValueEncoder, it is also possible to encode a null value and pass +the real value in as a weight. 
This avoids numerical parsing entirely in +case you are getting your training data from a system like Avro.</p> +<p>Here is a class diagram for the encoders package:</p> +<p><img alt="class diagram" src="../../images/vector-class-hierarchy.png" /></p> +<p><a name="LogisticRegression-SGDLearning"></a></p> +<h3 id="sgd-learning">SGD Learning</h3> +<p>For the simplest applications, you can construct an +OnlineLogisticRegression and be off and running. Typically, though, it is +nice to have running estimates of performance on held-out data. To do +that, you should use a CrossFoldLearner, which keeps a stable of five (by +default) OnlineLogisticRegression objects. Each time you pass a training +example to a CrossFoldLearner, it passes this example to all but one of its +children as training and passes the example to the last child to evaluate +current performance. The children are used for evaluation in a round-robin +fashion so, if you are using the default 5-way split, each child +gets 80% of the training data for training and 20% of the data for +evaluation.</p> +<p>To avoid the pesky need to configure learning rates, regularization +parameters and annealing schedules, you can use the +AdaptiveLogisticRegression. This class maintains a pool of +CrossFoldLearners and adapts learning rates and regularization on the fly +so that you don't have to.</p> +<p>Here is a class diagram for the classifiers.sgd package. As you can see, +the number of twiddlable knobs is pretty large. For some examples, see the +TrainNewsGroups example code.</p> +<p><img alt="sgd class diagram" src="../../images/sgd-class-hierarchy.png" /></p> + </div> + </div> +</div> + <footer class="footer" align="center"> + <div class="container"> + <p> + Copyright © 2014 The Apache Software Foundation, Licensed under + the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br /> + Apache and the Apache feather logos are trademarks of The Apache Software Foundation. + </p> + </div> + </footer> + + <script src="/js/jquery-1.9.1.min.js"></script> + <script src="/js/bootstrap.min.js"></script> + <script> + (function() { + var cx = '012254517474945470291:vhsfv7eokdc'; + var gcse = document.createElement('script'); + gcse.type = 'text/javascript'; + gcse.async = true; + gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + + '//www.google.com/cse/cse.js?cx=' + cx; + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(gcse, s); + })(); + </script> +</body> +</html> Added: websites/staging/mahout/trunk/content/users/mapreduce/classification/naivebayes.html ============================================================================== --- websites/staging/mahout/trunk/content/users/mapreduce/classification/naivebayes.html (added) +++ websites/staging/mahout/trunk/content/users/mapreduce/classification/naivebayes.html Thu Mar 19 21:21:45 2015 @@ -0,0 +1,307 @@ +<!DOCTYPE html> +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> + +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> + <title>Apache Mahout: Scalable machine learning and data mining</title> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> + <meta name="Distribution" content="Global"> + <meta name="Robots" content="index,follow"> + <meta name="keywords" content="apache, apache hadoop, apache lucene, + business data mining, cluster analysis, + collaborative filtering, data extraction, data filtering, data framework, data integration, + data matching, data mining, data mining algorithms, data mining analysis, data mining data, + data mining introduction, data mining software, + data mining techniques, data representation, data set, datamining, + feature extraction, fuzzy k means, genetic algorithm, hadoop, + hierarchical clustering, high dimensional, introduction to data mining, kmeans, + knowledge discovery, learning approach, learning approaches, learning methods, + learning techniques, lucene, machine learning, machine translation, mahout apache, + mahout taste, map reduce hadoop, mining data, mining methods, naive bayes, + natural language processing, + supervised, text mining, time series data, unsupervised, web data mining"> + <link rel="shortcut icon" type="image/x-icon" href="http://mahout.apache.org/images/favicon.ico"> + <script type="text/javascript" src="/js/prototype.js"></script> + <script type="text/javascript" src="/js/effects.js"></script> + <script type="text/javascript" src="/js/search.js"></script> + <script type="text/javascript" src="/js/slides.js"></script> + + <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen"> + <link href="/css/bootstrap-responsive.css" rel="stylesheet"> + <link rel="stylesheet" href="/css/global.css" type="text/css"> + + <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown --> + <script type="text/x-mathjax-config"> + MathJax.Hub.Config({ 
+ tex2jax: { + skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] + } + }); + MathJax.Hub.Queue(function() { + var all = MathJax.Hub.getAllJax(), i; + for(i = 0; i < all.length; i += 1) { + all[i].SourceElement().parentNode.className += ' has-jax'; + } + }); + </script> + <script type="text/javascript"> + var mathjax = document.createElement('script'); + mathjax.type = 'text/javascript'; + mathjax.async = true; + + mathjax.src = ('https:' == document.location.protocol) ? + 'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : + 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; + + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(mathjax, s); + </script> +</head> + +<body id="home" data-twttr-rendered="true"> + <div id="wrap"> + <div id="header"> + <div id="logo"><a href="/overview.html"></a></div> + <div id="search"> + <form id="search-form" action="http://www.google.com/search" method="get" class="navbar-search pull-right"> + <input value="http://mahout.apache.org" name="sitesearch" type="hidden"> + <input class="search-query" name="q" id="query" type="text"> + <input id="submission" type="image" src="/images/mahout-lupe.png" alt="Search" /> + </form> + </div> + + <div class="navbar navbar-inverse" style="position:absolute;top:133px;padding-right:0px;padding-left:0px;"> + <div class="navbar-inner" style="border: none; background: #999; border: none; border-radius: 0px;"> + <div class="container"> + <button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <!-- <a class="brand" href="#">Apache Community Development Project</a> --> + <div class="nav-collapse collapse"> + <ul class="nav"> + <li><a href="/">Home</a></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/general/downloads.html">Downloads</a> + <li><a href="/general/who-we-are.html">Who we are</a> + <li><a href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a> + <li><a href="/general/release-notes.html">Release Notes</a> + <li><a href="/general/books-tutorials-and-talks.html">Books, Tutorials, Talks</a></li> + <li><a href="/general/powered-by-mahout.html">Powered By Mahout</a> + <li><a href="/general/professional-support.html">Professional Support</a> + <li class="divider"></li> + <li class="nav-header">Resources</li> + <li><a href="/general/reference-reading.html">Reference Reading</a> + <li><a href="/general/faq.html">FAQ</a> + <li class="divider"></li> + <li class="nav-header">Legal</li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + <li><a href="/general/privacy-policy.html">Privacy Policy</a> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developers<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/developers/developer-resources.html">Developer resources</a></li> + <li><a href="/developers/version-control.html">Version control</a></li> + <li><a href="/developers/buildingmahout.html">Build from source</a></li> + <li><a href="/developers/issue-tracker.html">Issue tracker</a></li> + <li><a href="https://builds.apache.org/job/Mahout-Quality/" target="_blank">Code quality reports</a></li> + <li class="divider"></li> + <li class="nav-header">Contributions</li> + <li><a href="/developers/how-to-contribute.html">How to contribute</a></li> + <li><a href="/developers/how-to-become-a-committer.html">How to become a committer</a></li> + <li><a href="/developers/gsoc.html">GSoC</a></li> + <li class="divider"></li> + <li class="nav-header">For committers</li> + <li><a 
href="/developers/how-to-update-the-website.html">How to update the website</a></li> + <li><a href="/developers/patch-check-list.html">Patch check list</a></li> + <li><a href="/developers/github.html">Handling Github PRs</a></li> + <li><a href="/developers/how-to-release.html">How to release</a></li> + <li><a href="/developers/thirdparty-dependencies.html">Third party dependencies</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Basics<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/basics/algorithms.html">List of algorithms</a> + <li><a href="/users/basics/quickstart.html">Quickstart</a> + <li class="divider"></li> + <li class="nav-header">Working with text</li> + <li><a href="/users/basics/creating-vectors-from-text.html">Creating vectors from text</a> + <li><a href="/users/basics/collocations.html">Collocations</a> + <li class="divider"></li> + <li class="nav-header">Dimensionality reduction</li> + <li><a href="/users/dim-reduction/dimensional-reduction.html">Singular Value Decomposition</a></li> + <li><a href="/users/dim-reduction/ssvd.html">Stochastic SVD</a></li> + <li class="divider"></li> + <li class="nav-header">Topic Models</li> + <li><a href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Spark<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/sparkbindings/home.html">Scala & Spark Bindings Overview</a></li> + <li><a href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark Shell</a></li> + <li class="divider"></li> + <li><a href="/users/sparkbindings/faq.html">FAQ</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Classification<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li> + <li><a href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li> + <li><a href="/users/mapreduce/classification/logistic-regression.html">Logistic Regression</a></li> + <li><a href="/users/mapreduce/classification/partial-implementation.html">Random Forest</a></li> + + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/classification/breiman-example.html">Breiman example</a></li> + <li><a href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups example</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Clustering<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li> + <li><a href="/users/mapreduce/clustering/streaming-k-means.html">Streaming KMeans</a></li> + <li><a href="/users/mapreduce/clustering/spectral-clustering.html">Spectral Clustering</a></li> + <li class="divider"></li> + <li class="nav-header">Commandline usage</li> + <li><a href="/users/mapreduce/clustering/k-means-commandline.html">Options for k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-commandline.html">Options for Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy k-Means</a></li> + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic data</a></li> + <li class="divider"></li> + <li class="nav-header">Post processing</li> + <li><a href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper tool</a></li> + <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster visualisation</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Recommendations<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li> + <li><a href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First Timer FAQ</a></li> + <li><a href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based recommender <br/>in 5 minutes</a></li> + <li><a href="/users/mapreduce/recommender/matrix-factorization.html">Matrix factorization-based<br/> recommenders</a></li> + <li><a href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li> + <li class="divider"></li> + <li class="nav-header">Hadoop</li> + <li><a href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to item-based recommendations<br/> with Hadoop</a></li> + <li><a href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS recommendations<br/> with Hadoop</a></li> + <li class="nav-header">Spark</li> + <li><a href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li> + </ul> + </li> + </ul> + </div><!--/.nav-collapse --> + </div> + </div> + </div> + +</div> + + <div id="sidebar"> + <div id="sidebar-wrap"> + <h2>Twitter</h2> + <ul class="sidemenu"> + <li> +<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" data-widget-id="422861673444028416">Tweets by @ApacheMahout</a> +<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script> +</li> + </ul> + <h2>Apache Software Foundation</h2> + <ul class="sidemenu"> + <li><a 
href="http://www.apache.org/foundation/how-it-works.html">How the ASF works</a></li> + <li><a href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a></li> + <li><a href="http://www.apache.org/dev/">Developer Resources</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + </ul> + <h2>Related Projects</h2> + <ul class="sidemenu"> + <li><a href="http://lucene.apache.org/">Lucene</a></li> + <li><a href="http://hadoop.apache.org/">Hadoop</a></li> + </ul> + </div> +</div> + + <div id="content-wrap" class="clearfix"> + <div id="main"> + <p><a name="NaiveBayes-NaiveBayes"></a></p> +<h1 id="naive-bayes">Naive Bayes</h1> +<p>Naive Bayes is an algorithm that can be used to classify objects into +categories, usually two. It is one of the most common learning algorithms +in spam filters. Despite its simplicity and rather naive assumptions, it has +proven to work surprisingly well in practice.</p> +<p>Before applying the algorithm, the objects to be classified need to be +represented by numerical features. In the case of e-mail spam, each feature +might indicate whether some specific word is present or absent in the mail +to be classified. The algorithm has two phases: learning and application. +During learning, a set of feature vectors is given to the algorithm, each +vector labeled with the class to which the object it represents belongs. From +these, it is deduced which combinations of features appear with high +probability in spam messages. Given this information, during application +one can easily compute the probability of a new message being spam +or not.</p> +<p>The algorithm makes several assumptions that do not hold for most +datasets but make computations easier. The strongest is that all +features of an object are considered independent of one another. 
In practice, this means that +finding the phrase "Statue of Liberty" in a text does not +influence the probability of also seeing the phrase "New York".</p> +<p><a name="NaiveBayes-StrategyforaparallelNaiveBayes"></a></p> +<h2 id="strategy-for-a-parallel-naive-bayes">Strategy for a parallel Naive Bayes</h2> +<p>See <a href="https://issues.apache.org/jira/browse/MAHOUT-9">https://issues.apache.org/jira/browse/MAHOUT-9</a>.</p> +<p><a name="NaiveBayes-Examples"></a></p> +<h2 id="examples">Examples</h2> +<p><a href="20newsgroups.html">20Newsgroups</a> + - Example code showing how to train and use the Naive Bayes classifier +using the 20 Newsgroups data available at <a href="http://people.csail.mit.edu/jrennie/20Newsgroups/">http://people.csail.mit.edu/jrennie/20Newsgroups/</a>.</p> + </div> + </div> +</div> + <footer class="footer" align="center"> + <div class="container"> + <p> + Copyright © 2014 The Apache Software Foundation, Licensed under + the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br /> + Apache and the Apache feather logos are trademarks of The Apache Software Foundation. + </p> + </div> + </footer> + + <script src="/js/jquery-1.9.1.min.js"></script> + <script src="/js/bootstrap.min.js"></script> + <script> + (function() { + var cx = '012254517474945470291:vhsfv7eokdc'; + var gcse = document.createElement('script'); + gcse.type = 'text/javascript'; + gcse.async = true; + gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + + '//www.google.com/cse/cse.js?cx=' + cx; + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(gcse, s); + })(); + </script> +</body> +</html>
