Added: websites/staging/mahout/trunk/content/users/mapreduce/classification/neural-network.html ============================================================================== --- websites/staging/mahout/trunk/content/users/mapreduce/classification/neural-network.html (added) +++ websites/staging/mahout/trunk/content/users/mapreduce/classification/neural-network.html Thu Mar 19 21:21:45 2015 @@ -0,0 +1,288 @@ +<!DOCTYPE html> +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> + +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> + <title>Apache Mahout: Scalable machine learning and data mining</title> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> + <meta name="Distribution" content="Global"> + <meta name="Robots" content="index,follow"> + <meta name="keywords" content="apache, apache hadoop, apache lucene, + business data mining, cluster analysis, + collaborative filtering, data extraction, data filtering, data framework, data integration, + data matching, data mining, data mining algorithms, data mining analysis, data mining data, + data mining introduction, data mining software, + data mining techniques, data representation, data set, datamining, + feature extraction, fuzzy k means, genetic algorithm, hadoop, + hierarchical clustering, high dimensional, introduction to data mining, kmeans, + knowledge discovery, learning approach, learning approaches, learning methods, + learning techniques, lucene, machine learning, machine translation, mahout apache, + mahout taste, map reduce hadoop, mining data, mining methods, naive bayes, + natural language processing, + supervised, text mining, time series data, unsupervised, web data mining"> + <link rel="shortcut icon" type="image/x-icon" href="http://mahout.apache.org/images/favicon.ico"> + <script type="text/javascript" src="/js/prototype.js"></script> + <script type="text/javascript" src="/js/effects.js"></script> + <script type="text/javascript" src="/js/search.js"></script> + <script type="text/javascript" src="/js/slides.js"></script> + + <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen"> + <link href="/css/bootstrap-responsive.css" rel="stylesheet"> + <link rel="stylesheet" href="/css/global.css" type="text/css"> + + <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown --> + <script type="text/x-mathjax-config"> + MathJax.Hub.Config({ 
+ tex2jax: { + skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] + } + }); + MathJax.Hub.Queue(function() { + var all = MathJax.Hub.getAllJax(), i; + for(i = 0; i < all.length; i += 1) { + all[i].SourceElement().parentNode.className += ' has-jax'; + } + }); + </script> + <script type="text/javascript"> + var mathjax = document.createElement('script'); + mathjax.type = 'text/javascript'; + mathjax.async = true; + + mathjax.src = ('https:' == document.location.protocol) ? + 'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : + 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; + + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(mathjax, s); + </script> +</head> + +<body id="home" data-twttr-rendered="true"> + <div id="wrap"> + <div id="header"> + <div id="logo"><a href="/overview.html"></a></div> + <div id="search"> + <form id="search-form" action="http://www.google.com/search" method="get" class="navbar-search pull-right"> + <input value="http://mahout.apache.org" name="sitesearch" type="hidden"> + <input class="search-query" name="q" id="query" type="text"> + <input id="submission" type="image" src="/images/mahout-lupe.png" alt="Search" /> + </form> + </div> + + <div class="navbar navbar-inverse" style="position:absolute;top:133px;padding-right:0px;padding-left:0px;"> + <div class="navbar-inner" style="border: none; background: #999; border: none; border-radius: 0px;"> + <div class="container"> + <button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <!-- <a class="brand" href="#">Apache Community Development Project</a> --> + <div class="nav-collapse collapse"> + <ul class="nav"> + <li><a href="/">Home</a></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/general/downloads.html">Downloads</a> + <li><a href="/general/who-we-are.html">Who we are</a> + <li><a href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a> + <li><a href="/general/release-notes.html">Release Notes</a> + <li><a href="/general/books-tutorials-and-talks.html">Books, Tutorials, Talks</a></li> + <li><a href="/general/powered-by-mahout.html">Powered By Mahout</a> + <li><a href="/general/professional-support.html">Professional Support</a> + <li class="divider"></li> + <li class="nav-header">Resources</li> + <li><a href="/general/reference-reading.html">Reference Reading</a> + <li><a href="/general/faq.html">FAQ</a> + <li class="divider"></li> + <li class="nav-header">Legal</li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + <li><a href="/general/privacy-policy.html">Privacy Policy</a> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developers<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/developers/developer-resources.html">Developer resources</a></li> + <li><a href="/developers/version-control.html">Version control</a></li> + <li><a href="/developers/buildingmahout.html">Build from source</a></li> + <li><a href="/developers/issue-tracker.html">Issue tracker</a></li> + <li><a href="https://builds.apache.org/job/Mahout-Quality/" target="_blank">Code quality reports</a></li> + <li class="divider"></li> + <li class="nav-header">Contributions</li> + <li><a href="/developers/how-to-contribute.html">How to contribute</a></li> + <li><a href="/developers/how-to-become-a-committer.html">How to become a committer</a></li> + <li><a href="/developers/gsoc.html">GSoC</a></li> + <li class="divider"></li> + <li class="nav-header">For committers</li> + <li><a 
href="/developers/how-to-update-the-website.html">How to update the website</a></li> + <li><a href="/developers/patch-check-list.html">Patch check list</a></li> + <li><a href="/developers/github.html">Handling Github PRs</a></li> + <li><a href="/developers/how-to-release.html">How to release</a></li> + <li><a href="/developers/thirdparty-dependencies.html">Third party dependencies</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Basics<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/basics/algorithms.html">List of algorithms</a> + <li><a href="/users/basics/quickstart.html">Quickstart</a> + <li class="divider"></li> + <li class="nav-header">Working with text</li> + <li><a href="/users/basics/creating-vectors-from-text.html">Creating vectors from text</a> + <li><a href="/users/basics/collocations.html">Collocations</a> + <li class="divider"></li> + <li class="nav-header">Dimensionality reduction</li> + <li><a href="/users/dim-reduction/dimensional-reduction.html">Singular Value Decomposition</a></li> + <li><a href="/users/dim-reduction/ssvd.html">Stochastic SVD</a></li> + <li class="divider"></li> + <li class="nav-header">Topic Models</li> + <li><a href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Spark<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/sparkbindings/home.html">Scala & Spark Bindings Overview</a></li> + <li><a href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark Shell</a></li> + <li class="divider"></li> + <li><a href="/users/sparkbindings/faq.html">FAQ</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Classification<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li> + <li><a href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li> + <li><a href="/users/mapreduce/classification/logistic-regression.html">Logistic Regression</a></li> + <li><a href="/users/mapreduce/classification/partial-implementation.html">Random Forest</a></li> + + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/classification/breiman-example.html">Breiman example</a></li> + <li><a href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups example</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Clustering<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li> + <li><a href="/users/mapreduce/clustering/streaming-k-means.html">Streaming KMeans</a></li> + <li><a href="/users/mapreduce/clustering/spectral-clustering.html">Spectral Clustering</a></li> + <li class="divider"></li> + <li class="nav-header">Commandline usage</li> + <li><a href="/users/mapreduce/clustering/k-means-commandline.html">Options for k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-commandline.html">Options for Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy k-Means</a></li> + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic data</a></li> + <li class="divider"></li> + <li class="nav-header">Post processing</li> + <li><a href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper tool</a></li> + <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster visualisation</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Recommendations<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li> + <li><a href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First Timer FAQ</a></li> + <li><a href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based recommender <br/>in 5 minutes</a></li> + <li><a href="/users/mapreduce/recommender/matrix-factorization.html">Matrix factorization-based<br/> recommenders</a></li> + <li><a href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li> + <li class="divider"></li> + <li class="nav-header">Hadoop</li> + <li><a href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to item-based recommendations<br/> with Hadoop</a></li> + <li><a href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS recommendations<br/> with Hadoop</a></li> + <li class="nav-header">Spark</li> + <li><a href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li> + </ul> + </li> + </ul> + </div><!--/.nav-collapse --> + </div> + </div> + </div> + +</div> + + <div id="sidebar"> + <div id="sidebar-wrap"> + <h2>Twitter</h2> + <ul class="sidemenu"> + <li> +<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" data-widget-id="422861673444028416">Tweets by @ApacheMahout</a> +<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script> +</li> + </ul> + <h2>Apache Software Foundation</h2> + <ul class="sidemenu"> + <li><a 
href="http://www.apache.org/foundation/how-it-works.html">How the ASF works</a></li> + <li><a href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a></li> + <li><a href="http://www.apache.org/dev/">Developer Resources</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + </ul> + <h2>Related Projects</h2> + <ul class="sidemenu"> + <li><a href="http://lucene.apache.org/">Lucene</a></li> + <li><a href="http://hadoop.apache.org/">Hadoop</a></li> + </ul> + </div> +</div> + + <div id="content-wrap" class="clearfix"> + <div id="main"> + <p><a name="NeuralNetwork-NeuralNetworks"></a></p> +<h1 id="neural-networks">Neural Networks</h1> +<p>Neural networks are a means of classifying multidimensional objects. We +concentrate on implementing backpropagation networks with one hidden layer, +as these networks are covered by the <a href="http://www.cs.stanford.edu/people/ang/papers/nips06-mapreducemulticore.pdf">2006 NIPS map reduce paper</a>. +Such networks are capable of learning not only linear separating +hyperplanes but arbitrary decision boundaries.</p> +<p><a name="NeuralNetwork-Strategyforparallelbackpropagationnetwork"></a></p> +<h2 id="strategy-for-parallel-backpropagation-network">Strategy for parallel backpropagation network</h2> +<p><a name="NeuralNetwork-Designofimplementation"></a></p> +<h2 id="design-of-implementation">Design of implementation</h2> + </div> + </div> +</div> + <footer class="footer" align="center"> + <div class="container"> + <p> + Copyright © 2014 The Apache Software Foundation, Licensed under + the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br /> + Apache and the Apache feather logos are trademarks of The Apache Software Foundation. 
+ </p> + </div> + </footer> + + <script src="/js/jquery-1.9.1.min.js"></script> + <script src="/js/bootstrap.min.js"></script> + <script> + (function() { + var cx = '012254517474945470291:vhsfv7eokdc'; + var gcse = document.createElement('script'); + gcse.type = 'text/javascript'; + gcse.async = true; + gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + + '//www.google.com/cse/cse.js?cx=' + cx; + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(gcse, s); + })(); + </script> +</body> +</html>
Added: websites/staging/mahout/trunk/content/users/mapreduce/classification/partial-implementation.html ============================================================================== --- websites/staging/mahout/trunk/content/users/mapreduce/classification/partial-implementation.html (added) +++ websites/staging/mahout/trunk/content/users/mapreduce/classification/partial-implementation.html Thu Mar 19 21:21:45 2015 @@ -0,0 +1,411 @@ +<!DOCTYPE html> +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> + +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> + <title>Apache Mahout: Scalable machine learning and data mining</title> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> + <meta name="Distribution" content="Global"> + <meta name="Robots" content="index,follow"> + <meta name="keywords" content="apache, apache hadoop, apache lucene, + business data mining, cluster analysis, + collaborative filtering, data extraction, data filtering, data framework, data integration, + data matching, data mining, data mining algorithms, data mining analysis, data mining data, + data mining introduction, data mining software, + data mining techniques, data representation, data set, datamining, + feature extraction, fuzzy k means, genetic algorithm, hadoop, + hierarchical clustering, high dimensional, introduction to data mining, kmeans, + knowledge discovery, learning approach, learning approaches, learning methods, + learning techniques, lucene, machine learning, machine translation, mahout apache, + mahout taste, map reduce hadoop, mining data, mining methods, naive bayes, + natural language processing, + supervised, text mining, time series data, unsupervised, web data mining"> + <link rel="shortcut icon" type="image/x-icon" href="http://mahout.apache.org/images/favicon.ico"> + <script type="text/javascript" src="/js/prototype.js"></script> + <script type="text/javascript" src="/js/effects.js"></script> + <script type="text/javascript" src="/js/search.js"></script> + <script type="text/javascript" src="/js/slides.js"></script> + + <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen"> + <link href="/css/bootstrap-responsive.css" rel="stylesheet"> + <link rel="stylesheet" href="/css/global.css" type="text/css"> + + <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown --> + <script type="text/x-mathjax-config"> + MathJax.Hub.Config({ 
+ tex2jax: { + skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] + } + }); + MathJax.Hub.Queue(function() { + var all = MathJax.Hub.getAllJax(), i; + for(i = 0; i < all.length; i += 1) { + all[i].SourceElement().parentNode.className += ' has-jax'; + } + }); + </script> + <script type="text/javascript"> + var mathjax = document.createElement('script'); + mathjax.type = 'text/javascript'; + mathjax.async = true; + + mathjax.src = ('https:' == document.location.protocol) ? + 'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : + 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; + + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(mathjax, s); + </script> +</head> + +<body id="home" data-twttr-rendered="true"> + <div id="wrap"> + <div id="header"> + <div id="logo"><a href="/overview.html"></a></div> + <div id="search"> + <form id="search-form" action="http://www.google.com/search" method="get" class="navbar-search pull-right"> + <input value="http://mahout.apache.org" name="sitesearch" type="hidden"> + <input class="search-query" name="q" id="query" type="text"> + <input id="submission" type="image" src="/images/mahout-lupe.png" alt="Search" /> + </form> + </div> + + <div class="navbar navbar-inverse" style="position:absolute;top:133px;padding-right:0px;padding-left:0px;"> + <div class="navbar-inner" style="border: none; background: #999; border: none; border-radius: 0px;"> + <div class="container"> + <button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <!-- <a class="brand" href="#">Apache Community Development Project</a> --> + <div class="nav-collapse collapse"> + <ul class="nav"> + <li><a href="/">Home</a></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/general/downloads.html">Downloads</a> + <li><a href="/general/who-we-are.html">Who we are</a> + <li><a href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a> + <li><a href="/general/release-notes.html">Release Notes</a> + <li><a href="/general/books-tutorials-and-talks.html">Books, Tutorials, Talks</a></li> + <li><a href="/general/powered-by-mahout.html">Powered By Mahout</a> + <li><a href="/general/professional-support.html">Professional Support</a> + <li class="divider"></li> + <li class="nav-header">Resources</li> + <li><a href="/general/reference-reading.html">Reference Reading</a> + <li><a href="/general/faq.html">FAQ</a> + <li class="divider"></li> + <li class="nav-header">Legal</li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + <li><a href="/general/privacy-policy.html">Privacy Policy</a> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developers<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/developers/developer-resources.html">Developer resources</a></li> + <li><a href="/developers/version-control.html">Version control</a></li> + <li><a href="/developers/buildingmahout.html">Build from source</a></li> + <li><a href="/developers/issue-tracker.html">Issue tracker</a></li> + <li><a href="https://builds.apache.org/job/Mahout-Quality/" target="_blank">Code quality reports</a></li> + <li class="divider"></li> + <li class="nav-header">Contributions</li> + <li><a href="/developers/how-to-contribute.html">How to contribute</a></li> + <li><a href="/developers/how-to-become-a-committer.html">How to become a committer</a></li> + <li><a href="/developers/gsoc.html">GSoC</a></li> + <li class="divider"></li> + <li class="nav-header">For committers</li> + <li><a 
href="/developers/how-to-update-the-website.html">How to update the website</a></li> + <li><a href="/developers/patch-check-list.html">Patch check list</a></li> + <li><a href="/developers/github.html">Handling Github PRs</a></li> + <li><a href="/developers/how-to-release.html">How to release</a></li> + <li><a href="/developers/thirdparty-dependencies.html">Third party dependencies</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Basics<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/basics/algorithms.html">List of algorithms</a> + <li><a href="/users/basics/quickstart.html">Quickstart</a> + <li class="divider"></li> + <li class="nav-header">Working with text</li> + <li><a href="/users/basics/creating-vectors-from-text.html">Creating vectors from text</a> + <li><a href="/users/basics/collocations.html">Collocations</a> + <li class="divider"></li> + <li class="nav-header">Dimensionality reduction</li> + <li><a href="/users/dim-reduction/dimensional-reduction.html">Singular Value Decomposition</a></li> + <li><a href="/users/dim-reduction/ssvd.html">Stochastic SVD</a></li> + <li class="divider"></li> + <li class="nav-header">Topic Models</li> + <li><a href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Spark<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/sparkbindings/home.html">Scala & Spark Bindings Overview</a></li> + <li><a href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark Shell</a></li> + <li class="divider"></li> + <li><a href="/users/sparkbindings/faq.html">FAQ</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Classification<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li> + <li><a href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li> + <li><a href="/users/mapreduce/classification/logistic-regression.html">Logistic Regression</a></li> + <li><a href="/users/mapreduce/classification/partial-implementation.html">Random Forest</a></li> + + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/classification/breiman-example.html">Breiman example</a></li> + <li><a href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups example</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Clustering<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li> + <li><a href="/users/mapreduce/clustering/streaming-k-means.html">Streaming KMeans</a></li> + <li><a href="/users/mapreduce/clustering/spectral-clustering.html">Spectral Clustering</a></li> + <li class="divider"></li> + <li class="nav-header">Commandline usage</li> + <li><a href="/users/mapreduce/clustering/k-means-commandline.html">Options for k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-commandline.html">Options for Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy k-Means</a></li> + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic data</a></li> + <li class="divider"></li> + <li class="nav-header">Post processing</li> + <li><a href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper tool</a></li> + <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster visualisation</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Recommendations<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li> + <li><a href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First Timer FAQ</a></li> + <li><a href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based recommender <br/>in 5 minutes</a></li> + <li><a href="/users/mapreduce/recommender/matrix-factorization.html">Matrix factorization-based<br/> recommenders</a></li> + <li><a href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li> + <li class="divider"></li> + <li class="nav-header">Hadoop</li> + <li><a href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to item-based recommendations<br/> with Hadoop</a></li> + <li><a href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS recommendations<br/> with Hadoop</a></li> + <li class="nav-header">Spark</li> + <li><a href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li> + </ul> + </li> + </ul> + </div><!--/.nav-collapse --> + </div> + </div> + </div> + +</div> + + <div id="sidebar"> + <div id="sidebar-wrap"> + <h2>Twitter</h2> + <ul class="sidemenu"> + <li> +<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" data-widget-id="422861673444028416">Tweets by @ApacheMahout</a> +<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script> +</li> + </ul> + <h2>Apache Software Foundation</h2> + <ul class="sidemenu"> + <li><a 
href="http://www.apache.org/foundation/how-it-works.html">How the ASF works</a></li> + <li><a href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a></li> + <li><a href="http://www.apache.org/dev/">Developer Resources</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + </ul> + <h2>Related Projects</h2> + <ul class="sidemenu"> + <li><a href="http://lucene.apache.org/">Lucene</a></li> + <li><a href="http://hadoop.apache.org/">Hadoop</a></li> + </ul> + </div> +</div> + + <div id="content-wrap" class="clearfix"> + <div id="main"> + <h1 id="classifying-with-random-forests">Classifying with random forests</h1> +<p><a name="PartialImplementation-Introduction"></a></p> +<h1 id="introduction">Introduction</h1> +<p>This quick start page shows how to build a decision forest using the +partial implementation. It also explains how to use the decision +forest to classify new data. +Partial Decision Forests is a MapReduce implementation where each mapper +builds a subset of the forest using only the data available in its +partition. This allows building forests from large datasets as long as +each partition can be loaded into memory.</p> +<p><a name="PartialImplementation-Steps"></a></p> +<h1 id="steps">Steps</h1> +<p><a name="PartialImplementation-Downloadthedata"></a></p> +<h2 id="download-the-data">Download the data</h2> +<ul> +<li>The current implementation is compatible with the UCI repository file +format. In this example we'll use the NSL-KDD dataset because it is large +enough to show the performance of the partial implementation. 
+You can download the dataset here: http://nsl.cs.unb.ca/NSL-KDD/ +Download either the full training set "KDDTrain+.ARFF" or a 20% +subset "KDDTrain+_20Percent.ARFF" (we'll use the full dataset in this +tutorial), along with the test set "KDDTest+.ARFF".</li> +<li>Open the train and test files and remove all the lines that begin with +'@'. All those lines are at the top of the files. Keep a copy of +those lines somewhere, because they'll help us describe the dataset to +Mahout.</li> +<li>Put the data in HDFS: +<div class="codehilite"><pre>$HADOOP_HOME/bin/hadoop fs -mkdir testdata +$HADOOP_HOME/bin/hadoop fs -put &lt;PATH TO DATA&gt; testdata +</pre></div></li> +</ul> +<p><a name="PartialImplementation-BuildtheJobfiles"></a></p> +<h2 id="build-the-job-files">Build the Job files</h2> +<ul> +<li>In $MAHOUT_HOME/ run: <code>mvn clean install -DskipTests</code></li> +</ul> +<p><a name="PartialImplementation-Generateafiledescriptorforthedataset:"></a></p> +<h2 id="generate-a-file-descriptor-for-the-dataset">Generate a file descriptor for the dataset:</h2> +<p>Run the following command:</p> +<div class="codehilite"><pre>$HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-&lt;VERSION&gt;-job.jar org.apache.mahout.classifier.df.tools.Describe -p testdata/KDDTrain+.arff -f testdata/KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L +</pre></div> + + +<p>The "N 3 C 2 N C 4 N C 8 N 2 C 19 N L" string describes all the attributes +of the data. In this case, it means 1 numerical (N) attribute, followed by +3 categorical (C) attributes, and so on; L indicates the label. 
You can also use 'I' +to ignore some attributes.</p> +<p><a name="PartialImplementation-Runtheexample"></a></p> +<h2 id="run-the-example">Run the example</h2> +<div class="codehilite"><pre>$<span class="n">HADOOP_HOME</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">hadoop</span> <span class="n">jar</span> $MAHOUT_HOME/examples/target/mahout-examples-<version>-job.jar \ +    org.apache.mahout.classifier.df.mapreduce.BuildForest \ +    -Dmapred.max.split.size=1874231 -d testdata/KDDTrain+.arff \ +    -ds testdata/KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest +</pre></div> + + +<p>This builds 100 trees (-t argument) using the partial implementation (-p). +Each tree is built using 5 randomly selected attributes per node (-sl +argument), and the example writes the decision forest to the "nsl-forest" +directory (-o). +The number of partitions is controlled by the -Dmapred.max.split.size +argument, which tells Hadoop the maximum size of each partition in bytes, in this +case 1/10 of the size of the dataset, so 10 partitions will be used. +IMPORTANT: using fewer partitions should give better classification results +but needs a lot of memory, so if the jobs are failing, try increasing the +number of partitions. 
+The example outputs the build time and the out-of-bag (OOB) error estimate:</p> +<div class="codehilite"><pre>10<span class="o">/</span>03<span class="o">/</span>13 17<span class="p">:</span>57<span class="p">:</span>29 <span class="n">INFO</span> <span class="n">mapreduce</span><span class="p">.</span><span class="n">BuildForest</span><span class="p">:</span> <span class="n">Build</span> <span class="n">Time</span><span class="p">:</span> 0<span class="n">h</span> 7<span class="n">m</span> 43<span class="n">s</span> 582 +10<span class="o">/</span>03<span class="o">/</span>13 17<span class="p">:</span>57<span class="p">:</span>33 <span class="n">INFO</span> <span class="n">mapreduce</span><span class="p">.</span><span class="n">BuildForest</span><span class="p">:</span> <span class="n">oob</span> <span class="n">error</span> <span class="n">estimate</span> <span class="p">:</span> 0.002325895231517865 +10/03/13 17:57:33 INFO mapreduce.BuildForest: Storing the forest in: nsl-forest/forest.seq +</pre></div> + + +<p><a name="PartialImplementation-UsingtheDecisionForesttoClassifynewdata"></a></p> +<h2 id="using-the-decision-forest-to-classify-new-data">Using the Decision Forest to Classify new data</h2> +<p>Run the following command:</p> +<div class="codehilite"><pre>$<span class="n">HADOOP_HOME</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">hadoop</span> <span class="n">jar</span> $MAHOUT_HOME/examples/target/mahout-examples-<version>-job.jar \ +    org.apache.mahout.classifier.df.mapreduce.TestForest \ +    -i testdata/KDDTest+.arff -ds testdata/KDDTrain+.info -m nsl-forest -a -mr \ +    -o predictions +</pre></div> + + +<p>This computes the predictions for the "KDDTest+.arff" dataset (-i argument) +using the same data descriptor generated for the training dataset (-ds) and +the decision forest built previously (-m). 
Optionally (if the test dataset +contains the labels of the tuples) run the analyzer to compute the +confusion matrix (-a), and you can also store the predictions in a text +file or a directory of text files (-o). Passing the -mr parameter will use +Hadoop to distribute the classification.</p> +<ul> +<li> +<p>The example should output the classification time and the confusion +matrix:</p> +<div class="codehilite"><pre>10/03/13 18:08:56 INFO mapreduce.TestForest: Classification Time: 0h 0m 6s 355 +10/03/13 18:08:56 INFO mapreduce.TestForest: +======================================================= +Summary +------------------------------------------------------- +Correctly Classified Instances   : 17657   78.3224% +Incorrectly Classified Instances : 4887    21.6776% +Total Classified Instances       : 22544 + +======================================================= +Confusion Matrix +------------------------------------------------------- +a       b       <--Classified as +9459    252      |  9711   a = normal +4635    8198     |  12833  b = anomaly +Default Category: unknown: 2 +</pre></div> +</li> +</ul> +<p>If the input is a single file then the output will be a single text file; +in the above example 'predictions' would be one single file. If the input +is a directory containing, for example, two files 'a.data' and 'b.data', then +the output will be a directory 'predictions' containing two files +'a.data.out' and 'b.data.out'.</p> +<p><a name="PartialImplementation-KnownIssuesandlimitations"></a></p> +<h2 id="known-issues-and-limitations">Known Issues and limitations</h2> +<p>The "Decision Forest" code is still a work in progress, and many features are +still missing. Here are some known issues. +First, the training does not yet support multiple input files. The input +dataset must be one single file (this support will be available with the upcoming release). +Classifying new data does support multiple +input files. +Second, the tree building is done when each mapper's close() method is called. 
+Because the mappers don't refresh their state, the job can fail when the +dataset is big and you try to build a large number of trees.</p> + </div> + </div> +</div> + <footer class="footer" align="center"> + <div class="container"> + <p> + Copyright © 2014 The Apache Software Foundation, Licensed under + the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br /> + Apache and the Apache feather logos are trademarks of The Apache Software Foundation. + </p> + </div> + </footer> + + <script src="/js/jquery-1.9.1.min.js"></script> + <script src="/js/bootstrap.min.js"></script> + <script> + (function() { + var cx = '012254517474945470291:vhsfv7eokdc'; + var gcse = document.createElement('script'); + gcse.type = 'text/javascript'; + gcse.async = true; + gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + + '//www.google.com/cse/cse.js?cx=' + cx; + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(gcse, s); + })(); + </script> +</body> +</html> Added: websites/staging/mahout/trunk/content/users/mapreduce/classification/random-forests.html ============================================================================== --- websites/staging/mahout/trunk/content/users/mapreduce/classification/random-forests.html (added) +++ websites/staging/mahout/trunk/content/users/mapreduce/classification/random-forests.html Thu Mar 19 21:21:45 2015 @@ -0,0 +1,278 @@ +<!DOCTYPE html> +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> + <title>Apache Mahout: Scalable machine learning and data mining</title> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> + <meta name="Distribution" content="Global"> + <meta name="Robots" content="index,follow"> + <meta name="keywords" content="apache, apache hadoop, apache lucene, + business data mining, cluster analysis, + collaborative filtering, data extraction, data filtering, data framework, data integration, + data matching, data mining, data mining algorithms, data mining analysis, data mining data, + data mining introduction, data mining software, + data mining techniques, data representation, data set, datamining, + feature extraction, fuzzy k means, genetic algorithm, hadoop, + hierarchical clustering, high dimensional, introduction to data mining, kmeans, + knowledge discovery, learning approach, learning approaches, learning methods, + learning techniques, lucene, machine learning, machine translation, mahout apache, + mahout taste, map reduce hadoop, mining data, mining methods, naive bayes, + natural language processing, + supervised, text mining, time series data, unsupervised, web data mining"> + <link rel="shortcut icon" type="image/x-icon" href="http://mahout.apache.org/images/favicon.ico"> + <script type="text/javascript" src="/js/prototype.js"></script> + <script type="text/javascript" src="/js/effects.js"></script> + <script type="text/javascript" src="/js/search.js"></script> 
+ <script type="text/javascript" src="/js/slides.js"></script> + + <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen"> + <link href="/css/bootstrap-responsive.css" rel="stylesheet"> + <link rel="stylesheet" href="/css/global.css" type="text/css"> + + <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown --> + <script type="text/x-mathjax-config"> + MathJax.Hub.Config({ + tex2jax: { + skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] + } + }); + MathJax.Hub.Queue(function() { + var all = MathJax.Hub.getAllJax(), i; + for(i = 0; i < all.length; i += 1) { + all[i].SourceElement().parentNode.className += ' has-jax'; + } + }); + </script> + <script type="text/javascript"> + var mathjax = document.createElement('script'); + mathjax.type = 'text/javascript'; + mathjax.async = true; + + mathjax.src = ('https:' == document.location.protocol) ? + 'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : + 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; + + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(mathjax, s); + </script> +</head> + +<body id="home" data-twttr-rendered="true"> + <div id="wrap"> + <div id="header"> + <div id="logo"><a href="/overview.html"></a></div> + <div id="search"> + <form id="search-form" action="http://www.google.com/search" method="get" class="navbar-search pull-right"> + <input value="http://mahout.apache.org" name="sitesearch" type="hidden"> + <input class="search-query" name="q" id="query" type="text"> + <input id="submission" type="image" src="/images/mahout-lupe.png" alt="Search" /> + </form> + </div> + + <div class="navbar navbar-inverse" style="position:absolute;top:133px;padding-right:0px;padding-left:0px;"> + <div class="navbar-inner" style="border: none; background: #999; border: none; border-radius: 0px;"> + <div class="container"> + <button type="button" class="btn btn-navbar" 
data-toggle="collapse" data-target=".nav-collapse"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <!-- <a class="brand" href="#">Apache Community Development Project</a> --> + <div class="nav-collapse collapse"> + <ul class="nav"> + <li><a href="/">Home</a></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">General<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/general/downloads.html">Downloads</a> + <li><a href="/general/who-we-are.html">Who we are</a> + <li><a href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a> + <li><a href="/general/release-notes.html">Release Notes</a> + <li><a href="/general/books-tutorials-and-talks.html">Books, Tutorials, Talks</a></li> + <li><a href="/general/powered-by-mahout.html">Powered By Mahout</a> + <li><a href="/general/professional-support.html">Professional Support</a> + <li class="divider"></li> + <li class="nav-header">Resources</li> + <li><a href="/general/reference-reading.html">Reference Reading</a> + <li><a href="/general/faq.html">FAQ</a> + <li class="divider"></li> + <li class="nav-header">Legal</li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + <li><a href="/general/privacy-policy.html">Privacy Policy</a> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developers<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/developers/developer-resources.html">Developer resources</a></li> + <li><a href="/developers/version-control.html">Version control</a></li> + <li><a href="/developers/buildingmahout.html">Build from source</a></li> + <li><a href="/developers/issue-tracker.html">Issue tracker</a></li> + <li><a href="https://builds.apache.org/job/Mahout-Quality/" target="_blank">Code quality reports</a></li> + <li 
class="divider"></li> + <li class="nav-header">Contributions</li> + <li><a href="/developers/how-to-contribute.html">How to contribute</a></li> + <li><a href="/developers/how-to-become-a-committer.html">How to become a committer</a></li> + <li><a href="/developers/gsoc.html">GSoC</a></li> + <li class="divider"></li> + <li class="nav-header">For committers</li> + <li><a href="/developers/how-to-update-the-website.html">How to update the website</a></li> + <li><a href="/developers/patch-check-list.html">Patch check list</a></li> + <li><a href="/developers/github.html">Handling Github PRs</a></li> + <li><a href="/developers/how-to-release.html">How to release</a></li> + <li><a href="/developers/thirdparty-dependencies.html">Third party dependencies</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Basics<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/basics/algorithms.html">List of algorithms</a> + <li><a href="/users/basics/quickstart.html">Quickstart</a> + <li class="divider"></li> + <li class="nav-header">Working with text</li> + <li><a href="/users/basics/creating-vectors-from-text.html">Creating vectors from text</a> + <li><a href="/users/basics/collocations.html">Collocations</a> + <li class="divider"></li> + <li class="nav-header">Dimensionality reduction</li> + <li><a href="/users/dim-reduction/dimensional-reduction.html">Singular Value Decomposition</a></li> + <li><a href="/users/dim-reduction/ssvd.html">Stochastic SVD</a></li> + <li class="divider"></li> + <li class="nav-header">Topic Models</li> + <li><a href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Spark<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/sparkbindings/home.html">Scala & Spark Bindings Overview</a></li> + <li><a 
href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark Shell</a></li> + <li class="divider"></li> + <li><a href="/users/sparkbindings/faq.html">FAQ</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Classification<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li> + <li><a href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li> + <li><a href="/users/mapreduce/classification/logistic-regression.html">Logistic Regression</a></li> + <li><a href="/users/mapreduce/classification/partial-implementation.html">Random Forest</a></li> + + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/classification/breiman-example.html">Breiman example</a></li> + <li><a href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups example</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Clustering<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li> + <li><a href="/users/mapreduce/clustering/streaming-k-means.html">Streaming KMeans</a></li> + <li><a href="/users/mapreduce/clustering/spectral-clustering.html">Spectral Clustering</a></li> + <li class="divider"></li> + <li class="nav-header">Commandline usage</li> + <li><a href="/users/mapreduce/clustering/k-means-commandline.html">Options for k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-commandline.html">Options for Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy k-Means</a></li> + <li 
class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic data</a></li> + <li class="divider"></li> + <li class="nav-header">Post processing</li> + <li><a href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper tool</a></li> + <li><a href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster visualisation</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Recommendations<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li> + <li><a href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First Timer FAQ</a></li> + <li><a href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based recommender <br/>in 5 minutes</a></li> + <li><a href="/users/mapreduce/recommender/matrix-factorization.html">Matrix factorization-based<br/> recommenders</a></li> + <li><a href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li> + <li class="divider"></li> + <li class="nav-header">Hadoop</li> + <li><a href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to item-based recommendations<br/> with Hadoop</a></li> + <li><a href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS recommendations<br/> with Hadoop</a></li> + <li class="nav-header">Spark</li> + <li><a href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li> + </ul> + </li> + </ul> + </div><!--/.nav-collapse --> + </div> + </div> + </div> + +</div> + + <div id="sidebar"> + <div id="sidebar-wrap"> + <h2>Twitter</h2> + <ul class="sidemenu"> + <li> +<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" data-widget-id="422861673444028416">Tweets by @ApacheMahout</a> +<script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script> +</li> + </ul> + <h2>Apache Software Foundation</h2> + <ul class="sidemenu"> + <li><a href="http://www.apache.org/foundation/how-it-works.html">How the ASF works</a></li> + <li><a href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a></li> + <li><a href="http://www.apache.org/dev/">Developer Resources</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + </ul> + <h2>Related Projects</h2> + <ul class="sidemenu"> + <li><a href="http://lucene.apache.org/">Lucene</a></li> + <li><a href="http://hadoop.apache.org/">Hadoop</a></li> + </ul> + </div> +</div> + + <div id="content-wrap" class="clearfix"> + <div id="main"> + + </div> + </div> +</div> + <footer class="footer" align="center"> + <div class="container"> + <p> + Copyright © 2014 The Apache Software Foundation, Licensed under + the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br /> + Apache and the Apache feather logos are trademarks of The Apache Software Foundation. + </p> + </div> + </footer> + + <script src="/js/jquery-1.9.1.min.js"></script> + <script src="/js/bootstrap.min.js"></script> + <script> + (function() { + var cx = '012254517474945470291:vhsfv7eokdc'; + var gcse = document.createElement('script'); + gcse.type = 'text/javascript'; + gcse.async = true; + gcse.src = (document.location.protocol == 'https:' ? 
'https:' : 'http:') + + '//www.google.com/cse/cse.js?cx=' + cx; + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(gcse, s); + })(); + </script> +</body> +</html> Added: websites/staging/mahout/trunk/content/users/mapreduce/classification/restricted-boltzmann-machines.html ============================================================================== --- websites/staging/mahout/trunk/content/users/mapreduce/classification/restricted-boltzmann-machines.html (added) +++ websites/staging/mahout/trunk/content/users/mapreduce/classification/restricted-boltzmann-machines.html Thu Mar 19 21:21:45 2015 @@ -0,0 +1,313 @@ +<!DOCTYPE html> +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> + +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> + <title>Apache Mahout: Scalable machine learning and data mining</title> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> + <meta name="Distribution" content="Global"> + <meta name="Robots" content="index,follow"> + <meta name="keywords" content="apache, apache hadoop, apache lucene, + business data mining, cluster analysis, + collaborative filtering, data extraction, data filtering, data framework, data integration, + data matching, data mining, data mining algorithms, data mining analysis, data mining data, + data mining introduction, data mining software, + data mining techniques, data representation, data set, datamining, + feature extraction, fuzzy k means, genetic algorithm, hadoop, + hierarchical clustering, high dimensional, introduction to data mining, kmeans, + knowledge discovery, learning approach, learning approaches, learning methods, + learning techniques, lucene, machine learning, machine translation, mahout apache, + mahout taste, map reduce hadoop, mining data, mining methods, naive bayes, + natural language processing, + supervised, text mining, time series data, unsupervised, web data mining"> + <link rel="shortcut icon" type="image/x-icon" href="http://mahout.apache.org/images/favicon.ico"> + <script type="text/javascript" src="/js/prototype.js"></script> + <script type="text/javascript" src="/js/effects.js"></script> + <script type="text/javascript" src="/js/search.js"></script> + <script type="text/javascript" src="/js/slides.js"></script> + + <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen"> + <link href="/css/bootstrap-responsive.css" rel="stylesheet"> + <link rel="stylesheet" href="/css/global.css" type="text/css"> + + <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown --> + <script type="text/x-mathjax-config"> + MathJax.Hub.Config({ 
+ tex2jax: { + skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] + } + }); + MathJax.Hub.Queue(function() { + var all = MathJax.Hub.getAllJax(), i; + for(i = 0; i < all.length; i += 1) { + all[i].SourceElement().parentNode.className += ' has-jax'; + } + }); + </script> + <script type="text/javascript"> + var mathjax = document.createElement('script'); + mathjax.type = 'text/javascript'; + mathjax.async = true; + + mathjax.src = ('https:' == document.location.protocol) ? + 'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : + 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; + + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(mathjax, s); + </script> +</head> + +<body id="home" data-twttr-rendered="true"> + <div id="wrap"> + <div id="header"> + <div id="logo"><a href="/overview.html"></a></div> + <div id="search"> + <form id="search-form" action="http://www.google.com/search" method="get" class="navbar-search pull-right"> + <input value="http://mahout.apache.org" name="sitesearch" type="hidden"> + <input class="search-query" name="q" id="query" type="text"> + <input id="submission" type="image" src="/images/mahout-lupe.png" alt="Search" /> + </form> + </div> + + <div class="navbar navbar-inverse" style="position:absolute;top:133px;padding-right:0px;padding-left:0px;"> + <div class="navbar-inner" style="border: none; background: #999; border: none; border-radius: 0px;"> + <div class="container"> + <button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse"> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <!-- <a class="brand" href="#">Apache Community Development Project</a> --> + <div class="nav-collapse collapse"> + <ul class="nav"> + <li><a href="/">Home</a></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/general/downloads.html">Downloads</a> + <li><a href="/general/who-we-are.html">Who we are</a> + <li><a href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a> + <li><a href="/general/release-notes.html">Release Notes</a> + <li><a href="/general/books-tutorials-and-talks.html">Books, Tutorials, Talks</a></li> + <li><a href="/general/powered-by-mahout.html">Powered By Mahout</a> + <li><a href="/general/professional-support.html">Professional Support</a> + <li class="divider"></li> + <li class="nav-header">Resources</li> + <li><a href="/general/reference-reading.html">Reference Reading</a> + <li><a href="/general/faq.html">FAQ</a> + <li class="divider"></li> + <li class="nav-header">Legal</li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + <li><a href="/general/privacy-policy.html">Privacy Policy</a> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developers<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/developers/developer-resources.html">Developer resources</a></li> + <li><a href="/developers/version-control.html">Version control</a></li> + <li><a href="/developers/buildingmahout.html">Build from source</a></li> + <li><a href="/developers/issue-tracker.html">Issue tracker</a></li> + <li><a href="https://builds.apache.org/job/Mahout-Quality/" target="_blank">Code quality reports</a></li> + <li class="divider"></li> + <li class="nav-header">Contributions</li> + <li><a href="/developers/how-to-contribute.html">How to contribute</a></li> + <li><a href="/developers/how-to-become-a-committer.html">How to become a committer</a></li> + <li><a href="/developers/gsoc.html">GSoC</a></li> + <li class="divider"></li> + <li class="nav-header">For committers</li> + <li><a 
href="/developers/how-to-update-the-website.html">How to update the website</a></li> + <li><a href="/developers/patch-check-list.html">Patch check list</a></li> + <li><a href="/developers/github.html">Handling Github PRs</a></li> + <li><a href="/developers/how-to-release.html">How to release</a></li> + <li><a href="/developers/thirdparty-dependencies.html">Third party dependencies</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Basics<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/basics/algorithms.html">List of algorithms</a> + <li><a href="/users/basics/quickstart.html">Quickstart</a> + <li class="divider"></li> + <li class="nav-header">Working with text</li> + <li><a href="/users/basics/creating-vectors-from-text.html">Creating vectors from text</a> + <li><a href="/users/basics/collocations.html">Collocations</a> + <li class="divider"></li> + <li class="nav-header">Dimensionality reduction</li> + <li><a href="/users/dim-reduction/dimensional-reduction.html">Singular Value Decomposition</a></li> + <li><a href="/users/dim-reduction/ssvd.html">Stochastic SVD</a></li> + <li class="divider"></li> + <li class="nav-header">Topic Models</li> + <li><a href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Spark<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/sparkbindings/home.html">Scala & Spark Bindings Overview</a></li> + <li><a href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark Shell</a></li> + <li class="divider"></li> + <li><a href="/users/sparkbindings/faq.html">FAQ</a></li> + </ul> + </li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Classification<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li> + <li><a href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li> + <li><a href="/users/mapreduce/classification/logistic-regression.html">Logistic Regression</a></li> + <li><a href="/users/mapreduce/classification/partial-implementation.html">Random Forest</a></li> + + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/classification/breiman-example.html">Breiman example</a></li> + <li><a href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups example</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Clustering<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li> + <li><a href="/users/mapreduce/clustering/streaming-k-means.html">Streaming KMeans</a></li> + <li><a href="/users/mapreduce/clustering/spectral-clustering.html">Spectral Clustering</a></li> + <li class="divider"></li> + <li class="nav-header">Commandline usage</li> + <li><a href="/users/mapreduce/clustering/k-means-commandline.html">Options for k-Means</a></li> + <li><a href="/users/mapreduce/clustering/canopy-commandline.html">Options for Canopy</a></li> + <li><a href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy k-Means</a></li> + <li class="divider"></li> + <li class="nav-header">Examples</li> + <li><a href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic data</a></li> + <li class="divider"></li> + <li class="nav-header">Post processing</li> + <li><a href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper tool</a></li> + <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster visualisation</a></li> + </ul></li> + <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Recommendations<b class="caret"></b></a> + <ul class="dropdown-menu"> + <li><a href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li> + <li><a href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First Timer FAQ</a></li> + <li><a href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based recommender <br/>in 5 minutes</a></li> + <li><a href="/users/mapreduce/recommender/matrix-factorization.html">Matrix factorization-based<br/> recommenders</a></li> + <li><a href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li> + <li class="divider"></li> + <li class="nav-header">Hadoop</li> + <li><a href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to item-based recommendations<br/> with Hadoop</a></li> + <li><a href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS recommendations<br/> with Hadoop</a></li> + <li class="nav-header">Spark</li> + <li><a href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li> + </ul> + </li> + </ul> + </div><!--/.nav-collapse --> + </div> + </div> + </div> + +</div> + + <div id="sidebar"> + <div id="sidebar-wrap"> + <h2>Twitter</h2> + <ul class="sidemenu"> + <li> +<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" data-widget-id="422861673444028416">Tweets by @ApacheMahout</a> +<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script> +</li> + </ul> + <h2>Apache Software Foundation</h2> + <ul class="sidemenu"> + <li><a 
href="http://www.apache.org/foundation/how-it-works.html">How the ASF works</a></li> + <li><a href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a></li> + <li><a href="http://www.apache.org/dev/">Developer Resources</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + </ul> + <h2>Related Projects</h2> + <ul class="sidemenu"> + <li><a href="http://lucene.apache.org/">Lucene</a></li> + <li><a href="http://hadoop.apache.org/">Hadoop</a></li> + </ul> + </div> +</div> + + <div id="content-wrap" class="clearfix"> + <div id="main"> + <p>This work is tracked in JIRA issue <a href="https://issues.apache.org/jira/browse/MAHOUT-375">MAHOUT-375</a>.</p> +<p><a name="RestrictedBoltzmannMachines-BoltzmannMachines"></a></p> +<h3 id="boltzmann-machines">Boltzmann Machines</h3> +<p>Boltzmann Machines are a type of stochastic neural network that closely +resembles physical processes. They define a network of units with an overall +energy that evolves over time until it reaches thermal +equilibrium.</p> +<p>However, Boltzmann machines with unconstrained connectivity +converge slowly.</p> +<p><a name="RestrictedBoltzmannMachines-RestrictedBoltzmannMachines"></a></p> +<h3 id="restricted-boltzmann-machines">Restricted Boltzmann Machines</h3> +<p>Restricted Boltzmann Machines (RBMs) are a variant that is 'restricted' in the +sense that connections between hidden units of a single layer are <em>not</em> +allowed. In addition, multiple RBMs can be stacked, with the +activities of the hidden units forming the input for a higher-level RBM. The +combination of these two features makes RBMs well suited to +parallelization. 
</p> +<p>In the Netflix Prize, RBMs offered predictions that were largely orthogonal to +those of SVD and k-NN approaches, and contributed substantially to the final solution.</p> +<p><a name="RestrictedBoltzmannMachines-RBM'sinApacheMahout"></a></p> +<h3 id="rbms-in-apache-mahout">RBMs in Apache Mahout</h3> +<p>An implementation of Restricted Boltzmann Machines is being developed for +Apache Mahout as a Google Summer of Code 2010 project. A recommender +interface will also be provided. The key aims of the implementation are:</p> +<ol> +<li>Accurate: it should replicate known results, including those of the Netflix +Prize.</li> +<li>Fast: the implementation uses MapReduce to distribute the work, which should make it fast.</li> +<li>Scalable: it should scale to large datasets, with a design whose critical +parts do not tie the amount of memory on your cluster +systems to the size of your dataset.</li> +</ol> +<p>You can follow the patch as it develops <a href="http://github.com/sisirkoppaka/mahout-rbm/compare/trunk...rbm">on GitHub</a>.</p> + </div> + </div> +</div> + <footer class="footer" align="center"> + <div class="container"> + <p> + Copyright © 2014 The Apache Software Foundation, Licensed under + the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br /> + Apache and the Apache feather logos are trademarks of The Apache Software Foundation. + </p> + </div> + </footer> + + <script src="/js/jquery-1.9.1.min.js"></script> + <script src="/js/bootstrap.min.js"></script> + <script> + (function() { + var cx = '012254517474945470291:vhsfv7eokdc'; + var gcse = document.createElement('script'); + gcse.type = 'text/javascript'; + gcse.async = true; + gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + + '//www.google.com/cse/cse.js?cx=' + cx; + var s = document.getElementsByTagName('script')[0]; + s.parentNode.insertBefore(gcse, s); + })(); + </script> +</body> +</html>
