http://git-wip-us.apache.org/repos/asf/mahout/blob/5112e9ec/docs/latest/tutorials/eigenfaces/index.html ---------------------------------------------------------------------- diff --git a/docs/latest/tutorials/eigenfaces/index.html b/docs/latest/tutorials/eigenfaces/index.html index cfaf6a9..ccd5624 100644 --- a/docs/latest/tutorials/eigenfaces/index.html +++ b/docs/latest/tutorials/eigenfaces/index.html @@ -1,283 +1,169 @@ - - <!DOCTYPE html> -<html lang="en"> +<html lang=" en "> + <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> - <title>Eigenfaces Demo</title> - - <meta name="author" content="Apache Mahout"> - - <!-- Enable responsive viewport --> - <meta name="viewport" content="width=device-width, initial-scale=1.0"> - - <!-- Bootstrap styles --> - <link href="/assets/themes/mahout3/css/bootstrap.min.css" rel="stylesheet"> - <!-- Optional theme --> - <link href="/assets/themes/mahout3/css/bootstrap-theme.min.css" rel="stylesheet"> - <!-- Sticky Footer --> - <link href="/assets/themes/mahout3/css/bs-sticky-footer.css" rel="stylesheet"> - - <!-- Custom styles --> - <link href="/assets/themes/mahout3/css/style.css" rel="stylesheet" type="text/css" media="all"> - - <!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries --> - <!-- WARNING: Respond.js doesn't work if you view the page via file:// --> - <!--[if lt IE 9]> - <script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script> - <script src="https://oss.maxcdn.com/libs/respond.js/1.3.0/respond.min.js"></script> - <![endif]--> - - <!-- Fav and touch icons --> - <!-- Update these with your own images - <link rel="shortcut icon" href="images/favicon.ico"> - <link rel="apple-touch-icon" href="images/apple-touch-icon.png"> - <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png"> - <link rel="apple-touch-icon" sizes="114x114" 
href="images/apple-touch-icon-114x114.png"> - --> - - <!-- atom & rss feed --> - <link href="/atom.xml" type="application/atom+xml" rel="alternate" title="Sitewide ATOM Feed"> - <link href="/rss.xml" type="application/rss+xml" rel="alternate" title="Sitewide RSS Feed"> - <script type="text/x-mathjax-config"> - MathJax.Hub.Config({ - tex2jax: { - skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] - } - }); - MathJax.Hub.Queue(function() { - var all = MathJax.Hub.getAllJax(), i; - for(i = 0; i < all.length; i += 1) { - all[i].SourceElement().parentNode.className += ' has-jax'; - } - }); - </script> - <script type="text/javascript"> - var mathjax = document.createElement('script'); - mathjax.type = 'text/javascript'; - mathjax.async = true; - - mathjax.src = ('https:' == document.location.protocol) ? - 'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : - 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; - - var s = document.getElementsByTagName('script')[0]; - s.parentNode.insertBefore(mathjax, s); - </script> -</head> - -<nav class="navbar navbar-default navbar-fixed-top"> - <div class="container-fluid"> - <!-- Brand and toggle get grouped for better mobile display --> - <div class="navbar-header"> - <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1" aria-expanded="false"> - <span class="sr-only">Toggle navigation</span> - <span class="icon-bar"></span> - <span class="icon-bar"></span> - <span class="icon-bar"></span> - </button> - <a class="navbar-brand" href="/"> - <img src="/assets/img/Mahout-logo-82x100.png" height="30" alt="I'm mahout"> - </a> - </div> - + <title> + Eigenfaces Demo + </title> + <meta name="description" content="Distributed Linear Algebra"> -<!-- Collect the nav links, forms, and other content for toggling --> -<div class="collapse navbar-collapse" id="main-navbar"> - <ul class="nav 
navbar-nav"> - - <!-- Quick Start --> - <li id="quickstart"> - <a href="/index.html" >Mahout Overview</a> - </li> - - <li id="dropdown"> - <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Key Concepts<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="/index.html">Mahout Overview</a></li> - <li><span><b> Scala DSL</b><span></li> - <li><a href="/mahout-samsara/in-core-reference.html">In-core Reference</a></li> - <li><a href="/mahout-samsara/out-of-core-reference.html">Out-of-core Reference</a></li> - <li><a href="/mahout-samsara/faq.html">Samsara FAQ</a></li> - <li role="separator" class="divider"></li> - <li><span> <b>Bindings</b><span></li> - <li><a href="/distributed/spark-bindings/">Spark Bindings</a></li> - <li><a href="/distributed/flink-bindings.html">Flink Bindings</a></li> - <li><a href="/distributed/flink-bindings.html">H20 Bindings</a></li> - <!--<li role="separator" class="divider"></li> - <li><span> <b>Native Solvers</b><span></li> - <li><a href="/native-solvers/viennacl.html">ViennaCL</a></li> - <li><a href="/native-solvers/viennacl-omp.html">ViennaCL-OMP</a></li> - <li><a href="/native-solvers/cuda.html">CUDA</a></li>--> - </ul> - </li> - - <li id="dropdown"> - <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Tutorials<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><span> <b>Reccomenders</b><span></li> - <li><a href="/tutorials/cco-lastfm">CCO Example with Last.FM Data</a></li> - <li><a href="/tutorials/intro-cooccurrence-spark">Introduction to Cooccurrence in Spark</a></li> - <li role="separator" class="divider"></li> - <li><span> <b>Mahout Samsara</b><span></li> - <li><a href="/tutorials/samsara/play-with-shell.html">Playing with Samsara in Spark Shell</a></li> - <li><a href="/tutorials/samsara/playing-with-samsara-flink-batch.html">Playing with Samsara in Flink 
Batch</a></li> - <li><a href="/tutorials/samsara/classify-a-doc-from-the-shell.html">Text Classification (Shell)</a></li> - <li><a href="/tutorials/samsara/spark-naive-bayes.html">Spark Naive Bayes</a></li> - <li role="separator" class="divider"></li> - <li><span> <b>Misc</b><span></li> - <li><a href="/tutorials/misc/mahout-in-zeppelin">Mahout in Apache Zeppelin</a></li> - <li><a href="/tutorials/misc/contributing-algos">How To Contribute a New Algorithm</a></li> - <li><a href="/tutorials/misc/how-to-build-an-app.html">How To Build An App</a></li> - <li role="separator" class="divider"></li> - <li><span> <b>Deprecated</b><span></li> - <li><a href="/tutorials/map-reduce">MapReduce</a></li> - </ul> - </li> - - - <!-- Algorithms (Samsara / MR) --> - <li id="dropdown"> - <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Algorithms<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="/algorithms/linear-algebra">Distributed Linear Algebra</a></li> - <li><a href="/algorithms/preprocessors">Preprocessors</a></li> - <li><a href="/algorithms/regression">Regression</a></li> - <li><a href="/algorithms/reccomenders">Reccomenders</a></li> - <li role="separator" class="divider"></li> - <li><a href="/algorithms/map-reduce">MapReduce <i>(deprecated)</i></a></li> - </ul> - <!--<li><a href="/algorithms/reccomenders/recommender-overview.html">Reccomender Overview</a></li> Do we still need? 
seems like short version of next post--> - <!-- - <li><a href="/algorithms/reccomenders/intro-cooccurrence-spark.html">Intro to Coocurrence With Spark</a></li> - <li role="separator" class="divider"></li> - <li><span> <a href="/algorithms/map-reduce"><b>MapReduce</b> (deprecated)</a><span></li> + <link rel="stylesheet" href="/assets/css/main.css"> + <!-- Font Awesome --> + <link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-wvfXpqpZZVQGK6TAh5PVlGOfQNHSoD2xbE+QkPxCAFlNEevoEH3Sl0sibVcOQVnN" crossorigin="anonymous"> - --> - </li> + <!-- Google Fonts --> + <link href="https://fonts.googleapis.com/css?family=Maven+Pro:400,500" rel="stylesheet"> + <link href="https://fonts.googleapis.com/css?family=Muli:400,400i,700,700i" rel="stylesheet"> - <!-- Scala Docs --> - <li id="dropdown"> - <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">API Docs<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="/0.13.0/api/index.html">0.13.0</a></li> - </ul> - </li> - - - </ul> - <form class="navbar-form navbar-left"> - <div class="form-group"> - <input type="text" class="form-control" placeholder="Search"> - </div> - <button type="submit" class="btn btn-default">Submit</button> - </form> - <ul class="nav navbar-nav navbar-right"> - <li><a href="http://github.com/apache/mahout">Github</a></li> - - <!-- Apache --> - <li class="dropdown"> - <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache <span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="http://www.apache.org/foundation/how-it-works.html">Apache Software Foundation</a></li> - <li><a href="http://www.apache.org/licenses/">Apache License</a></li> - <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> - <li><a 
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> - </ul> - </li> + <link rel="canonical" href="http://mahout.apache.org//docs/latest/tutorials/eigenfaces/"> + <link rel="alternate" type="application/rss+xml" title="Apache Mahout" href="/%20/feed.xml"> - </ul> -</div><!-- /.navbar-collapse --> - </div><!-- /.container-fluid --> -</nav> +</head> + <body> -<div id="wrap"> - <body class=""> + <nav class="navbar navbar-expand-lg navbar-light bg-light navbar-mahout"> + + <div class="container"> + + <a class="navbar-brand" href="/"> + <img src="/assets/mahout-logo-blue.svg" alt=""> + </a> + + <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation"> + <span class="navbar-toggler-icon"></span> + </button> + + <div class="collapse navbar-collapse" id="navbarSupportedContent"> + + <div class="navbar-nav ml-auto"> + + <!-- Quick Start --> + <li class="nav-item"> + <a class="nav-link" href="//docs/latest/" >Mahout Overview</a> + </li> + + <li class="nav-item dropdown"> + <a class="nav-link dropdown-toggle" href="" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Key Concepts</a> + <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> + <a class="dropdown-item" href="/docs/latest/index.html">Mahout Overview</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Scala DSL</h6> + <a class="dropdown-item" href="/docs/latest/mahout-samsara/in-core-reference.html">In-core Reference</a> + <a class="dropdown-item" href="/docs/latest/mahout-samsara/out-of-core-reference.html">Out-of-core Reference</a> + <a class="dropdown-item" href="/docs/latest/mahout-samsara/faq.html">Samsara FAQ</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Distributed Engine Bindings</h6> + <a class="dropdown-item" 
href="/docs/latest/distributed/spark-bindings/">Spark Bindings</a> + <a class="dropdown-item" href="/docs/latest/distributed/flink-bindings.html">Flink Bindings</a> + <a class="dropdown-item" href="/docs/latest/distributed/flink-bindings.html">H20 Bindings</a> + <!--<div class="dropdown-divider"></div> + <h6 class="dropdown-header">Native Solvers</h6> + <a class="dropdown-item" href="/docs/latest/native-solvers/viennacl.html">ViennaCL</a></li> + <a class="dropdown-item" href="/docs/latest/native-solvers/viennacl-omp.html">ViennaCL-OMP</a></li> + <a class="dropdown-item" href="/docs/latest/native-solvers/cuda.html">CUDA</a></li>--> + </div> + </li> + + <li class="nav-item dropdown"> + <a class="nav-link dropdown-toggle" href="" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Tutorial</a> + <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Reccomenders</h6> + <a class="dropdown-item" href="/docs/latest/tutorials/cco-lastfm">CCO Example with Last.FM Data</a> + <a class="dropdown-item" href="/docs/latest/tutorials/intro-cooccurrence-spark">Introduction to Cooccurrence in Spark</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Mahout Samsara</h6> + <a class="dropdown-item" href="/docs/latest/tutorials/samsara/play-with-shell.html">Playing with Samsara in Spark Shell</a> + <a class="dropdown-item" href="/docs/latest/tutorials/samsara/playing-with-samsara-flink-batch.html">Playing with Samsara in Flink Batch</a> + <a class="dropdown-item" href="/docs/latest/tutorials/samsara/classify-a-doc-from-the-shell.html">Text Classification (Shell)</a> + <a class="dropdown-item" href="/docs/latest/tutorials/samsara/spark-naive-bayes.html">Spark Naive Bayes</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Misc</h6> + <a class="dropdown-item" href="/docs/latest/tutorials/misc/mahout-in-zeppelin">Mahout in 
Apache Zeppelin</a> + <a class="dropdown-item" href="/docs/latest/tutorials/misc/contributing-algos">How To Contribute a New Algorithm</a> + <a class="dropdown-item" href="/docs/latest/tutorials/misc/how-to-build-an-app.html">How To Build An App</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Deprecated</h6> + <a class="dropdown-item" href="/docs/latest/tutorials/map-reduce">MapReduce</a> + </div> + </li> + + + <!-- Algorithms (Samsara / MR) --> + <li class="nav-item dropdown"> + <a class="nav-link dropdown-toggle" href="" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Algorithms</a> + <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> + <a class="dropdown-item" href="/docs/latest/algorithms/linear-algebra">Distributed Linear Algebra</a> + <a class="dropdown-item" href="/docs/latest/algorithms/preprocessors">Preprocessors</a> + <a class="dropdown-item" href="/docs/latest/algorithms/regression">Regression</a> + <a class="dropdown-item" href="/docs/latest/algorithms/reccomenders">Reccomenders</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Deprecated</h6> + <a class="dropdown-item" href="/docs/latest/algorithms/map-reduce">MapReduce <i>(deprecated)</i></a> + </div> + <!--<a class="dropdown-item" href="/docs/latest/algorithms/reccomenders/recommender-overview.html">Reccomender Overview</a></li> Do we still need? 
seems like short version of next post--> + <!-- + <a class="dropdown-item" href="/docs/latest/algorithms/reccomenders/intro-cooccurrence-spark.html">Intro to Coocurrence With Spark</a></li> + <li role="separator" class="divider"></li> + <li><span> <a href="/docs/latest/algorithms/map-reduce"><b>MapReduce</b> (deprecated)</a><span></li> + + + --> + </li> + + <!-- Scala /docs --> + <li class="nav-item dropdown"> + <a class="nav-link dropdown-toggle" href="" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">API /docs</a> + <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> + <a class="dropdown-item" href="/docs/latest/0.13.0/api/index.html">0.13.0</a> + </div> + </li> + + <!-- Apache --> + <li class="nav-item dropdown"> + <a class="nav-link dropdown-toggle" href="" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Apache</a> + <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> + <a class="dropdown-item" href="http://www.apache.org/foundation/how-it-works.html">Apache Software Foundation</a> + <a class="dropdown-item" href="http://www.apache.org/licenses/">Apache License</a> + <a class="dropdown-item" href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a> + <a class="dropdown-item" href="http://www.apache.org/foundation/thanks.html">Thanks</a> + </div> + </li> - <div class="container"> - + </ul> + <!--<form class="navbar-form navbar-left">--> + <!--<div class="form-group">--> + <!--<input type="text" class="form-control" placeholder="Search">--> + <!--</div>--> + <!--<button type="submit" class="btn btn-default">Submit</button>--> + <!--</form>--> + <!--<ul class="nav navbar-nav navbar-right">--> + <!--<a class="dropdown-item" href="http://github.com/apache/mahout">Github</a></li>--> -<div class="row"> - <div class="col-xs-3"> - <div id="TutorialMenu"> - <span><b>Tutorials</b></span> - <div class="list-group panel"> - <a 
href="#linalg" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#TutorialMenu"><b>Linear Algebra</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="linalg"> - <ul class="nav sidebar-nav"> - <li><a href="/tutorials/eigenfaces">Eigenfaces Demo (Shell or Zeppelin)</a></li> - </ul> - </div> - <a href="#reccomenders" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#TutorialMenu"><b>Reccomenders</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="reccomenders"> - <ul class="nav sidebar-nav"> - <li><a href="/tutorials/cco-lastfm">CCO Example with Last.FM Data</a></li> - <li><a href="/tutorials/intro-cooccurrence-spark">Introduction to Cooccurrence in Spark</a></li> - </ul> - </div> - <a href="#other" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#TutorialMenu"><b>Other</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="other"> - <ul class="nav sidebar-nav"> - <li><a href="/tutorials/misc/mahout-in-zeppelin">Mahout in Apache Zeppelin</a></li> - <li><a href="/tutorials/misc/contributing-algos">How To Contribute a New Algorithm</a></li> - <li><a href="/tutorials/misc/how-to-build-an-app.html">How To Build An App</a></li> - </ul> - </div> - </div> - <span><b>Map Reduce Tutorials</b> (deprecated)</span> - <div class="list-group panel"> - <a href="#classification" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#MrTutorialMenu"><b>Classification</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="classification"> - <ul class="nav sidebar-nav"> - <li> <a href="/tutorials/map-reduce/classification/bankmarketing-example.html">Bank Marketing Example</a></li> - <li> <a href="/tutorials/map-reduce/classification/breiman-example.html">Breiman Example</a></li> - <li> <a href="/tutorials/map-reduce/classification/twenty-newsgroups.html">Twenty Newsgroups Example</a></li> - <li> 
<a href="/tutorials/map-reduce/classification/wikipedia-classifier-example.html">Wikipedia Classifier Example</a></li> - <li> <a href="/tutorials/map-reduce/classification/parallel-frequent-pattern-mining.html">Parallel Frequent Pattern Mining</a></li> - </ul> - </div> - <a href="#clustering" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#MrTutorialMenu"><b>Clustering</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="clustering"> - <ul class="nav sidebar-nav"> - <li> <a href="/tutorials/map-reduce/clustering/20newsgroups.html">Twenty Newsgroups Example</a></li> - <li> <a href="/tutorials/map-reduce/clustering/canopy-commandline.html">Canopy Clustering from the Commandline</a></li> - <li> <a href="/tutorials/map-reduce/clustering/clustering-of-synthetic-control-data.html">Clustering of Synthetic Control Data</a></li> - <li> <a href="/tutorials/map-reduce/clustering/clustering-seinfeld-episodes.html">Clustering of Seinfeld Episodes</a></li> - <li> <a href="/tutorials/map-reduce/clustering/clusteringyourdata.html">Clustering Your Data</a></li> - <li> <a href="/tutorials/map-reduce/clustering/fuzzy-k-means-commandline.html">Fuzzy K-Means from the Commandline</a></li> - <li> <a href="/tutorials/map-reduce/clustering/k-means-commandline.html">K-Means from the Commandline</a></li> - <li> <a href="/tutorials/map-reduce/clustering/lda-commandline.html">LDA from the Commandline</a></li> - <li> <a href="/tutorials/map-reduce/clustering/viewing-results.html">Viewing Results</a></li> - <li> <a href="/tutorials/map-reduce/clustering/visualizing-sample-clusters.html">Visualizing Sample Clusters</a></li> - </ul> - </div> - <a href="#misc" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#MrTutorialMenu"><b>Miscelaneous</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="misc"> - <ul class="nav sidebar-nav"> - <li> <a href="/tutorials/map-reduce/misc/mr---map-reduce.html">MR 
Map-Reduce</a></li> - <li> <a href="/tutorials/map-reduce/misc/parallel-frequent-pattern-mining.html">Parallel Frequent Pattern Mining</a></li> - <li> <a href="/tutorials/map-reduce/misc/using-mahout-with-python-via-jpype.html">Using Mahout (Map Reduce) with Python via Jpype</a></li> - </ul> - </div> - </div> -</div> + + <!--</ul>--> + </div><!-- /.navbar-collapse --> </div> +</nav> + + <div class="container mt-5 pb-4"> - <div class="col-xs-8"> - <div class="page-header"> - <h1>Eigenfaces Demo </h1> - </div> - <p><em>Credit: <a href="https://rawkintrevo.org/2016/11/10/deep-magic-volume-3-eigenfaces/">original blog post by rawkintrevo</a>. This will be maintained through version changes, blog post will not.</em></p> + <div class="row"> + + <div class="col-lg-8"> + <p><em>Credit: <a href="https://rawkintrevo.org/2016/11/10/deep-magic-volume-3-eigenfaces/">original blog post by rawkintrevo</a>. This will be maintained through version changes, blog post will not.</em></p> <p><em>Eigenfaces</em> are an image equivalent(ish) to <em>eigenvectors</em> if you recall your high school linear algebra classes. 
If you don’t recall: <a href="https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors">read wikipedia</a> otherwise, it is a set of “faces” that by a linear combination can be used to represent other faces.</p> @@ -307,123 +193,123 @@ a neural network would be deployed as a microservice, and then eigenfaces would <p>The first thing we’re going to do is collect a set of 13,232 face images (250x250 pixels) from the <a href="http://vis-www.cs.umass.edu/lfw/">Labeled Faces in the Wild</a> data set.</p> -<pre><code>cd /tmp +<div class="highlighter-rouge"><pre class="highlight"><code>cd /tmp mkdir eigenfaces wget http://vis-www.cs.umass.edu/lfw/lfw-deepfunneled.tgz tar -xzf lfw-deepfunneled.tgz </code></pre> +</div> <h3 id="load-dependencies">Load dependencies</h3> -<pre><code>cd $MAHOUT_HOME/bin +<div class="highlighter-rouge"><pre class="highlight"><code>cd $MAHOUT_HOME/bin ./mahout spark-shell \ --packages com.sksamuel.scrimage:scrimage-core_2.10:2.1.0, \ com.sksamuel.scrimage:scrimage-io-extra_2.10:2.1.0, \ com.sksamuel.scrimage:scrimage-filters_2.10:2.1.0 </code></pre> +</div> <h3 id="create-a-drm-of-vectorized-images">Create a DRM of Vectorized Images</h3> -<pre><code class="language-scala">import com.sksamuel.scrimage._ -import com.sksamuel.scrimage.filter.GrayscaleFilter +<div class="language-scala highlighter-rouge"><pre class="highlight"><code><span class="k">import</span> <span class="nn">com.sksamuel.scrimage._</span> +<span class="k">import</span> <span class="nn">com.sksamuel.scrimage.filter.GrayscaleFilter</span> -val imagesRDD:DrmRdd[Int] = sc.binaryFiles("/tmp/lfw-deepfunneled/*/*", 500) - .map(o => new DenseVector( Image.apply(o._2.toArray) - .filter(GrayscaleFilter) - .pixels - .map(p => p.toInt.toDouble / 10000000)) ) - .zipWithIndex - .map(o => (o._2.toInt, o._1)) +<span class="k">val</span> <span class="n">imagesRDD</span><span class="k">:</span><span class="kt">DrmRdd</span><span class="o">[</span><span class="kt">Int</span><span 
class="o">]</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">binaryFiles</span><span class="o">(</span><span class="s">"/tmp/lfw-deepfunneled/*/*"</span><span class="o">,</span> <span class="mi">500</span><span class="o">)</span> + <span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">o</span> <span class="k">=></span> <span class="k">new</span> <span class="nc">DenseVector</span><span class="o">(</span> <span class="nc">Image</span><span class="o">.</span><span class="n">apply</span><span class="o">(</span><span class="n">o</span><span class="o">.</span><span class="n">_2</span><span class="o">.</span><span class="n">toArray</span><span class="o">)</span> + <span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="nc">GrayscaleFilter</span><span class="o">)</span> + <span class="o">.</span><span class="n">pixels</span> + <span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">p</span> <span class="k">=></span> <span class="n">p</span><span class="o">.</span><span class="n">toInt</span><span class="o">.</span><span class="n">toDouble</span> <span class="o">/</span> <span class="mi">10000000</span><span class="o">))</span> <span class="o">)</span> + <span class="o">.</span><span class="n">zipWithIndex</span> + <span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">o</span> <span class="k">=></span> <span class="o">(</span><span class="n">o</span><span class="o">.</span><span class="n">_2</span><span class="o">.</span><span class="n">toInt</span><span class="o">,</span> <span class="n">o</span><span class="o">.</span><span class="n">_1</span><span class="o">))</span> -val imagesDRM = drmWrap(rdd= imagesRDD).par(min = 500).checkpoint() +<span class="k">val</span> <span class="n">imagesDRM</span> <span class="k">=</span> <span class="n">drmWrap</span><span class="o">(</span><span 
class="n">rdd</span><span class="k">=</span> <span class="n">imagesRDD</span><span class="o">).</span><span class="n">par</span><span class="o">(</span><span class="n">min</span> <span class="k">=</span> <span class="mi">500</span><span class="o">).</span><span class="n">checkpoint</span><span class="o">()</span> -println(s"Dataset: ${imagesDRM.nrow} images, ${imagesDRM.ncol} pixels per image") +<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Dataset: ${imagesDRM.nrow} images, ${imagesDRM.ncol} pixels per image"</span><span class="o">)</span> </code></pre> +</div> <h3 id="mean-center-the-images">Mean Center the Images</h3> -<pre><code class="language-scala">import org.apache.mahout.math.algorithms.preprocessing.MeanCenter +<div class="language-scala highlighter-rouge"><pre class="highlight"><code><span class="k">import</span> <span class="nn">org.apache.mahout.math.algorithms.preprocessing.MeanCenter</span> -val scaler: MeanCenterModel = new MeanCenter().fit(imagesDRM) +<span class="k">val</span> <span class="n">scaler</span><span class="k">:</span> <span class="kt">MeanCenterModel</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">MeanCenter</span><span class="o">().</span><span class="n">fit</span><span class="o">(</span><span class="n">imagesDRM</span><span class="o">)</span> -val centeredImages = scaler.transform(imagesDRM) +<span class="k">val</span> <span class="n">centeredImages</span> <span class="k">=</span> <span class="n">scaler</span><span class="o">.</span><span class="n">transform</span><span class="o">(</span><span class="n">imagesDRM</span><span class="o">)</span> </code></pre> +</div> <h3 id="calculate-the-eigenimages-via-ds-svd">Calculate the Eigenimages via DS-SVD</h3> -<pre><code class="language-scala">import org.apache.mahout.math._ -import decompositions._ -import drm._ +<div class="language-scala highlighter-rouge"><pre class="highlight"><code><span 
class="k">import</span> <span class="nn">org.apache.mahout.math._</span> +<span class="k">import</span> <span class="nn">decompositions._</span> +<span class="k">import</span> <span class="nn">drm._</span> -val(drmU, drmV, s) = dssvd(centeredImages, k= 20, p= 15, q = 0) +<span class="k">val</span><span class="o">(</span><span class="n">drmU</span><span class="o">,</span> <span class="n">drmV</span><span class="o">,</span> <span class="n">s</span><span class="o">)</span> <span class="k">=</span> <span class="n">dssvd</span><span class="o">(</span><span class="n">centeredImages</span><span class="o">,</span> <span class="n">k</span><span class="k">=</span> <span class="mi">20</span><span class="o">,</span> <span class="n">p</span><span class="k">=</span> <span class="mi">15</span><span class="o">,</span> <span class="n">q</span> <span class="k">=</span> <span class="mi">0</span><span class="o">)</span> </code></pre> +</div> <h3 id="write-the-eigenfaces-to-disk">Write the Eigenfaces to Disk</h3> -<pre><code class="language-scala">import java.io.File -import javax.imageio.ImageIO - -val sampleImagePath = "/home/guest/lfw-deepfunneled/Aaron_Eckhart/Aaron_Eckhart_0001.jpg" -val sampleImage = ImageIO.read(new File(sampleImagePath)) -val w = sampleImage.getWidth -val h = sampleImage.getHeight - -val eigenFaces = drmV.t.collect(::,::) -val colMeans = scaler.colCentersV - -for (i <- 0 until 20){ - val v = (eigenFaces(i, ::) + colMeans) * 10000000 - val output = new Array[com.sksamuel.scrimage.Pixel](v.size) - for (i <- 0 until v.size) { - output(i) = Pixel(v.get(i).toInt) - } - val image = Image(w, h, output) - image.output(new File(s"/tmp/eigenfaces/${i}.png")) -} +<div class="language-scala highlighter-rouge"><pre class="highlight"><code><span class="k">import</span> <span class="nn">java.io.File</span> +<span class="k">import</span> <span class="nn">javax.imageio.ImageIO</span> + +<span class="k">val</span> <span class="n">sampleImagePath</span> <span class="k">=</span> 
<span class="s">"/home/guest/lfw-deepfunneled/Aaron_Eckhart/Aaron_Eckhart_0001.jpg"</span> +<span class="k">val</span> <span class="n">sampleImage</span> <span class="k">=</span> <span class="nc">ImageIO</span><span class="o">.</span><span class="n">read</span><span class="o">(</span><span class="k">new</span> <span class="nc">File</span><span class="o">(</span><span class="n">sampleImagePath</span><span class="o">))</span> +<span class="k">val</span> <span class="n">w</span> <span class="k">=</span> <span class="n">sampleImage</span><span class="o">.</span><span class="n">getWidth</span> +<span class="k">val</span> <span class="n">h</span> <span class="k">=</span> <span class="n">sampleImage</span><span class="o">.</span><span class="n">getHeight</span> + +<span class="k">val</span> <span class="n">eigenFaces</span> <span class="k">=</span> <span class="n">drmV</span><span class="o">.</span><span class="n">t</span><span class="o">.</span><span class="n">collect</span><span class="o">(::,::)</span> +<span class="k">val</span> <span class="n">colMeans</span> <span class="k">=</span> <span class="n">scaler</span><span class="o">.</span><span class="n">colCentersV</span> + +<span class="k">for</span> <span class="o">(</span><span class="n">i</span> <span class="k"><-</span> <span class="mi">0</span> <span class="n">until</span> <span class="mi">20</span><span class="o">){</span> + <span class="k">val</span> <span class="n">v</span> <span class="k">=</span> <span class="o">(</span><span class="n">eigenFaces</span><span class="o">(</span><span class="n">i</span><span class="o">,</span> <span class="o">::)</span> <span class="o">+</span> <span class="n">colMeans</span><span class="o">)</span> <span class="o">*</span> <span class="mi">10000000</span> + <span class="k">val</span> <span class="n">output</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">Array</span><span class="o">[</span><span class="kt">com.sksamuel.scrimage.Pixel</span><span 
class="o">](</span><span class="n">v</span><span class="o">.</span><span class="n">size</span><span class="o">)</span> + <span class="k">for</span> <span class="o">(</span><span class="n">i</span> <span class="k"><-</span> <span class="mi">0</span> <span class="n">until</span> <span class="n">v</span><span class="o">.</span><span class="n">size</span><span class="o">)</span> <span class="o">{</span> + <span class="n">output</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="k">=</span> <span class="nc">Pixel</span><span class="o">(</span><span class="n">v</span><span class="o">.</span><span class="n">get</span><span class="o">(</span><span class="n">i</span><span class="o">).</span><span class="n">toInt</span><span class="o">)</span> + <span class="o">}</span> + <span class="k">val</span> <span class="n">image</span> <span class="k">=</span> <span class="nc">Image</span><span class="o">(</span><span class="n">w</span><span class="o">,</span> <span class="n">h</span><span class="o">,</span> <span class="n">output</span><span class="o">)</span> + <span class="n">image</span><span class="o">.</span><span class="n">output</span><span class="o">(</span><span class="k">new</span> <span class="nc">File</span><span class="o">(</span><span class="n">s</span><span class="s">"/tmp/eigenfaces/${i}.png"</span><span class="o">))</span> +<span class="o">}</span> </code></pre> +</div> <h3 id="view-the-eigenfaces">View the Eigenfaces</h3> <p>If using Zeppelin, the following can be used to generate a fun table of the Eigenfaces:</p> -<pre><code class="language-python">%python +<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="o">%</span><span class="n">python</span> -r = 4 -c = 5 -print '%html\n<table style="width:100%">' + "".join(["<tr>" + "".join([ '<td><img src="/tmp/eigenfaces/%i.png"></td>' % (i + j) for j in range(0, c) ]) + "</tr>" for i in range(0, r * c, r +1 ) ]) + '</table>' +<span 
class="n">r</span> <span class="o">=</span> <span class="mi">4</span> +<span class="n">c</span> <span class="o">=</span> <span class="mi">5</span> +<span class="k">print</span> <span class="s">'</span><span class="si">%</span><span class="s">html</span><span class="se">\n</span><span class="s"><table style="width:100</span><span class="si">%</span><span class="s">">'</span> <span class="o">+</span> <span class="s">""</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="s">"<tr>"</span> <span class="o">+</span> <span class="s">""</span><span class="o">.</span><span class="n">join</span><span class="p">([</span> <span class="s">'<td><img src="/tmp/eigenfaces/</span><span class="si">%</span><span class="s">i.png"></td>'</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">j</span><span class="p">)</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span> <span class="p">])</span> <span class="o">+</span> <span class="s">"</tr>"</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">r</span> <span class="o">*</span> <span class="n">c</span><span class="p">,</span> <span class="n">r</span> <span class="o">+</span><span class="mi">1</span> <span class="p">)</span> <span class="p">])</span> <span class="o">+</span> <span class="s">'</table>'</span> </code></pre> +</div> <p><img src="eigenfaces.png" alt="Eigenfaces" /></p> </div> -</div> - </div> - - -</div> -<div id="footer"> - <div class="container"> - <p>© 2017 Apache Mahout - with help from <a href="http://jekyllbootstrap.com" target="_blank" title="The Definitive Jekyll Blogging
Framework">Jekyll Bootstrap</a> - and <a href="http://getbootstrap.com" target="_blank">Bootstrap</a> - </p> </div> -</div> - - +</div> + <footer class="footer bg-light"> + <div class="container text-center small"> + Copyright © 2014-2017 The Apache Software Foundation, Licensed under the Apache License, Version 2.0. + </div> +</footer> + <script src="/assets/vendor/jquery/jquery-slim.min.js"></script> + <script src="/assets/vendor/popper/popper.min.js"></script> + <script src="/assets/vendor/bootstrap/js/bootstrap.min.js"></script> + <script src="/assets/header.js"></script> + <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script> -<!-- Latest compiled and minified JavaScript, requires jQuery 1.x (2.x not supported in IE8) --> -<!-- Placed at the end of the document so the pages load faster --> -<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script> -<script src="/assets/themes/mahout3/js/bootstrap.min.js"></script> </body> -</html> +</html>
http://git-wip-us.apache.org/repos/asf/mahout/blob/5112e9ec/docs/latest/tutorials/intro-cooccurrence-spark/index.html ---------------------------------------------------------------------- diff --git a/docs/latest/tutorials/intro-cooccurrence-spark/index.html b/docs/latest/tutorials/intro-cooccurrence-spark/index.html index b1fe1ae..e57c16b 100644 --- a/docs/latest/tutorials/intro-cooccurrence-spark/index.html +++ b/docs/latest/tutorials/intro-cooccurrence-spark/index.html @@ -1,312 +1,169 @@ - - <!DOCTYPE html> -<html lang="en"> +<html lang=" en "> + <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> - <title>Intro to Cooccurrence Recommenders with Spark</title> - - <meta name="author" content="Apache Mahout"> - - <!-- Enable responsive viewport --> - <meta name="viewport" content="width=device-width, initial-scale=1.0"> - - <!-- Bootstrap styles --> - <link href="/assets/themes/mahout3/css/bootstrap.min.css" rel="stylesheet"> - <!-- Optional theme --> - <link href="/assets/themes/mahout3/css/bootstrap-theme.min.css" rel="stylesheet"> - <!-- Sticky Footer --> - <link href="/assets/themes/mahout3/css/bs-sticky-footer.css" rel="stylesheet"> - - <!-- Custom styles --> - <link href="/assets/themes/mahout3/css/style.css" rel="stylesheet" type="text/css" media="all"> - - <!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries --> - <!-- WARNING: Respond.js doesn't work if you view the page via file:// --> - <!--[if lt IE 9]> - <script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script> - <script src="https://oss.maxcdn.com/libs/respond.js/1.3.0/respond.min.js"></script> - <![endif]--> - - <!-- Fav and touch icons --> - <!-- Update these with your own images - <link rel="shortcut icon" href="images/favicon.ico"> - <link rel="apple-touch-icon" href="images/apple-touch-icon.png"> - <link rel="apple-touch-icon" sizes="72x72" 
href="images/apple-touch-icon-72x72.png"> - <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png"> - --> - - <!-- atom & rss feed --> - <link href="/atom.xml" type="application/atom+xml" rel="alternate" title="Sitewide ATOM Feed"> - <link href="/rss.xml" type="application/rss+xml" rel="alternate" title="Sitewide RSS Feed"> - <script type="text/x-mathjax-config"> - MathJax.Hub.Config({ - tex2jax: { - skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] - } - }); - MathJax.Hub.Queue(function() { - var all = MathJax.Hub.getAllJax(), i; - for(i = 0; i < all.length; i += 1) { - all[i].SourceElement().parentNode.className += ' has-jax'; - } - }); - </script> - <script type="text/javascript"> - var mathjax = document.createElement('script'); - mathjax.type = 'text/javascript'; - mathjax.async = true; - - mathjax.src = ('https:' == document.location.protocol) ? - 'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : - 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; - - var s = document.getElementsByTagName('script')[0]; - s.parentNode.insertBefore(mathjax, s); - </script> -</head> - -<nav class="navbar navbar-default navbar-fixed-top"> - <div class="container-fluid"> - <!-- Brand and toggle get grouped for better mobile display --> - <div class="navbar-header"> - <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1" aria-expanded="false"> - <span class="sr-only">Toggle navigation</span> - <span class="icon-bar"></span> - <span class="icon-bar"></span> - <span class="icon-bar"></span> - </button> - <a class="navbar-brand" href="/"> - <img src="/assets/img/Mahout-logo-82x100.png" height="30" alt="I'm mahout"> - </a> - </div> - + <title> + Intro to Cooccurrence Recommenders with Spark + </title> + <meta name="description" content="Distributed Linear Algebra"> -<!-- Collect the nav links, forms, 
and other content for toggling --> -<div class="collapse navbar-collapse" id="main-navbar"> - <ul class="nav navbar-nav"> - - <!-- Quick Start --> - <li id="quickstart"> - <a href="/index.html" >Mahout Overview</a> - </li> - - <li id="dropdown"> - <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Key Concepts<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="/index.html">Mahout Overview</a></li> - <li><span><b> Scala DSL</b><span></li> - <li><a href="/mahout-samsara/in-core-reference.html">In-core Reference</a></li> - <li><a href="/mahout-samsara/out-of-core-reference.html">Out-of-core Reference</a></li> - <li><a href="/mahout-samsara/faq.html">Samsara FAQ</a></li> - <li role="separator" class="divider"></li> - <li><span> <b>Bindings</b><span></li> - <li><a href="/distributed/spark-bindings/">Spark Bindings</a></li> - <li><a href="/distributed/flink-bindings.html">Flink Bindings</a></li> - <li><a href="/distributed/flink-bindings.html">H20 Bindings</a></li> - <!--<li role="separator" class="divider"></li> - <li><span> <b>Native Solvers</b><span></li> - <li><a href="/native-solvers/viennacl.html">ViennaCL</a></li> - <li><a href="/native-solvers/viennacl-omp.html">ViennaCL-OMP</a></li> - <li><a href="/native-solvers/cuda.html">CUDA</a></li>--> - </ul> - </li> - - <li id="dropdown"> - <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Tutorials<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><span> <b>Reccomenders</b><span></li> - <li><a href="/tutorials/cco-lastfm">CCO Example with Last.FM Data</a></li> - <li><a href="/tutorials/intro-cooccurrence-spark">Introduction to Cooccurrence in Spark</a></li> - <li role="separator" class="divider"></li> - <li><span> <b>Mahout Samsara</b><span></li> - <li><a href="/tutorials/samsara/play-with-shell.html">Playing with Samsara in Spark Shell</a></li> - 
<li><a href="/tutorials/samsara/playing-with-samsara-flink-batch.html">Playing with Samsara in Flink Batch</a></li> - <li><a href="/tutorials/samsara/classify-a-doc-from-the-shell.html">Text Classification (Shell)</a></li> - <li><a href="/tutorials/samsara/spark-naive-bayes.html">Spark Naive Bayes</a></li> - <li role="separator" class="divider"></li> - <li><span> <b>Misc</b><span></li> - <li><a href="/tutorials/misc/mahout-in-zeppelin">Mahout in Apache Zeppelin</a></li> - <li><a href="/tutorials/misc/contributing-algos">How To Contribute a New Algorithm</a></li> - <li><a href="/tutorials/misc/how-to-build-an-app.html">How To Build An App</a></li> - <li role="separator" class="divider"></li> - <li><span> <b>Deprecated</b><span></li> - <li><a href="/tutorials/map-reduce">MapReduce</a></li> - </ul> - </li> - - - <!-- Algorithms (Samsara / MR) --> - <li id="dropdown"> - <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Algorithms<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="/algorithms/linear-algebra">Distributed Linear Algebra</a></li> - <li><a href="/algorithms/preprocessors">Preprocessors</a></li> - <li><a href="/algorithms/regression">Regression</a></li> - <li><a href="/algorithms/reccomenders">Reccomenders</a></li> - <li role="separator" class="divider"></li> - <li><a href="/algorithms/map-reduce">MapReduce <i>(deprecated)</i></a></li> - </ul> - <!--<li><a href="/algorithms/reccomenders/recommender-overview.html">Reccomender Overview</a></li> Do we still need? 
seems like short version of next post--> - <!-- - <li><a href="/algorithms/reccomenders/intro-cooccurrence-spark.html">Intro to Coocurrence With Spark</a></li> - <li role="separator" class="divider"></li> - <li><span> <a href="/algorithms/map-reduce"><b>MapReduce</b> (deprecated)</a><span></li> + <link rel="stylesheet" href="/assets/css/main.css"> + <!-- Font Awesome --> + <link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-wvfXpqpZZVQGK6TAh5PVlGOfQNHSoD2xbE+QkPxCAFlNEevoEH3Sl0sibVcOQVnN" crossorigin="anonymous"> - --> - </li> + <!-- Google Fonts --> + <link href="https://fonts.googleapis.com/css?family=Maven+Pro:400,500" rel="stylesheet"> + <link href="https://fonts.googleapis.com/css?family=Muli:400,400i,700,700i" rel="stylesheet"> - <!-- Scala Docs --> - <li id="dropdown"> - <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">API Docs<span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="/0.13.0/api/index.html">0.13.0</a></li> - </ul> - </li> - - - </ul> - <form class="navbar-form navbar-left"> - <div class="form-group"> - <input type="text" class="form-control" placeholder="Search"> - </div> - <button type="submit" class="btn btn-default">Submit</button> - </form> - <ul class="nav navbar-nav navbar-right"> - <li><a href="http://github.com/apache/mahout">Github</a></li> - - <!-- Apache --> - <li class="dropdown"> - <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache <span class="caret"></span></a> - <ul class="dropdown-menu"> - <li><a href="http://www.apache.org/foundation/how-it-works.html">Apache Software Foundation</a></li> - <li><a href="http://www.apache.org/licenses/">Apache License</a></li> - <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> - <li><a 
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> - </ul> - </li> + <link rel="canonical" href="http://mahout.apache.org//docs/latest/tutorials/intro-cooccurrence-spark/"> + <link rel="alternate" type="application/rss+xml" title="Apache Mahout" href="/%20/feed.xml"> - </ul> -</div><!-- /.navbar-collapse --> - </div><!-- /.container-fluid --> -</nav> +</head> + <body> -<div id="wrap"> - <body class=""> + <nav class="navbar navbar-expand-lg navbar-light bg-light navbar-mahout"> + + <div class="container"> + + <a class="navbar-brand" href="/"> + <img src="/assets/mahout-logo-blue.svg" alt=""> + </a> + + <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation"> + <span class="navbar-toggler-icon"></span> + </button> + + <div class="collapse navbar-collapse" id="navbarSupportedContent"> + + <div class="navbar-nav ml-auto"> + + <!-- Quick Start --> + <li class="nav-item"> + <a class="nav-link" href="//docs/latest/" >Mahout Overview</a> + </li> + + <li class="nav-item dropdown"> + <a class="nav-link dropdown-toggle" href="" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Key Concepts</a> + <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> + <a class="dropdown-item" href="/docs/latest/index.html">Mahout Overview</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Scala DSL</h6> + <a class="dropdown-item" href="/docs/latest/mahout-samsara/in-core-reference.html">In-core Reference</a> + <a class="dropdown-item" href="/docs/latest/mahout-samsara/out-of-core-reference.html">Out-of-core Reference</a> + <a class="dropdown-item" href="/docs/latest/mahout-samsara/faq.html">Samsara FAQ</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Distributed Engine Bindings</h6> + <a class="dropdown-item" 
href="/docs/latest/distributed/spark-bindings/">Spark Bindings</a> + <a class="dropdown-item" href="/docs/latest/distributed/flink-bindings.html">Flink Bindings</a> + <a class="dropdown-item" href="/docs/latest/distributed/flink-bindings.html">H20 Bindings</a> + <!--<div class="dropdown-divider"></div> + <h6 class="dropdown-header">Native Solvers</h6> + <a class="dropdown-item" href="/docs/latest/native-solvers/viennacl.html">ViennaCL</a></li> + <a class="dropdown-item" href="/docs/latest/native-solvers/viennacl-omp.html">ViennaCL-OMP</a></li> + <a class="dropdown-item" href="/docs/latest/native-solvers/cuda.html">CUDA</a></li>--> + </div> + </li> + + <li class="nav-item dropdown"> + <a class="nav-link dropdown-toggle" href="" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Tutorial</a> + <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Reccomenders</h6> + <a class="dropdown-item" href="/docs/latest/tutorials/cco-lastfm">CCO Example with Last.FM Data</a> + <a class="dropdown-item" href="/docs/latest/tutorials/intro-cooccurrence-spark">Introduction to Cooccurrence in Spark</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Mahout Samsara</h6> + <a class="dropdown-item" href="/docs/latest/tutorials/samsara/play-with-shell.html">Playing with Samsara in Spark Shell</a> + <a class="dropdown-item" href="/docs/latest/tutorials/samsara/playing-with-samsara-flink-batch.html">Playing with Samsara in Flink Batch</a> + <a class="dropdown-item" href="/docs/latest/tutorials/samsara/classify-a-doc-from-the-shell.html">Text Classification (Shell)</a> + <a class="dropdown-item" href="/docs/latest/tutorials/samsara/spark-naive-bayes.html">Spark Naive Bayes</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Misc</h6> + <a class="dropdown-item" href="/docs/latest/tutorials/misc/mahout-in-zeppelin">Mahout in 
Apache Zeppelin</a> + <a class="dropdown-item" href="/docs/latest/tutorials/misc/contributing-algos">How To Contribute a New Algorithm</a> + <a class="dropdown-item" href="/docs/latest/tutorials/misc/how-to-build-an-app.html">How To Build An App</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Deprecated</h6> + <a class="dropdown-item" href="/docs/latest/tutorials/map-reduce">MapReduce</a> + </div> + </li> + + + <!-- Algorithms (Samsara / MR) --> + <li class="nav-item dropdown"> + <a class="nav-link dropdown-toggle" href="" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Algorithms</a> + <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> + <a class="dropdown-item" href="/docs/latest/algorithms/linear-algebra">Distributed Linear Algebra</a> + <a class="dropdown-item" href="/docs/latest/algorithms/preprocessors">Preprocessors</a> + <a class="dropdown-item" href="/docs/latest/algorithms/regression">Regression</a> + <a class="dropdown-item" href="/docs/latest/algorithms/reccomenders">Reccomenders</a> + <div class="dropdown-divider"></div> + <h6 class="dropdown-header">Deprecated</h6> + <a class="dropdown-item" href="/docs/latest/algorithms/map-reduce">MapReduce <i>(deprecated)</i></a> + </div> + <!--<a class="dropdown-item" href="/docs/latest/algorithms/reccomenders/recommender-overview.html">Reccomender Overview</a></li> Do we still need? 
seems like short version of next post--> + <!-- + <a class="dropdown-item" href="/docs/latest/algorithms/reccomenders/intro-cooccurrence-spark.html">Intro to Coocurrence With Spark</a></li> + <li role="separator" class="divider"></li> + <li><span> <a href="/docs/latest/algorithms/map-reduce"><b>MapReduce</b> (deprecated)</a><span></li> + + + --> + </li> + + <!-- Scala /docs --> + <li class="nav-item dropdown"> + <a class="nav-link dropdown-toggle" href="" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">API /docs</a> + <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> + <a class="dropdown-item" href="/docs/latest/0.13.0/api/index.html">0.13.0</a> + </div> + </li> + + <!-- Apache --> + <li class="nav-item dropdown"> + <a class="nav-link dropdown-toggle" href="" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Apache</a> + <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> + <a class="dropdown-item" href="http://www.apache.org/foundation/how-it-works.html">Apache Software Foundation</a> + <a class="dropdown-item" href="http://www.apache.org/licenses/">Apache License</a> + <a class="dropdown-item" href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a> + <a class="dropdown-item" href="http://www.apache.org/foundation/thanks.html">Thanks</a> + </div> + </li> - <div class="container"> - + </ul> + + <!--<form class="navbar-form navbar-left">--> + <!--<div class="form-group">--> + <!--<input type="text" class="form-control" placeholder="Search">--> + <!--</div>--> + <!--<button type="submit" class="btn btn-default">Submit</button>--> + <!--</form>--> + <!--<ul class="nav navbar-nav navbar-right">--> + <!--<a class="dropdown-item" href="http://github.com/apache/mahout">Github</a></li>--> -<div class="row"> - <div class="col-md-3"> - <div id="AlgoMenu"> - <span><b>Mahout-Samsara Algorithms</b></span> - <div class="list-group panel"> 
- <a href="#linalg" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#AlgoMenu"><b>Linear Algebra</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="linalg"> - <ul class="nav sidebar-nav"> - <li> <a href="/algorithms/linear-algebra/d-qr.html">Distributed QR Decomposition</a></li> - <li> <a href="/algorithms/linear-algebra/d-spca.html">Distributed Stochastic Principal Component Analysis</a></li> - <li> <a href="/algorithms/linear-algebra/d-ssvd.html">Distributed Stochastic Singular Value Decomposition</a></li> - </ul> - </div> - <a href="#clustering" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#AlgoMenu"><b>Clustering</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="clustering"> - <ul class="nav sidebar-nav"> - <li> <a href="/algorithms/clustering">Clustering Algorithms</a></li> - <li> <a href="/algorithms/clustering/distance-metrics.html">Distance Metrics</a></li> - <li> <a href="/algorithms/clustering/canopy">Canopy Clustering</a></li> - </ul> - </div> - <a href="#preprocessors" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#AlgoMenu"><b>Preprocessors</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="preprocessors"> - <ul class="nav sidebar-nav"> - <li> <a href="/algorithms/preprocessors/AsFactor.html">AsFactor (a.k.a. 
One-Hot-Encoding)</a></li> - <li> <a href="/algorithms/preprocessors/StandardScaler.html">StandardScaler</a></li> - <li> <a href="/algorithms/preprocessors/MeanCenter.html">MeanCenter</a></li> - </ul> - </div> - <a href="#regression" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#AlgoMenu"><b>Regression</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="regression"> - <ul class="nav sidebar-nav"> - <a href="#serial-correlation" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#regression"><b>• Serial Correlation</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="serial-correlation"> - <ul class="nav sidebar-nav"> - <li> <a href="/algorithms/regression/serial-correlation/cochrane-orcutt.html">Cochrane-Orcutt Procedure</a></li> - <li> <a href="/algorithms/regression/serial-correlation/dw-test.html">Durbin Watson Test</a></li> - </ul> - </div> - <li> <a href="/algorithms/regression/ols.html">Ordinary Least Squares (Closed Form)</a></li> - <li> <a href="/algorithms/regression/fittness-tests.html">Fitness Tests</a></li> - </ul> - </div> - <a href="#reccomenders" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#AlgoMenu"><b>Reccomenders</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="reccomenders"> - <ul class="nav sidebar-nav"> - <li> <a href="/algorithms/reccomenders">Reccomender Overview</a></li> - <li> <a href="/algorithms/reccomenders/cco.html">CCO</a></li> - <li> <a href="/algorithms/reccomenders/d-als.html">Distributed Alternating Least Squares</a></li> - </ul> - </div> - </div> - <span><b>Map Reduce Algorithms</b> (deprecated)</span> - <div class="list-group panel"> - <a href="#classification" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#AlgoMenu"><b>Classification</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="classification"> - <ul 
class="nav sidebar-nav"> - <li> <a href="/algorithms/map-reduce/classification/bayesian.html">Bayesian</a></li> - <li> <a href="/algorithms/map-reduce/classification/class-discovery.html">Class Discovery</a></li> - <li> <a href="/algorithms/map-reduce/classification/classifyingyourdata.html">Classifying Your Data</a></li> - <li> <a href="/algorithms/map-reduce/classification/collocations.html">Collocation</a></li> - <li> <a href="/algorithms/map-reduce/classification/gaussian-discriminative-analysis.html">Gaussian Discriminative Analysis</a></li> - <li> <a href="/algorithms/map-reduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li> - <li> <a href="/algorithms/map-reduce/classification/independent-component-analysis.html">Independent Component Analysis</a></li> - <li> <a href="/algorithms/map-reduce/classification/locally-weighted-linear-regression.html">Locally Weighted Linear Regression</a></li> - <li> <a href="/algorithms/map-reduce/classification/logistic-regression.html">Logistic Regression</a></li> - <li> <a href="/algorithms/map-reduce/classification/mahout-collections.html">Mahout Collections</a></li> - <li> <a href="/algorithms/map-reduce/classification/mlp.html">Multilayer Perceptron</a></li> - <li> <a href="/algorithms/map-reduce/classification/naivebayes.html">Naive Bayes</a></li> - <li> <a href="/algorithms/map-reduce/classification/neural-network.html">Neural Networks</a></li> - <li> <a href="/algorithms/map-reduce/classification/partial-implementation.html">Partial Implementation</a></li> - <li> <a href="/algorithms/map-reduce/classification/random-forrests.html">Random Forrests</a></li> - <li> <a href="/algorithms/map-reduce/classification/restricted-boltzman-machines.html">Restricted Boltzman Machines</a></li> - <li> <a href="/algorithms/map-reduce/classification/support-vector-machines.html">Support Vector Machines</a></li> - </ul> - </div> - <a href="#mr-clustering" class="list-group-item list-group-item-success" 
data-toggle="collapse" data-parent="#AlgoMenu"><b>Clustering</b><i class="fa fa-caret-down"></i></a> - <div class="collapse" id="mr-clustering"> - <ul class="nav sidebar-nav"> - <li> <a href="/algorithms/map-reduce/clustering/canopy-clustering.html">Canopy Clustering</a></li> - <li> <a href="/algorithms/map-reduce/clustering/cluster-dumper.html">Cluster Dumper</a></li> - <li> <a href="/algorithms/map-reduce/clustering/expectation-maximization.html">Expectation Maximization</a></li> - <li> <a href="/algorithms/map-reduce/clustering/fuzzy-k-means.html">Fuzzy K-Means</a></li> - <li> <a href="/algorithms/map-reduce/clustering/hierarchical-clustering.html">Hierarchical Clustering</a></li> - <li> <a href="/algorithms/map-reduce/clustering/k-means-clustering.html">K-Means Clustering</a></li> - <li> <a href="/algorithms/map-reduce/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li> - <li> <a href="/algorithms/map-reduce/clustering/llr---log-likelihood-ratio.html">Log Likelihood Ratio</a></li> - <li> <a href="/algorithms/map-reduce/clustering/spectral-clustering.html">Spectral Clustering</a></li> - <li> <a href="/algorithms/map-reduce/clustering/streaming-k-means.html">Streaming K-Means</a></li> - </ul> - </div> - </div> -</div> + <!--</ul>--> + </div><!-- /.navbar-collapse --> </div> +</nav> + + <div class="container mt-5 pb-4"> + + <div class="row"> - <div class="col-md-8"> - <div class="page-header"> - <h1>Intro to Cooccurrence Recommenders with Spark </h1> - </div> - <h1 id="intro-to-cooccurrence-recommenders-with-spark">Intro to Cooccurrence Recommenders with Spark</h1> + <div class="col-lg-8"> + <h1 id="intro-to-cooccurrence-recommenders-with-spark">Intro to Cooccurrence Recommenders with Spark</h1> <p>Mahout provides several important building blocks for creating recommendations using Spark. 
<em>spark-itemsimilarity</em> can be used to create "other people also liked these things" type recommendations and paired with a search engine can @@ -346,7 +203,7 @@ For instance they might say an item-view is 0.2 of an item purchase. In practice cross-cooccurrence is a more principled way to handle this case. In effect it scrubs secondary actions with the action you want to recommend.</p> -<pre><code>spark-itemsimilarity Mahout 1.0 +<div class="highlighter-rouge"><pre class="highlight"><code>spark-itemsimilarity Mahout 1.0 Usage: spark-itemsimilarity [options] Disconnected from the target VM, address: '127.0.0.1:64676', transport: 'socket' @@ -411,6 +268,7 @@ Spark config options: -h | --help prints this usage text </code></pre> +</div> <p>This looks daunting but defaults to simple fairly sane values to take exactly the same input as legacy code and is pretty flexible. It allows the user to point to a single text file, a directory full of files, or a tree of directories to be traversed recursively. The files included can be specified with either a regex-style pattern or filename. The schema for the file is defined by column numbers, which map to the important bits of data including IDs and values. The files can even contain filters, which allow unneeded rows to be discarded or used for cross-cooccurrence calculations.</p> @@ -420,20 +278,23 @@ Spark config options: <p>If all defaults are used the input can be as simple as:</p> -<pre><code>userID1,itemID1 +<div class="highlighter-rouge"><pre class="highlight"><code>userID1,itemID1 userID2,itemID2 ...
</code></pre> +</div> <p>With the command line:</p> -<pre><code>bash$ mahout spark-itemsimilarity --input in-file --output out-dir +<div class="highlighter-rouge"><pre class="highlight"><code>bash$ mahout spark-itemsimilarity --input in-file --output out-dir </code></pre> +</div> <p>This will use the "local" Spark context and will output the standard text version of a DRM</p> -<pre><code>itemID1<tab>itemID2:value2<space>itemID10:value10... +<div class="highlighter-rouge"><pre class="highlight"><code>itemID1<tab>itemID2:value2<space>itemID10:value10... </code></pre> +</div> <h3 id="how-to-use-multiple-user-actions"><a name="multiple-actions">How To Use Multiple User Actions</a></h3> @@ -449,7 +310,7 @@ to calculate the cross-cooccurrence indicator matrix.</p> <p><em>spark-itemsimilarity</em> can read separate actions from separate files or from a mixed action log by filtering certain lines. For a mixed action log of the form:</p> -<pre><code>u1,purchase,iphone +<div class="highlighter-rouge"><pre class="highlight"><code>u1,purchase,iphone u1,purchase,ipad u2,purchase,nexus u2,purchase,galaxy @@ -470,12 +331,13 @@ u4,view,iphone u4,view,ipad u4,view,galaxy </code></pre> +</div> <p>###Command Line</p> <p>Use the following options:</p> -<pre><code>bash$ mahout spark-itemsimilarity \ +<div class="highlighter-rouge"><pre class="highlight"><code>bash$ mahout spark-itemsimilarity \ --input in-file \ # where to look for data --output out-path \ # root dir for output --master masterUrl \ # URL of the Spark master server @@ -485,34 +347,38 @@ u4,view,galaxy --rowIDPosition 0 \ # column that has the user ID --filterPosition 1 # column that has the filter word </code></pre> +</div> <h3 id="output">Output</h3> <p>The output of the job will be the standard text version of two Mahout DRMs.
This is a case where we are calculating cross-cooccurrence so a primary indicator matrix and cross-cooccurrence indicator matrix will be created</p> -<pre><code>out-path +<div class="highlighter-rouge"><pre class="highlight"><code>out-path |-- similarity-matrix - TDF part files \-- cross-similarity-matrix - TDF part-files </code></pre> +</div> <p>The similarity-matrix will contain the lines:</p> -<pre><code>galaxy\tnexus:1.7260924347106847 +<div class="highlighter-rouge"><pre class="highlight"><code>galaxy\tnexus:1.7260924347106847 ipad\tiphone:1.7260924347106847 nexus\tgalaxy:1.7260924347106847 iphone\tipad:1.7260924347106847 surface </code></pre> +</div> <p>The cross-similarity-matrix will contain:</p> -<pre><code>iphone\tnexus:1.7260924347106847 iphone:1.7260924347106847 ipad:1.7260924347106847 galaxy:1.7260924347106847 +<div class="highlighter-rouge"><pre class="highlight"><code>iphone\tnexus:1.7260924347106847 iphone:1.7260924347106847 ipad:1.7260924347106847 galaxy:1.7260924347106847 ipad\tnexus:0.6795961471815897 iphone:0.6795961471815897 ipad:0.6795961471815897 galaxy:0.6795961471815897 nexus\tnexus:0.6795961471815897 iphone:0.6795961471815897 ipad:0.6795961471815897 galaxy:0.6795961471815897 galaxy\tnexus:1.7260924347106847 iphone:1.7260924347106847 ipad:1.7260924347106847 galaxy:1.7260924347106847 surface\tsurface:4.498681156950466 nexus:0.6795961471815897 </code></pre> +</div> <p><strong>Note:</strong> You can run this multiple times to use more than two actions or you can use the underlying SimilarityAnalysis.cooccurrence API, which will more efficiently calculate any number of cross-cooccurrence indicators.</p> @@ -521,7 +387,7 @@ SimilarityAnalysis.cooccurrence API, which will more efficiently calculate any n <p>A common method of storing data is in log files. If they are written using some delimiter they can be consumed directly by spark-itemsimilarity. 
For instance input of the form:</p> -<pre><code>2014-06-23 14:46:53.115\tu1\tpurchase\trandom text\tiphone +<div class="highlighter-rouge"><pre class="highlight"><code>2014-06-23 14:46:53.115\tu1\tpurchase\trandom text\tiphone 2014-06-23 14:46:53.115\tu1\tpurchase\trandom text\tipad 2014-06-23 14:46:53.115\tu2\tpurchase\trandom text\tnexus 2014-06-23 14:46:53.115\tu2\tpurchase\trandom text\tgalaxy @@ -542,10 +408,11 @@ SimilarityAnalysis.cooccurrence API, which will more efficiently calculate any n 2014-06-23 14:46:53.115\tu4\tview\trandom text\tipad 2014-06-23 14:46:53.115\tu4\tview\trandom text\tgalaxy </code></pre> +</div> <p>Can be parsed with the following CLI and run on the cluster producing the same output as the above example.</p> -<pre><code>bash$ mahout spark-itemsimilarity \ +<div class="highlighter-rouge"><pre class="highlight"><code>bash$ mahout spark-itemsimilarity \ --input in-file \ --output out-path \ --master spark://sparkmaster:4044 \ @@ -556,6 +423,7 @@ SimilarityAnalysis.cooccurrence API, which will more efficiently calculate any n --rowIDPosition 1 \ --filterPosition 2 </code></pre> +</div> <h2 id="2-spark-rowsimilarity">2. spark-rowsimilarity</h2> @@ -570,7 +438,7 @@ by a list of the most similar rows.</p> <p>The command line interface is:</p> -<pre><code>spark-rowsimilarity Mahout 1.0 +<div class="highlighter-rouge"><pre class="highlight"><code>spark-rowsimilarity Mahout 1.0 Usage: spark-rowsimilarity [options] Input, output options @@ -617,6 +485,7 @@ Spark config options: -h | --help prints this usage text </code></pre> +</div> <p>See RowSimilarityDriver.scala in Mahout’s spark module if you want to customize the code.</p> @@ -698,32 +567,36 @@ content or metadata, not by which users interacted with them.</p> <p>For this we need input of the form:</p> -<pre><code>itemID<tab>list-of-tags +<div class="highlighter-rouge"><pre class="highlight"><code>itemID<tab>list-of-tags ... 
</code></pre> +</div> <p>The full collection will look like the tags column from a catalog DB. For our ecom example it might be:</p> -<pre><code>3459860b<tab>men long-sleeve chambray clothing casual +<div class="highlighter-rouge"><pre class="highlight"><code>3459860b<tab>men long-sleeve chambray clothing casual 9446577d<tab>women tops chambray clothing casual ... </code></pre> +</div> <p>We’ll use <em>spark-rowimilairity</em> because we are looking for similar rows, which encode items in this case. As with the collaborative filtering indicators we use the --omitStrength option. The strengths created are probabilistic log-likelihood ratios and so are used to filter unimportant similarities. Once the filtering or downsampling is finished we no longer need the strengths. We will get an indicator matrix of the form:</p> -<pre><code>itemID<tab>list-of-item IDs +<div class="highlighter-rouge"><pre class="highlight"><code>itemID<tab>list-of-item IDs ... </code></pre> +</div> <p>This is a content indicator since it has found other items with similar content or metadata.</p> -<pre><code>3459860b<tab>3459860b 3459860b 6749860c 5959860a 3434860a 3477860a +<div class="highlighter-rouge"><pre class="highlight"><code>3459860b<tab>3459860b 3459860b 6749860c 5959860a 3434860a 3477860a 9446577d<tab>9446577d 9496577d 0943577d 8346577d 9442277d 9446577e ... </code></pre> +</div> <p>We now have three indicators, two collaborative filtering type and one content type.</p> @@ -734,11 +607,12 @@ is finished we no longer need the strengths. We will get an indicator matrix of <p>We have 3 indicators, these are indexed by the search engine into 3 fields, we’ll call them “purchase”, “view”, and “tags”. 
We take the user’s history that corresponds to each indicator and create a query of the form:</p> -<pre><code>Query: +<div class="highlighter-rouge"><pre class="highlight"><code>Query: field: purchase; q:user's-purchase-history field: view; q:user's view-history field: tags; q:user's-tags-associated-with-purchases </code></pre> +</div> <p>The query will result in an ordered list of items recommended for purchase but skewed towards items with similar tags to the ones the user has already purchased.</p> @@ -750,11 +624,12 @@ by tagging items with some category of popularity (hot, warm, cold for instance) index that as a new indicator field and include the corresponding value in a query on the popularity field. If we use the ecom example but use the query to get “hot” recommendations it might look like this:</p> -<pre><code>Query: +<div class="highlighter-rouge"><pre class="highlight"><code>Query: field: purchase; q:user's-purchase-history field: view; q:user's view-history field: popularity; q:"hot" </code></pre> +</div> <p>This will return recommendations favoring ones that have the intrinsic indicator “hot”.</p> @@ -767,32 +642,25 @@ on the popularity field. If we use the ecom example but use the query to get “ </ol> </div> -</div> - </div> - - -</div> -<div id="footer"> - <div class="container"> - <p>© 2017 Apache Mahout - with help from <a href="http://jekyllbootstrap.com" target="_blank" title="The Definitive Jekyll Blogging Framework">Jekyll Bootstrap</a> - and <a href="http://getbootstrap.com" target="_blank">Bootstrap</a> - </p> </div> -</div> - - +</div> + <footer class="footer bg-light"> + <div class="container text-center small"> + Copyright © 2014-2017 The Apache Software Foundation, Licensed under the Apache License, Version 2.0. 
+ </div> +</footer> + <script src="/assets/vendor/jquery/jquery-slim.min.js"></script> + <script src="/assets/vendor/popper/popper.min.js"></script> + <script src="/assets/vendor/bootstrap/js/bootstrap.min.js"></script> + <script src="/assets/header.js"></script> + <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script> -<!-- Latest compiled and minified JavaScript, requires jQuery 1.x (2.x not supported in IE8) --> -<!-- Placed at the end of the document so the pages load faster --> -<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script> -<script src="/assets/themes/mahout3/js/bootstrap.min.js"></script> </body> -</html> +</html>
