Repository: mahout
Updated Branches:
  refs/heads/website 2346425bd -> 747d94b1d
WEBSITE Added Eigenfaces Demo

Project: http://git-wip-us.apache.org/repos/asf/mahout/repo
Commit: http://git-wip-us.apache.org/repos/asf/mahout/commit/747d94b1
Tree: http://git-wip-us.apache.org/repos/asf/mahout/tree/747d94b1
Diff: http://git-wip-us.apache.org/repos/asf/mahout/diff/747d94b1

Branch: refs/heads/website
Commit: 747d94b1da72c7e3b909087b8f04e03d760df346
Parents: 2346425
Author: Trevor <[email protected]>
Authored: Mon May 1 12:03:23 2017 -0500
Committer: Trevor <[email protected]>
Committed: Mon May 1 12:03:23 2017 -0500

----------------------------------------------------------------------
 website/docs/_includes/algo_navbar.html         |   1 +
 website/docs/_includes/tutorial_navbar.html     |   2 +-
 .../docs/algorithms/preprocessors/MeanCenter.md |  30 ++++++
 website/docs/tutorials/eigenfaces/index.md      | 107 +++++++++++++++++++
 4 files changed, 139 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mahout/blob/747d94b1/website/docs/_includes/algo_navbar.html
----------------------------------------------------------------------
diff --git a/website/docs/_includes/algo_navbar.html b/website/docs/_includes/algo_navbar.html
index 3961c21..831f48d 100644
--- a/website/docs/_includes/algo_navbar.html
+++ b/website/docs/_includes/algo_navbar.html
@@ -14,6 +14,7 @@
     <ul class="nav sidebar-nav">
       <li> <a href="{{ BASE_PATH }}/algorithms/preprocessors/AsFactor.html">AsFactor (a.k.a. 
One-Hot-Encoding)</a></li>
       <li> <a href="{{ BASE_PATH }}/algorithms/preprocessors/StandardScaler.html">StandardScaler</a></li>
+      <li> <a href="{{ BASE_PATH }}/algorithms/preprocessors/MeanCenter.html">MeanCenter</a></li>
     </ul>
   </div>
 </div>

http://git-wip-us.apache.org/repos/asf/mahout/blob/747d94b1/website/docs/_includes/tutorial_navbar.html
----------------------------------------------------------------------
diff --git a/website/docs/_includes/tutorial_navbar.html b/website/docs/_includes/tutorial_navbar.html
index ee75d75..6a208b6 100644
--- a/website/docs/_includes/tutorial_navbar.html
+++ b/website/docs/_includes/tutorial_navbar.html
@@ -4,7 +4,7 @@
   <a href="#linalg" class="list-group-item list-group-item-success" data-toggle="collapse" data-parent="#MrTutorialMenu"><b>Linear Algebra</b><i class="fa fa-caret-down"></i></a>
   <div class="collapse" id="linalg">
     <ul class="nav sidebar-nav">
-      <li> <a href="{{ BASE_PATH }}/tutorials/eigenfaces">Eigenfaces Demo</a></li>
+      <li> <a href="{{ BASE_PATH }}/tutorials/eigenfaces">Eigenfaces Demo (Shell or Zeppelin)</a></li>
     </ul>
   </div></div></div>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/mahout/blob/747d94b1/website/docs/algorithms/preprocessors/MeanCenter.md
----------------------------------------------------------------------
diff --git a/website/docs/algorithms/preprocessors/MeanCenter.md b/website/docs/algorithms/preprocessors/MeanCenter.md
new file mode 100644
index 0000000..771b6a3
--- /dev/null
+++ b/website/docs/algorithms/preprocessors/MeanCenter.md
@@ -0,0 +1,30 @@
+---
+layout: algorithm
+title: MeanCenter
+theme:
+   name: mahout2
+---
+
+### About
+
+`MeanCenter` centers values about the column mean. 
+
+### Parameters
+
+### Example
+
+```scala
+import org.apache.mahout.math.algorithms.preprocessing.{MeanCenter, MeanCenterModel}
+
+val A = drmParallelize(dense(
+  (1, 1, -2),
+  (2, 5, 2),
+  (3, 9, 0)), numPartitions = 2)
+
+val scaler: MeanCenterModel = new MeanCenter().fit(A)
+
+val centeredA = scaler.transform(A)
+```
+
+
+

http://git-wip-us.apache.org/repos/asf/mahout/blob/747d94b1/website/docs/tutorials/eigenfaces/index.md
----------------------------------------------------------------------
diff --git a/website/docs/tutorials/eigenfaces/index.md b/website/docs/tutorials/eigenfaces/index.md
index d6e019d..0db0d14 100644
--- a/website/docs/tutorials/eigenfaces/index.md
+++ b/website/docs/tutorials/eigenfaces/index.md
@@ -4,3 +4,110 @@ title: Eigenfaces Demo
 theme:
    name: mahout3
 ---
+
+*Credit: [original blog post by rawkintrevo](https://rawkintrevo.org/2016/11/10/deep-magic-volume-3-eigenfaces/). This page will be maintained through version changes; the blog post will not.*
+
+*Eigenfaces* are an image equivalent(ish) to *eigenvectors*, if you recall your high school linear algebra classes. If you don't recall, [read Wikipedia](https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors). In short, eigenfaces are a set of 'faces' whose linear combinations can be used to represent other faces.
+
+There are lots of "image recognition" things out there right now, and deep learning is the popular one everyone is talking about.
+Deep learning will admittedly do better at recognizing and correctly classifying faces; however, it does so at a price.
+1. Neural networks are very costly to train in the first place
+1. Every time a new person is added, the neural network must be retrained to recognize the new person
+
+The eigenfaces approach pays off when new faces are being added regularly. Even in a production-grade
+eigenfaces-based system, neural networks still have a place: _identifying faces_ in images, and creating _centered and scaled_ images around
+the face. 
This is scalable because the neural network only needs to be trained once to detect, center, and scale faces. E.g.,
+the neural network would be deployed as one microservice, and eigenfaces would be deployed as another.
+
+A production version ends up looking something like this:
+- An image comes in and is fed to the neural-network-based 'detect faces, center, scale' microservice
+- The neural network microservice detects, centers, and scales the faces, then passes each face to the eigenfaces microservice
+- For each face:<br>
+ a. Decompose the face into a linear combination of eigenfaces<br>
+ b. Determine if the linear combination vector is close enough to any existing vector to declare a match <br>
+ c. If there is no match, "add new person" to the face corpus.
+
+### Get the data
+
+The first thing we're going to do is collect a set of 13,233 face images (250x250 pixels) from the <a href="http://vis-www.cs.umass.edu/lfw/">Labeled Faces in the Wild</a> data set.
+
+    cd /tmp
+    mkdir eigenfaces
+    wget http://vis-www.cs.umass.edu/lfw/lfw-deepfunneled.tgz
+    tar -xzf lfw-deepfunneled.tgz
+
+### Load dependencies
+
+    cd $MAHOUT_HOME/bin
+    # Spark expects --packages as one comma-separated list with no spaces
+    ./mahout spark-shell \
+        --packages \
+        com.sksamuel.scrimage:scrimage-core_2.10:2.1.0,com.sksamuel.scrimage:scrimage-io-extra_2.10:2.1.0,com.sksamuel.scrimage:scrimage-filters_2.10:2.1.0
+
+
+
+### Create a DRM of Vectorized Images
+
+```scala
+import com.sksamuel.scrimage._
+import com.sksamuel.scrimage.filter.GrayscaleFilter
+
+val imagesRDD: DrmRdd[Int] = sc.binaryFiles("/tmp/lfw-deepfunneled/*/*", 500)
+       .map(o => new DenseVector( Image.apply(o._2.toArray)
+       .filter(GrayscaleFilter)
+       .pixels
+       .map(p => p.toInt.toDouble / 10000000)) )  // one vector per image, pixel ints scaled down by 1e7
+   .zipWithIndex
+   .map(o => (o._2.toInt, o._1))                  // key each row by its index
+
+val imagesDRM = drmWrap(rdd = imagesRDD).par(min = 500).checkpoint()
+
+println(s"Dataset: ${imagesDRM.nrow} images, ${imagesDRM.ncol} pixels per image")
+```
+
+### Mean Center the Images
+
+```scala
+import org.apache.mahout.math.algorithms.preprocessing.{MeanCenter, MeanCenterModel}
+
+
+val scaler: 
MeanCenterModel = new MeanCenter().fit(imagesDRM)
+
+val centeredImages = scaler.transform(imagesDRM)
+```
+
+
+### Calculate the Eigenimages via DS-SVD
+
+```scala
+import org.apache.mahout.math._
+import decompositions._
+import drm._
+
+val (drmU, drmV, s) = dssvd(centeredImages, k = 20, p = 15, q = 0)
+```
+
+### Write the Eigenfaces to Disk
+
+```scala
+import java.io.File
+import javax.imageio.ImageIO
+
+val sampleImagePath = "/tmp/lfw-deepfunneled/Aaron_Eckhart/Aaron_Eckhart_0001.jpg"
+val sampleImage = ImageIO.read(new File(sampleImagePath))
+val w = sampleImage.getWidth
+val h = sampleImage.getHeight
+
+val eigenFaces = drmV.t.collect(::,::)
+val colMeans = scaler.colCentersV
+
+for (i <- 0 until 20){
+    val v = (eigenFaces(i, ::) + colMeans) * 10000000  // undo the mean centering and the 1e7 scaling
+    val output = new Array[com.sksamuel.scrimage.Pixel](v.size)
+    for (j <- 0 until v.size) {
+        output(j) = Pixel(v.get(j).toInt)
+    }
+    val image = Image(w, h, output)
+    image.output(new File(s"/tmp/eigenfaces/${i}.png"))
+}
+```
\ No newline at end of file
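
Note on the matching steps: the tutorial's production outline describes steps a to c (decompose a face, look for a close existing vector, otherwise add a new person) but the commit does not include code for them. The following is only a minimal plain-Scala sketch of that logic, not part of the commit and not a Mahout API. It assumes the eigenfaces have been collected into memory as orthonormal rows (as the rows of `drmV.t` would be) and that the mean face is the vector held in `colCentersV`. The names `project`, `distance`, and `findMatch`, and the `threshold` parameter, are hypothetical.

```scala
object EigenfaceMatch {
  type Vec = Array[Double]

  // Step a: coefficients of a face in the eigenface basis.
  // Subtract the mean face, then take the dot product with each eigenface.
  // Assumes the eigenfaces are orthonormal, so dot products are the coefficients.
  def project(face: Vec, meanFace: Vec, eigenfaces: Array[Vec]): Vec = {
    val centered = face.zip(meanFace).map { case (x, m) => x - m }
    eigenfaces.map(ef => ef.zip(centered).map { case (e, c) => e * c }.sum)
  }

  // Euclidean distance between two coefficient vectors.
  def distance(a: Vec, b: Vec): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Step b: index of the closest known person, if it is within the threshold.
  // Step c is the None case: the caller appends the new coefficients to the corpus.
  def findMatch(coeffs: Vec, corpus: Seq[Vec], threshold: Double): Option[Int] =
    if (corpus.isEmpty) None
    else {
      val (best, idx) = corpus.map(distance(coeffs, _)).zipWithIndex.minBy(_._1)
      if (best <= threshold) Some(idx) else None
    }
}
```

With 20 eigenfaces each face is reduced to a 20-element coefficient vector, so the comparison in step b runs over short vectors rather than full images. A suitable `threshold` would have to be chosen empirically; the source does not give one.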

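
Note on `MeanCenter`: the new MeanCenter.md page says the preprocessor "centers values about the column mean" without showing what that does to the example matrix. The sketch below reproduces the arithmetic in plain Scala on local arrays. It is an illustration of the operation under that reading, not Mahout's implementation, and the names `colMeans` and `center` are hypothetical.

```scala
object MeanCenterSketch {
  // Mean of each column of a row-major matrix.
  def colMeans(a: Array[Array[Double]]): Array[Double] =
    a.transpose.map(col => col.sum / col.length)

  // Subtract each column's mean from every entry in that column.
  def center(a: Array[Array[Double]]): Array[Array[Double]] = {
    val means = colMeans(a)
    a.map(row => row.zip(means).map { case (x, m) => x - m })
  }
}

// The 3x3 matrix from the MeanCenter.md example.
val a = Array(
  Array(1.0, 1.0, -2.0),
  Array(2.0, 5.0, 2.0),
  Array(3.0, 9.0, 0.0))

val centered = MeanCenterSketch.center(a)
```

For that matrix the column means are (2, 5, 0), so the centered rows are (-1, -4, -2), (0, 0, 2), and (1, 4, 0). Every column of the result sums to zero.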