[GitHub] anirudhacharya commented on a change in pull request #13144: [MXNET-1203] Tutorial infogan

GitBox Fri, 09 Nov 2018 11:10:26 -0800

anirudhacharya commented on a change in pull request #13144: [MXNET-1203] 
Tutorial infogan 
URL: https://github.com/apache/incubator-mxnet/pull/13144#discussion_r232361462


 ##########
 File path: docs/tutorials/gluon/info_gan.md
 ##########
 @@ -0,0 +1,436 @@
+
+# Image similarity search with InfoGAN
+
+This notebook shows how to implement an InfoGAN based on Gluon. InfoGAN is an 
extension of GANs, where the generator input is split into two parts: random noise 
and a latent code (see [InfoGAN Paper](https://arxiv.org/pdf/1606.03657.pdf)). 
+The codes are made meaningful by maximizing the mutual information between 
code and generator output. InfoGAN learns a disentangled representation in a 
completely unsupervised manner. It can be used for many applications such as 
image similarity search. This notebook uses the DCGAN example from the 
[Straight Dope 
Book](https://gluon.mxnet.io/chapter14_generative-adversarial-networks/dcgan.html)
 and extends it to create an InfoGAN. 
+
+
+```python
+from __future__ import print_function
+from datetime import datetime
+import logging
+import multiprocessing
+import os
+import sys
+import tarfile
+import time
+
+import numpy as np
+from matplotlib import pyplot as plt
+from mxboard import SummaryWriter
+import mxnet as mx
+from mxnet import gluon
+from mxnet import ndarray as nd
+from mxnet.gluon import nn, utils
+from mxnet import autograd
+
+```
+
+The latent code vector can contain several variables, which can be categorical 
and/or continuous. We set `n_continuous` to 2 and `n_categories` to 10.
+
+
+```python
+batch_size   = 64
+z_dim        = 100
+n_continuous = 2
+n_categories = 10
+ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()
+```
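
As an illustration of the configuration above, here is a minimal sketch of how a single generator input could be assembled from noise plus latent code. The helper name `sample_generator_input`, the Gaussian noise, and the uniform range for the continuous codes are assumptions for illustration, not taken from the PR. It uses plain Python instead of MXNet so it runs standalone.

```python
import random

# Values copied from the tutorial's configuration cell.
z_dim = 100
n_continuous = 2
n_categories = 10


def sample_generator_input(rng):
    """Hypothetical helper: build one generator input vector.

    Returns the concatenated vector and the sampled category index.
    """
    # Random noise part (assumed standard normal).
    noise = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
    # Categorical part of the latent code, encoded as a one-hot vector.
    category = rng.randrange(n_categories)
    one_hot = [1.0 if i == category else 0.0 for i in range(n_categories)]
    # Continuous part of the latent code (assumed uniform in [-1, 1]).
    continuous = [rng.uniform(-1.0, 1.0) for _ in range(n_continuous)]
    return noise + one_hot + continuous, category


rng = random.Random(0)
vec, category = sample_generator_input(rng)
print(len(vec))  # 112 = z_dim + n_categories + n_continuous
```

Under these assumptions the generator sees one flat vector of length `z_dim + n_categories + n_continuous`.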
+
+Download the LFW dataset and define some functions to load and normalize images.
+
+
+```python
+lfw_url = 'http://vis-www.cs.umass.edu/lfw/lfw-deepfunneled.tgz'
+data_path = 'lfw_dataset'
+if not os.path.exists(data_path):
+    os.makedirs(data_path)
+    data_file = utils.download(lfw_url)
+    with tarfile.open(data_file) as tar:
+        tar.extractall(path=data_path)
+
+```
+
+
+```python
+def transform(data, width=64, height=64):
+    data = mx.image.imresize(data, width, height)
+    data = nd.transpose(data, (2,0,1))
+    data = data.astype(np.float32)/127.5 - 1
+    if data.shape[0] == 1:
+        data = nd.tile(data, (3, 1, 1))
+    return data.reshape((1,) + data.shape)
+```
+
+
+```python
+def get_files(data_dir):
+    images    = []
+    filenames = []
+    for path, _, fnames in os.walk(data_dir):
+        for fname in fnames:
+            if not fname.endswith('.jpg'):
+                continue
+            img = os.path.join(path, fname)
+            img_arr = mx.image.imread(img)
+            img_arr = transform(img_arr)
+            images.append(img_arr)
+            filenames.append(path + "/" + fname)
+    return images, filenames        
+```
+
+Load the dataset `lfw_dataset` which contains images of celebrities.
+
+
+```python
+data_dir = 'lfw_dataset'
+images, filenames = get_files(data_dir)
+split = int(len(images)*0.8)
+test_images = images[split:]
+test_filenames = filenames[split:]
+train_images = images[:split]
+train_filenames = filenames[:split]
+
+train_data = gluon.data.ArrayDataset(nd.concatenate(train_images))
+train_dataloader = gluon.data.DataLoader(train_data, batch_size=batch_size, 
shuffle=True, last_batch='rollover', num_workers=multiprocessing.cpu_count())
+```
+
+## Generator
+Define the Generator model. Architecture is taken from the DCGAN 
implementation in [Straight Dope 
Book](https://gluon.mxnet.io/chapter14_generative-adversarial-networks/dcgan.html).
 The Generator consists of 4 layers, where each layer involves a strided 
convolution, batch normalization, and a rectified nonlinearity. It takes random 
noise and the latent code as input and produces a `(64,64,3)` output image.
+
+
+```python
+class Generator(gluon.HybridBlock):
+    def __init__(self, **kwargs):
+        super(Generator, self).__init__(**kwargs)
+        with self.name_scope():
+            self.prev = nn.HybridSequential()
+            self.prev.add(nn.Dense(1024, use_bias=False), nn.BatchNorm(), 
nn.Activation(activation='relu'))
+            self.G = nn.HybridSequential()
+         
+            self.G.add(nn.Conv2DTranspose(64 * 8, 4, 1, 0, use_bias=False))
+            self.G.add(nn.BatchNorm())
+            self.G.add(nn.Activation('relu'))
+            self.G.add(nn.Conv2DTranspose(64 * 4, 4, 2, 1, use_bias=False))
+            self.G.add(nn.BatchNorm())
+            self.G.add(nn.Activation('relu'))
+            self.G.add(nn.Conv2DTranspose(64 * 2, 4, 2, 1, use_bias=False))
+            self.G.add(nn.BatchNorm())
+            self.G.add(nn.Activation('relu'))
+            self.G.add(nn.Conv2DTranspose(64, 4, 2, 1, use_bias=False))
+            self.G.add(nn.BatchNorm())
+            self.G.add(nn.Activation('relu'))
+            self.G.add(nn.Conv2DTranspose(3, 4, 2, 1, use_bias=False))
+            self.G.add(nn.Activation('tanh'))
+
+    def hybrid_forward(self, F, x):
+        x = self.prev(x)
+        x = F.reshape(x, (0, -1, 1, 1))
+        return self.G(x)
+```
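
To check the spatial size the Generator above produces, the arithmetic can be sketched as follows. It assumes the standard transposed-convolution size formula with dilation 1 and no output padding, and takes the `(kernel, stride, pad)` triples from the `Conv2DTranspose` calls in the diff. The function name `conv_transpose_out` is hypothetical.

```python
def conv_transpose_out(size, kernel, stride, pad):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * pad + kernel


# (kernel, stride, pad) for each Conv2DTranspose layer, in order.
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the reshape in hybrid_forward yields a 1x1 spatial map
sizes = []
for kernel, stride, pad in layers:
    size = conv_transpose_out(size, kernel, stride, pad)
    sizes.append(size)

print(sizes)  # [4, 8, 16, 32, 64]
```

The first layer expands the 1x1 map to 4x4, and each later layer doubles the height and width until the image is 64x64.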
+
+## Discriminator
+Define the Discriminator and the Q model. The Q model shares many layers with the 
Discriminator. Its task is to estimate the code `c` for a given fake image. It 
is used to maximize the lower bound on the mutual information.
+
+
+```python
+class Discriminator(gluon.HybridBlock):
+    def __init__(self, **kwargs):
+        super(Discriminator, self).__init__(**kwargs)
+        with self.name_scope():
+            self.D = nn.HybridSequential()
+            self.D.add(nn.Conv2D(64, 4, 2, 1, use_bias=False))
+            self.D.add(nn.LeakyReLU(0.2))
+            self.D.add(nn.Conv2D(64 * 2, 4, 2, 1, use_bias=False))
+            self.D.add(nn.BatchNorm())
+            self.D.add(nn.LeakyReLU(0.2))
+            self.D.add(nn.Conv2D(64 * 4, 4, 2, 1, use_bias=False))
+            self.D.add(nn.BatchNorm())
+            self.D.add(nn.LeakyReLU(0.2))
+            self.D.add(nn.Conv2D(64 * 8, 4, 2, 1, use_bias=False))
+            self.D.add(nn.BatchNorm())
+            self.D.add(nn.LeakyReLU(0.2))
+
+            self.D.add(nn.Dense(1024, use_bias=False), nn.BatchNorm(), 
nn.Activation(activation='relu'))
+       
+            self.prob = nn.Dense(1)
+            self.feat = nn.HybridSequential()
+            self.feat.add(nn.Dense(128, use_bias=False), nn.BatchNorm(), 
nn.Activation(activation='relu'))
+            self.category_prob = nn.Dense(n_categories)
+            self.continuous_mean = nn.Dense(n_continuous)
+            self.Q = nn.HybridSequential()
+            self.Q.add(self.feat, self.category_prob, self.continuous_mean)
+
+    def hybrid_forward(self, F, x):
+        x               = self.D(x)
+        prob            = self.prob(x)
+        feat            = self.feat(x)
+        category_prob   = self.category_prob(feat)
+        continuous_mean = self.continuous_mean(feat)
+        
+        return prob, category_prob, continuous_mean
+```
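
For the categorical part of the code, maximizing the log-likelihood of the true category under the Q head amounts to minimizing a softmax cross-entropy on `category_prob`. The sketch below shows that one term in plain Python. The function name `categorical_nll` is hypothetical, and the loss actually used by the tutorial is not visible in this excerpt.

```python
import math


def categorical_nll(logits, category):
    """Negative log-probability of `category` under softmax(logits)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_norm - logits[category]


# With uninformative (all-equal) logits over 10 categories,
# the loss equals log(10).
loss = categorical_nll([0.0] * 10, 3)
print(round(loss, 4))  # 2.3026
```

A Q head that recovers the category perfectly would drive this term toward zero.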
+
+The InfoGAN has the following layout.
+<img 
src="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/doc/tutorials/info_gan/InfoGAN.png"
 style="width:800px;height:250px;">
 
 Review comment:
   It would be better if the diagram showed that the output of the Discriminator 
is a probability distribution over real and fake, rather than just "Real|Fake" 
as it does now. But it's a minor point; you can make the call on which is 
better.
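
To make the point concrete, here is a minimal sketch of how the single score from `self.prob` (a `Dense(1)` layer with no activation in the diff) maps to a real/fake distribution. It assumes the score is a logit that is passed through a sigmoid, which this excerpt does not show. The function names are hypothetical.

```python
import math


def sigmoid(logit):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def real_fake_distribution(logit):
    """Turn the Discriminator's single score into P(real) and P(fake)."""
    p_real = sigmoid(logit)
    return {"real": p_real, "fake": 1.0 - p_real}


print(real_fake_distribution(0.0))  # {'real': 0.5, 'fake': 0.5}
```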

