ThomasDelteil commented on a change in pull request #13411: [WIP] Gluon end to end tutorial
URL: https://github.com/apache/incubator-mxnet/pull/13411#discussion_r247729496
##########
File path: docs/tutorials/gluon/gluon_from_experiment_to_deployment.md
##########
@@ -0,0 +1,487 @@
+
+# Gluon: from experiment to deployment, an end to end example
+
+## Overview
+The MXNet Gluon API comes with a lot of great features, and it can provide you with everything you need, from experimentation to deploying the model. In this tutorial, we will walk you through a common use case: how to build a model using Gluon, train it on your data, and deploy it for inference.
+
+Let's say you need to build a service that provides flower species recognition. A common problem is that you don't have enough data to train a good model. In such cases, a technique called Transfer Learning can be used to make a more robust model.
+In Transfer Learning we make use of a pre-trained model that solves a related task and was trained on a very large standard dataset, such as ImageNet. ImageNet is from a different domain, but we can utilize the knowledge in this pre-trained model to perform the new task at hand.
+
+Gluon provides State of the Art models for many of the standard tasks such as Classification, Object Detection, Segmentation, etc. In this tutorial we will use the pre-trained model [ResNet50 V2](https://arxiv.org/abs/1603.05027) trained on the ImageNet dataset. This model achieves 77.11% top-1 accuracy on ImageNet. We seek to transfer as much knowledge as possible for our task of recognizing different species of flowers.
+
+
+
+
+## Prerequisites
+
+To complete this tutorial, you need:
+
+- [Build MXNet from source](https://mxnet.incubator.apache.org/install/ubuntu_setup.html#build-mxnet-from-source) with Python (Gluon) and C++ Packages
+- Learn the basics about Gluon with [A 60-minute Gluon Crash Course](https://gluon-crash-course.mxnet.io/)
+- Learn the basics about [MXNet C++ API](https://github.com/apache/incubator-mxnet/tree/master/cpp-package)
+
+
+## The Data
+
+We will use the [Oxford 102 Category Flower Dataset](http://www.robots.ox.ac.uk/~vgg/data/flowers/102/) as an example to show you the steps.
+We have prepared a utility file to help you download and organize your data into train, test, and validation sets. Run the following Python code to download and prepare the data:
+
+
+```python
+# mxnet is needed here for mx.test_utils.download
+import mxnet as mx
+
+data_util_file = "oxford_102_flower_dataset.py"
+base_url = "https://raw.githubusercontent.com/roywei/incubator-mxnet/gluon_tutorial/docs/tutorial_utils/data/{}?raw=true"
+mx.test_utils.download(base_url.format(data_util_file), fname=data_util_file)
+import oxford_102_flower_dataset
+
+# download and move data to train, test, valid folders
+path = './data'
+oxford_102_flower_dataset.get_data(path)
+```
+
+Your data is now organized in the following layout, with all images belonging to the same category grouped in one folder:
+```bash
+data
+|--train
+|   |-- 0
+|   |   |-- image_06736.jpg
+|   |   |-- image_06741.jpg
+...
+|   |-- 1
+|   |   |-- image_06755.jpg
+|   |   |-- image_06899.jpg
+...
+|-- test
+|   |-- 0
+|   |   |-- image_00731.jpg
+|   |   |-- image_0002.jpg
+...
+|   |-- 1
+|   |   |-- image_00036.jpg
+|   |   |-- image_05011.jpg
+
+```
+
+## Training using Gluon
+
+### Define Hyper-parameters
+
+First, let's import the necessary packages:
+
+
+```python
+import math
+import os
+import time
+from multiprocessing import cpu_count
+
+import mxnet as mx
+from mxnet import autograd
+from mxnet import gluon, init
+from mxnet.gluon import nn
+from mxnet.gluon.data.vision import transforms
+from mxnet.gluon.model_zoo.vision import resnet50_v2
+```
+
+Next, we define the hyper-parameters that we will use for fine-tuning. We will use the [MXNet learning rate scheduler](https://mxnet.incubator.apache.org/tutorials/gluon/learning_rate_schedules.html) to adjust learning rates during training.
+
+
+```python
+classes = 102
+epochs = 40
+lr = 0.001
+per_device_batch_size = 32
+momentum = 0.9
+wd = 0.0001
+
+lr_factor = 0.75
+# the learning rate changes at the following epochs
+lr_epochs = [10, 20, 30]
+
+num_gpus = mx.context.num_gpus()
+num_workers = cpu_count()
+ctx = [mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()]
+batch_size = per_device_batch_size * max(num_gpus, 1)
+```
+
+Now we will apply data augmentations to the training images. Augmentation makes minor alterations to the training images, and our model will consider them as distinct images. This can be very useful for fine-tuning on a relatively small dataset, and it will help improve the model. We can use the Gluon [DataSet API](https://mxnet.incubator.apache.org/tutorials/gluon/datasets.html), [DataLoader API](https://mxnet.incubator.apache.org/tutorials/gluon/datasets.html), and [Transform API](https://mxnet.incubator.apache.org/tutorials/gluon/data_augmentation.html) to load the images and apply the following data augmentations:
+1. Randomly crop the image and resize it to 224x224
+2. Randomly flip the image horizontally
+3. Randomly jitter color and add noise
+4. Transpose the data from height*width*num_channels to num_channels*height*width, and map values from [0, 255] to [0, 1]
+5. Normalize with the mean and standard deviation from the ImageNet dataset.
+
+
+
+```python
+jitter_param = 0.4
+lighting_param = 0.1
+
+training_transformer = transforms.Compose([
+    transforms.RandomResizedCrop(224),
+    transforms.RandomFlipLeftRight(),
+    transforms.RandomColorJitter(brightness=jitter_param, contrast=jitter_param,
+                                 saturation=jitter_param),
+    transforms.RandomLighting(lighting_param),
+    transforms.ToTensor(),
+    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
+])
+
+validation_transformer = transforms.Compose([
+    transforms.Resize(256),
+    transforms.CenterCrop(224),
+    transforms.ToTensor(),
+    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
+])
+
+
+train_path = os.path.join(path, 'train')
+val_path = os.path.join(path, 'valid')
+test_path = os.path.join(path, 'test')
+
+# load the data and apply pre-processing (transforms) to the images
+train_data = gluon.data.DataLoader(
+    gluon.data.vision.ImageFolderDataset(train_path).transform_first(training_transformer),
+    batch_size=batch_size, shuffle=True, num_workers=num_workers)
+
+val_data = gluon.data.DataLoader(
+    gluon.data.vision.ImageFolderDataset(val_path).transform_first(validation_transformer),
+    batch_size=batch_size, shuffle=False, num_workers=num_workers)
+
+test_data = gluon.data.DataLoader(
+    gluon.data.vision.ImageFolderDataset(test_path).transform_first(validation_transformer),
+    batch_size=batch_size, shuffle=False, num_workers=num_workers)
+```
+
+### Loading pre-trained model
+
+
+We will use the ResNet50_v2 model, which was pre-trained on the [ImageNet Dataset](http://www.image-net.org/) with 1000 classes. To match the classes in the Flower dataset, we must redefine the last softmax (output) layer to have 102 outputs, then initialize its parameters.
+
+Before we move on to training, one unique Gluon feature you should be aware of is hybridization. It allows you to convert your imperative code to a static symbolic graph, which is much more efficient to execute. There are two main benefits of hybridizing your model: better performance and easier serialization for deployment. The best part is that it's as simple as calling `net.hybridize()`. To learn more about Gluon hybridization, please follow the [hybridization tutorial](https://mxnet.incubator.apache.org/tutorials/gluon/hybrid.html).
+
+
+
+```python
+# load pre-trained resnet50_v2 from model zoo
+finetune_net = resnet50_v2(pretrained=True, ctx=ctx)
+
+# change the last softmax layer since the number of classes is different
+with finetune_net.name_scope():
+    finetune_net.output = nn.Dense(classes)
+finetune_net.output.initialize(init.Xavier(), ctx=ctx)
+# hybridize for better performance
+finetune_net.hybridize()
+
+trainer = gluon.Trainer(finetune_net.collect_params(), 'sgd', {
+    'learning_rate': lr, 'momentum': momentum, 'wd': wd})
+metric = mx.metric.Accuracy()
+softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
+```
+
+### Fine-tuning model on your custom dataset
+
+Now let's define the test metrics and start fine-tuning.
+
+
+
+```python
+def test(net, val_data, ctx):
+    metric = mx.metric.Accuracy()
+    for i, (data, label) in enumerate(val_data):
+        data = gluon.utils.split_and_load(data, ctx_list=ctx, even_split=False)
+        label = gluon.utils.split_and_load(label, ctx_list=ctx, even_split=False)
+        outputs = [net(x) for x in data]
+        metric.update(label, outputs)
+    return metric.get()
+
+
+num_batch = len(train_data)
+iteration_idx = 1
+
+# setup learning rate scheduler
+iterations_per_epoch = math.ceil(num_batch)
+# learning rate change at following steps
+lr_steps = [epoch * iterations_per_epoch for epoch in lr_epochs]
+schedule = mx.lr_scheduler.MultiFactorScheduler(step=lr_steps, factor=lr_factor, base_lr=lr)

Review comment:
You can pass this schedule directly into your trainer rather than calling it on every iteration; check the tutorial on learning rate schedulers.
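
To make the suggestion concrete: the optimizer parameters given to `gluon.Trainer` accept an `lr_scheduler` entry, so the `schedule` object from the quoted diff could sit next to `learning_rate`, `momentum`, and `wd`, and the trainer would then consult it on each update. The pure-Python sketch below only illustrates the arithmetic such a multi-factor schedule performs. It is not MXNet's implementation: `multi_factor_lr` is a hypothetical helper, `iterations_per_epoch = 100` is an assumed value (the tutorial derives it from `len(train_data)`), and the exact boundary behaviour in MXNet may differ.

```python
def multi_factor_lr(num_update, base_lr, steps, factor):
    """Hypothetical stand-in: scale base_lr by `factor` once per step boundary passed."""
    passed = sum(1 for step in steps if num_update > step)
    return base_lr * factor ** passed


# Values taken from the tutorial's hyper-parameters.
lr = 0.001
lr_factor = 0.75
lr_epochs = [10, 20, 30]

# Assumed for illustration only; the tutorial uses len(train_data).
iterations_per_epoch = 100
lr_steps = [epoch * iterations_per_epoch for epoch in lr_epochs]

# In the tutorial the schedule would be handed to the trainer once, roughly:
#   trainer = gluon.Trainer(params, 'sgd', {'learning_rate': lr, 'momentum': momentum,
#                                           'wd': wd, 'lr_scheduler': schedule})
# so the training loop no longer has to set the learning rate itself.
for num_update in (1, 1001, 2001, 3001):
    print(num_update, multi_factor_lr(num_update, lr, lr_steps, lr_factor))
```

With the scheduler owned by the trainer, the `iteration_idx` counter in the diff would likely become unnecessary.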
