areusch commented on code in PR #10921:
URL: https://github.com/apache/tvm/pull/10921#discussion_r863213004


##########
gallery/how_to/work_with_microtvm/micro_train.py:
##########
@@ -0,0 +1,638 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+.. _microtvm-train-arduino:
+
+Training Vision Models for microTVM on Arduino
+==============================================
+**Author**: `Gavin Uberti <https://github.com/guberti>`_
+
+This tutorial shows how MobileNetV1 models can be trained
+to fit on embedded devices, and how those models can be
+deployed to Arduino using TVM.
+"""
+
+######################################################################
+# .. note::
+#
+#   This tutorial is best viewed as a Jupyter Notebook. You can download and run it locally
+#   using the link at the bottom of this page, or open it online for free using Google Colab.
+#   Click the icon below to open in Google Colab.
+#
+# .. image:: https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/images/utilities/colab_button.png
+#      :align: center
+#      :target: https://colab.research.google.com/github/guberti/tvm-site/blob/asf-site/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb

Review Comment:
   does this URL need to get updated each time the tutorial is changed?



##########
gallery/how_to/work_with_microtvm/micro_train.py:
##########
@@ -0,0 +1,638 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+.. _microtvm-train-arduino:
+
+Training Vision Models for microTVM on Arduino
+==============================================
+**Author**: `Gavin Uberti <https://github.com/guberti>`_
+
+This tutorial shows how MobileNetV1 models can be trained
+to fit on embedded devices, and how those models can be
+deployed to Arduino using TVM.
+"""
+
+######################################################################
+# .. note::
+#
+#   This tutorial is best viewed as a Jupyter Notebook. You can download and run it locally
+#   using the link at the bottom of this page, or open it online for free using Google Colab.
+#   Click the icon below to open in Google Colab.
+#
+# .. image:: https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/images/utilities/colab_button.png
+#      :align: center
+#      :target: https://colab.research.google.com/github/guberti/tvm-site/blob/asf-site/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
+#      :width: 600px
+#
+# Motivation
+# ----------
+# When building IoT devices, we often want them to **see and understand** the world around them.
+# This can take many forms, but often a device will want to know if a certain **kind of
+# object** is in its field of vision.
+#
+# For example, a security camera might look for **people**, so it can decide whether to save a video
+# to memory. A traffic light might look for **cars**, so it can judge which lights should change
+# first. Or a forest camera might look for a **kind of animal**, so it can estimate how large
+# the animal population is.
+#
+# To make these devices affordable, we would like them to need only a low-cost processor like the
+# `nRF52840 <https://www.nordicsemi.com/Products/nRF52840>`_ (costing five dollars each on Mouser) or the `RP2040 <https://www.raspberrypi.com/products/rp2040/>`_ (just $1.45 each!).
+#
+# These devices have very little memory (~250 KB RAM), meaning that no conventional edge AI
+# vision model (like MobileNet or EfficientNet) will be able to run. In this tutorial, we will
+# show how these models can be modified to fit within this limit. Then, we will use TVM
+# to compile and deploy the result to an Arduino that uses one of these processors.
+#
+# Installing the Prerequisites
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#
+# To run this tutorial, we will need Tensorflow and TFLite to train our model, pyserial and tlcpack
+# (a community build of TVM) to compile and test it, and imagemagick and curl to preprocess data.
+# We will also need to install the Arduino CLI and the mbed_nano package to test our model.
+#
+#     .. code-block:: bash
+#
+#       %%bash
+#       pip install -q tensorflow tflite pyserial
+#       pip install -q tlcpack-nightly -f https://tlcpack.ai/wheels
+#       apt-get -qq install imagemagick curl
+#
+#       # Install Arduino CLI and library for Nano 33 BLE
+#       curl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh | sh
+#       /content/bin/arduino-cli core update-index
+#       /content/bin/arduino-cli core install arduino:mbed_nano
+#
+# Using the GPU
+# ^^^^^^^^^^^^^
+#
+# This tutorial demonstrates training a neural network, which requires a lot of computing power
+# and will go much faster if you have a GPU. If you are viewing this tutorial on Google Colab, you
+# can enable a GPU by going to **Runtime->Change runtime type** and selecting "GPU" as your hardware
+# accelerator. If you are running locally, you can `follow Tensorflow's guide <https://www.tensorflow.org/guide/gpu>`_ instead.
+#
+# We can test our GPU installation with the following code:
+
+import tensorflow as tf
+
+if not tf.test.gpu_device_name():
+    print("No GPU was detected!")
+    print("Model training will take much longer (~30 minutes instead of ~5)")
+else:
+    print("GPU detected - you're good to go.")
+
+######################################################################
+# Choosing Our Work Dir
+# ^^^^^^^^^^^^^^^^^^^^^
+# We need to pick a directory where our image datasets, trained model, and eventual Arduino sketch
+# will all live. If running on Google Colab, we'll save everything in ``/root`` (aka ``~``) but you'll
+# probably want to store it elsewhere if running locally. Note that this variable only affects Python
+# scripts - you'll have to adjust the Bash commands too.
+
+import os
+
+FOLDER = "/root"
+# sphinx_gallery_start_ignore
+import tempfile
+
+FOLDER = tempfile.mkdtemp()
+# sphinx_gallery_end_ignore
+
+######################################################################
+# Downloading the Data
+# --------------------
+# Convolutional neural networks usually learn by looking at many images, along with labels telling
+# the network what those images are. To get these images, we'll need a publicly available dataset
+# with thousands of images of all sorts of objects and labels of what's in each image. We'll also
+# need a bunch of images that **aren't** of cars, as we're trying to distinguish these two classes.
+#
+# In this tutorial, we'll create a model to detect if an image contains a **car**, but you can use
+# whatever category you like! Just change the source URL below to one containing images of another
+# type of object.
+#
+# To get our car images, we'll be downloading the `Stanford Cars dataset <http://ai.stanford.edu/~jkrause/cars/car_dataset.html>`_,
+# which contains 16,185 full color images of cars. We'll also need images of random things that
+# aren't cars, so we'll use the `COCO 2017 <https://cocodataset.org/#home>`_ validation set (it's
+# smaller, and thus faster to download, than the full training set, though training on the full
+# data set would yield better results). Note that there are some cars in the COCO 2017 data set, but it's
+# a small enough fraction not to matter - just keep in mind that this will drive down our perceived
+# accuracy slightly.
+#
+# We could use the Tensorflow dataloader utilities, but we'll instead do it manually to make sure
+# it's easy to change the datasets being used. We'll end up with the following file hierarchy:
+#
+#     .. code-block::
+#
+#         /root
+#         ├── images
+#         │   ├── target
+#         │   │   ├── 000001.jpg
+#         │   │   │ ...
+#         │   │   └── 016185.jpg
+#         │   ├── target.tgz
+#         │   ├── random
+#         │   │   ├── 000000000139.jpg
+#         │   │   │ ...
+#         │   │   └── 000000581781.jpg
+#         │   └── random.zip
+#
+# We should also note that the Stanford Cars training set has about 8k images, while the COCO 2017
+# validation set has 5k images - it is not a 50/50 split! If we wanted to, we could weight these classes
+# differently during training to correct for this, but training will still work if we ignore it. It should
+# take about **2 minutes** to download Stanford Cars, while COCO 2017 validation will take
+# **1 minute**.
+
+import os
+import shutil
+import urllib.request
+
+# Download datasets
+os.makedirs(f"{FOLDER}/images")
+urllib.request.urlretrieve(
+    "http://ai.stanford.edu/~jkrause/car196/cars_train.tgz", f"{FOLDER}/images/target.tgz"
+)
+urllib.request.urlretrieve(
+    "http://images.cocodataset.org/zips/val2017.zip", f"{FOLDER}/images/random.zip"
+)
+
+# Extract them and rename their folders
+shutil.unpack_archive(f"{FOLDER}/images/target.tgz", f"{FOLDER}/images")
+shutil.unpack_archive(f"{FOLDER}/images/random.zip", f"{FOLDER}/images")
+shutil.move(f"{FOLDER}/images/cars_train", f"{FOLDER}/images/target")
+shutil.move(f"{FOLDER}/images/val2017", f"{FOLDER}/images/random")
+
+######################################################################
+# Loading the Data
+# ----------------
+# Currently, our data is stored on-disk as JPG files of various sizes. To train with it, we'll have
+# to load the images into memory, resize them to be 64x64, and convert them to raw, uncompressed
+# data. Keras's ``image_dataset_from_directory`` will take care of most of this, though it loads
+# images such that each pixel value is a float from 0 to 255.
+#
+# We'll also need to load labels, though Keras will help with this. From our subdirectory structure,
+# it knows the images in ``/target`` are one class, and those in ``/random`` another. Setting
+# ``label_mode='categorical'`` tells Keras to convert these into **categorical labels** - a 2x1 vector
+# that's either ``[1, 0]`` for an object of our target class, or ``[0, 1]`` for anything else.
+# We'll also set ``shuffle=True`` to randomize the order of our examples.
+#
+# We will also **batch** the data - grouping samples into clumps to make our training go faster.
+# Setting ``batch_size = 32`` is a decent number.
+#
+# Lastly, in machine learning we generally want our inputs to be small numbers. We'll thus use a
+# ``Rescaling`` layer to change our images such that each pixel is a float between ``0.0`` and ``1.0``,
+# instead of ``0`` to ``255``. We need to be careful not to rescale our categorical labels though, so
+# we'll use a ``lambda`` function.
+
+IMAGE_SIZE = (64, 64, 3)
+unscaled_dataset = tf.keras.utils.image_dataset_from_directory(
+    f"{FOLDER}/images",
+    batch_size=32,
+    shuffle=True,
+    label_mode="categorical",
+    image_size=IMAGE_SIZE[0:2],
+)
+rescale = tf.keras.layers.Rescaling(scale=1.0 / 255)
+full_dataset = unscaled_dataset.map(lambda im, lbl: (rescale(im), lbl))
+
+######################################################################
+# What's Inside Our Dataset?
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^
+# Before giving this data set to our neural network, we ought to give it a quick visual inspection.
+# Does the data look properly transformed? Do the labels seem appropriate? And what's our ratio of
+# objects to other stuff? We can display some examples from our datasets using ``matplotlib``:
+
+import matplotlib.pyplot as plt
+
+num_target_class = len(os.listdir(f"{FOLDER}/images/target/"))
+num_random_class = len(os.listdir(f"{FOLDER}/images/random/"))
+print(f"{FOLDER}/images/target contains {num_target_class} images")
+print(f"{FOLDER}/images/random contains {num_random_class} images")
+
+# Show some samples and their labels
+SAMPLES_TO_SHOW = 10
+plt.figure(figsize=(20, 10))
+for i, (image, label) in enumerate(unscaled_dataset.unbatch()):
+    if i >= SAMPLES_TO_SHOW:
+        break
+    ax = plt.subplot(1, SAMPLES_TO_SHOW, i + 1)
+    plt.imshow(image.numpy().astype("uint8"))
+    plt.title(list(label.numpy()))
+    plt.axis("off")
+
+######################################################################
+# Validating our Accuracy
+# ^^^^^^^^^^^^^^^^^^^^^^^
+# While developing our model, we'll often want to check how accurate it is (e.g. to see if it
+# improves during training). How do we do this? We could just train it on *all* of the data, and
+# then ask it to classify that same data. However, our model could cheat by just memorizing all of
+# the samples, which would make it *appear* to have very high accuracy, but perform very badly in
+# reality. In practice, this "memorizing" is called **overfitting**.
+#
+# To prevent this, we will set aside some of the data (we'll use 20%) as a **validation set**. Our
+# model will never be trained on validation data - we'll only use it to check our model's accuracy.
+
+num_batches = len(full_dataset)
+train_dataset = full_dataset.take(int(num_batches * 0.8))
+validation_dataset = full_dataset.skip(len(train_dataset))
+
+######################################################################
+# Loading the Model
+# -----------------
+# In the past decade, `convolutional neural networks <https://en.wikipedia.org/wiki/Convolutional_neural_network>`_ have been widely
+# adopted for image classification tasks. State-of-the-art models like `EfficientNet V2 <https://arxiv.org/abs/2104.00298>`_ are able
+# to perform image classification better than even humans! Unfortunately, these models have tens of
+# millions of parameters, and thus won't fit on cheap security camera computers.
+#
+# Our applications generally don't need perfect accuracy - 90% is good enough. We can thus use the
+# older and smaller MobileNet V1 architecture. But this *still* won't be small enough - by default,
+# MobileNet V1 with 224x224 inputs and depth 1.0 takes ~50 MB to just **store**. To reduce the size
+# of the model, there are three knobs we can turn. First, we can reduce the size of the input images
+# from 224x224 to 96x96 or 64x64, and Keras makes it easy to do this. We can also reduce the **depth**
+# of the model, from 1.0 to 0.25. And if we were really strapped for space, we could reduce the
+# number of **channels** by making our model take grayscale images instead of RGB ones.
+#
+# In this tutorial, we will use an RGB 64x64 input image and 0.25 depth scale. This is not quite
+# ideal, but it allows the finished model to fit in 192 KB of RAM, while still letting us perform
+# transfer learning using the official Tensorflow source models (if we used depth scale <0.25 or
+# a grayscale input, we wouldn't be able to do this).
+#
+# What is Transfer Learning?
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^
+# Deep learning has `dominated image classification <https://paperswithcode.com/sota/image-classification-on-imagenet>`_ for a long time,
+# but training neural networks takes a lot of time. When a neural network is trained "from scratch",
+# its parameters start out randomly initialized, forcing it to learn very slowly how to tell images
+# apart.
+#
+# With transfer learning, we instead start with a neural network that's **already** good at a
+# specific task. In this example, that task is classifying images from `the ImageNet database <https://www.image-net.org/>`_. This
+# means the network already has some object detection capabilities, and is likely closer to what you
+# want than a random model would be.
+#
+# This works especially well with image processing neural networks like MobileNet. In practice, it
+# turns out the convolutional layers of the model (i.e. the first 90% of the layers) are used for
+# identifying low-level features like lines and shapes - only the last few fully connected layers
+# are used to determine how those shapes make up the objects the network is trying to detect.
+#
+# We can take advantage of this by starting training with a MobileNet model that was trained on
+# ImageNet, and already knows how to identify those lines and shapes. We can then just remove the
+# last few layers from this pretrained model, and add our own final layers. We'll then train this
+# conglomerate model for a few epochs on our cars vs non-cars dataset, to fine tune the first layers
+# and train from scratch the last layers.
+#
+# Source MobileNets for transfer learning have been `pretrained by the Tensorflow folks <https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md>`_, so we
+# can just download the one closest to what we want (the 128x128 input model with 0.25 depth scale).
+
+os.makedirs(f"{FOLDER}/models")
+WEIGHTS_PATH = f"{FOLDER}/models/mobilenet_2_5_128_tf.h5"
+urllib.request.urlretrieve(
+    "https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_2_5_128_tf.h5",
+    WEIGHTS_PATH,
+)
+
+pretrained = tf.keras.applications.MobileNet(
+    input_shape=IMAGE_SIZE, weights=WEIGHTS_PATH, alpha=0.25
+)
+
+######################################################################
+# Modifying Our Network
+# ^^^^^^^^^^^^^^^^^^^^^
+# As mentioned above, our pretrained model is designed to classify the 1,000 ImageNet categories,
+# but we want to convert it to classify cars. Since only the last few layers are task-specific,
+# we'll **cut off the last five layers** of our original model. In their place we'll build our own
+# "tail" to the model by performing reshape, dropout, flatten, and softmax operations.
+
+model = tf.keras.models.Sequential()
+
+model.add(tf.keras.layers.InputLayer(input_shape=IMAGE_SIZE))
+model.add(tf.keras.Model(inputs=pretrained.inputs, outputs=pretrained.layers[-5].output))
+
+model.add(tf.keras.layers.Reshape((-1,)))
+model.add(tf.keras.layers.Dropout(0.1))
+model.add(tf.keras.layers.Flatten())
+model.add(tf.keras.layers.Dense(2, activation="softmax"))
+
+######################################################################
+# Training Our Network
+# ^^^^^^^^^^^^^^^^^^^^
+# When training neural networks, we must set a parameter called the **learning rate** that controls
+# how fast our network learns. It must be set carefully - too slow, and our network will take
+# forever to train; too fast, and our network won't be able to learn some fine details. Generally
+# for Adam (the optimizer we're using), ``0.001`` is a pretty good learning rate (and is what's
+# recommended in the `original paper <https://arxiv.org/abs/1412.6980>`_). However, in this case
+# ``0.0005`` seems to work a little better.
+#
+# We'll also pass the validation set from earlier to ``model.fit``. This will evaluate how good our
+# model is each time we train it, and let us track how our model is improving. Once training is
+# finished, the model should have a validation accuracy around ``0.98`` (meaning it was right 98% of
+# the time on our validation set).
+
+model.compile(
+    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
+    loss="categorical_crossentropy",
+    metrics=["accuracy"],
+)
+model.fit(train_dataset, validation_data=validation_dataset, epochs=3, verbose=2)
+
+######################################################################
+# Quantization
+# ------------
+# We've done a decent job of reducing our model's size so far - changing the input dimension,
+# along with removing the last layers, reduced the model to just 219k parameters. However, each of
+# these parameters is a ``float32`` that takes four bytes, so our model will take up almost one MB!
+#
+# Additionally, it might be the case that our hardware doesn't have built-in support for floating
+# point numbers. While most high-memory Arduinos (like the Nano 33 BLE) do have hardware support,
+# some others (like the Arduino Due) do not. On any boards *without* dedicated hardware support,
+# floating point multiplication will be extremely slow.
+#
+# To address both issues we will **quantize** the model - representing the weights as eight bit
+# integers. It's more complex than just rounding, though - to get the best performance, TensorFlow
+# tracks how each neuron in our model activates, so we can figure out how to best represent those
+# activations as integers while staying faithful to the original model.
+#
+# We will help TensorFlow do this by creating a representative dataset - a subset of the original
+# that is used for tracking how those neurons activate. We'll then pass this into a ``TFLiteConverter``
+# (Keras itself does not have quantization support) with an ``Optimize`` flag to tell TFLite to perform
+# the conversion. By default, TFLite keeps the inputs and outputs of our model as floats, so we must
+# explicitly tell it to avoid this behavior.
+
+converter = tf.lite.TFLiteConverter.from_keras_model(model)
+
+
+def representative_dataset():
+    for image_batch, label_batch in full_dataset.take(10):
+        yield [image_batch]
+
+
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
+converter.representative_dataset = representative_dataset
+converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
+converter.inference_input_type = tf.uint8
+converter.inference_output_type = tf.uint8
+
+quantized_model = converter.convert()
+
+######################################################################
+# Download the Model if Desired
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+# We've now got a finished model that you can use locally or in other tutorials (try autotuning
+# this model or viewing it on `https://netron.app/ <https://netron.app/>`_). But before we do
+# those things, we'll have to write it to a file (``quantized.tflite``). If you're running this
+# tutorial on Google Colab, you'll have to uncomment the last two lines to download the file
+# after writing it.
+
+QUANTIZED_MODEL_PATH = f"{FOLDER}/models/quantized.tflite"
+with open(QUANTIZED_MODEL_PATH, "wb") as f:
+    f.write(quantized_model)
+# from google.colab import files
+# files.download(QUANTIZED_MODEL_PATH)
+
+######################################################################
+# Compiling With TVM For Arduino
+# ------------------------------
+# Tensorflow has a built-in framework for deploying to microcontrollers - `TFLite Micro <https://www.tensorflow.org/lite/microcontrollers>`_. However,
+# it's poorly supported by development boards, and does not support autotuning. We will use Apache
+# TVM instead.
+#
+# TVM can be used either with its command line interface (``tvmc``) or with its Python interface. The
+# Python interface is fully-featured and more stable, so we'll use it here.
+#
+# TVM is an optimizing compiler, and optimizations to our model are performed in stages via
+# **intermediate representations**. The first of these is `Relay <https://arxiv.org/abs/1810.00952>`_, a high-level intermediate
+# representation emphasizing portability. The conversion from ``.tflite`` to Relay is done without any
+# knowledge of our "end goal" - the fact we intend to run this model on an Arduino.
+#
+# Choosing an Arduino Board
+# ^^^^^^^^^^^^^^^^^^^^^^^^^
+# Next, we'll have to decide exactly which Arduino board to use. The Arduino sketch that we
+# ultimately generate should be compatible with any board, but knowing which board we are using in
+# advance allows TVM to adjust its compilation strategy to get better performance.
+#
+# There is one catch - we need enough **memory** (flash and RAM) to be able to run our model. We
+# won't ever be able to run a complex vision model like a MobileNet on an Arduino Uno - that board
+# only has 2 kB of RAM and 32 kB of flash! Our model has ~200,000 parameters, so there is just no
+# way it could fit.
+#
+# For this tutorial, we will use the Nano 33 BLE, which has 1 MB of flash memory and 256 KB of RAM.
+# However, any other Arduino with those specs or better should also work.
+#
+# Generating our project
+# ^^^^^^^^^^^^^^^^^^^^^^
+# Next, we'll compile the model to TVM's MLF (machine learning format) intermediate representation,

Review Comment:
   Model Library Format
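   For context, Model Library Format is the tar archive TVM writes for a compiled model, with a metadata file alongside the generated sources and parameters. A minimal sketch of inspecting such an archive with only the standard library is below. The member names and metadata fields are illustrative assumptions built in memory, not the output of a real TVM build, and the helper names are hypothetical.

```python
import io
import json
import tarfile


def build_example_archive() -> bytes:
    """Build a tiny in-memory tar that mimics an MLF-style layout (hypothetical contents)."""
    files = {
        "metadata.json": json.dumps({"model_name": "default", "version": 1}).encode(),
        "codegen/host/src/default_lib0.c": b"/* generated C source would go here */\n",
        "parameters/default.params": b"",
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def summarize_archive(data: bytes) -> dict:
    """Return the parsed metadata and the sorted list of member names."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        names = sorted(member.name for member in tar.getmembers())
        metadata_name = next(n for n in names if n.endswith("metadata.json"))
        metadata = json.load(tar.extractfile(metadata_name))
    return {"metadata": metadata, "members": names}


summary = summarize_archive(build_example_archive())
print(summary["metadata"]["model_name"])
for name in summary["members"]:
    print(name)
```

   Against a real archive, the same `summarize_archive` helper could be fed the bytes of the exported tar file instead of the in-memory example.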



##########
tests/scripts/ci.py:
##########
@@ -267,7 +267,7 @@ def docs(
             "tlcpack-sphinx-addon==0.2.1",
             "synr==0.5.0",
             "image==1.5.33",
-            "sphinx-gallery==0.4.0",
+            "git+https://github.com/guberti/sphinx-gallery.git@ipynb-include-bash",

Review Comment:
   should we update docker/install scripts too if we're going to go this route? 
also, is this based off 0.4.0? may need to backport or verify it won't break 
anything to update.
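   To see at a glance which docs dependencies would stop being exact pins (and so might also need handling in the docker install scripts), a short sketch like the one below could help. The helper name `split_requirements` is hypothetical, and the list is copied from the hunk above rather than read from `ci.py`.

```python
def split_requirements(requirements):
    """Split requirement strings into exact '==' pins and VCS (git+...) installs."""
    pinned, vcs = {}, []
    for req in requirements:
        if req.startswith("git+"):
            vcs.append(req)
        elif "==" in req:
            name, version = req.split("==", 1)
            pinned[name] = version
    return pinned, vcs


# Requirement strings as they appear in the diff hunk above.
requirements = [
    "tlcpack-sphinx-addon==0.2.1",
    "synr==0.5.0",
    "image==1.5.33",
    "git+https://github.com/guberti/sphinx-gallery.git@ipynb-include-bash",
]

pinned, vcs = split_requirements(requirements)
print("exact pins:", pinned)
print("VCS installs:", vcs)
```

   This only classifies the strings. It does not say which upstream release the fork is based on, which would still need to be checked in the fork's history.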



##########
gallery/how_to/work_with_microtvm/micro_train.py:
##########
@@ -0,0 +1,638 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+.. _microtvm-train-arduino:
+
+Training Vision Models for microTVM on Arduino
+==============================================
+**Author**: `Gavin Uberti <https://github.com/guberti>`_
+
+This tutorial shows how MobileNetV1 models can be trained
+to fit on embedded devices, and how those models can be
+deployed to Arduino using TVM.
+"""
+
+######################################################################
+# .. note::
+#
+#   This tutorial is best viewed as a Jupyter Notebook. You can download and run it locally
+#   using the link at the bottom of this page, or open it online for free using Google Colab.
+#   Click the icon below to open in Google Colab.
+#
+# .. image:: https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/images/utilities/colab_button.png
+#      :align: center
+#      :target: https://colab.research.google.com/github/guberti/tvm-site/blob/asf-site/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
+#      :width: 600px

Review Comment:
   should we consider shrinking this button? i don't want it to be 
inconspicuous, but it's pretty big right now.



##########
gallery/how_to/work_with_microtvm/micro_train.py:
##########
@@ -0,0 +1,638 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+.. _microtvm-train-arduino:
+
+Training Vision Models for microTVM on Arduino
+==============================================
+**Author**: `Gavin Uberti <https://github.com/guberti>`_
+
+This tutorial shows how MobileNetV1 models can be trained
+to fit on embedded devices, and how those models can be
+deployed to Arduino using TVM.
+"""
+
+######################################################################
+# .. note::
+#
+#   This tutorial is best viewed as a Jupyter Notebook. You can download and run it locally
+#   using the link at the bottom of this page, or open it online for free using Google Colab.
+#   Click the icon below to open in Google Colab.
+#
+# .. image:: https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/images/utilities/colab_button.png
+#      :align: center
+#      :target: https://colab.research.google.com/github/guberti/tvm-site/blob/asf-site/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
+#      :width: 600px
+#
+# Motivation
+# ----------
+# When building IoT devices, we often want them to **see and understand** the world around them.
+# This can take many forms, but often a device will want to know if a certain **kind of
+# object** is in its field of vision.
+#
+# For example, a security camera might look for **people**, so it can decide whether to save a video
+# to memory. A traffic light might look for **cars**, so it can judge which lights should change
+# first. Or a forest camera might look for a **kind of animal**, so it can estimate how large
+# the animal population is.
+#
+# To make these devices affordable, we would like them to need only a low-cost processor like the
+# `nRF52840 <https://www.nordicsemi.com/Products/nRF52840>`_ (costing five dollars each on Mouser) or the `RP2040 <https://www.raspberrypi.com/products/rp2040/>`_ (just $1.45 each!).
+#
+# These devices have very little memory (~250 KB RAM), meaning that no conventional edge AI
+# vision model (like MobileNet or EfficientNet) will be able to run. In this tutorial, we will
+# show how these models can be modified to fit within this limit. Then, we will use TVM
+# to compile and deploy the result to an Arduino that uses one of these processors.
+#
+# Installing the Prerequisites
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#
+# To run this tutorial, we will need Tensorflow and TFLite to train our model, pyserial and tlcpack
+# (a community build of TVM) to compile and test it, and imagemagick and curl to preprocess data.
+# We will also need to install the Arduino CLI and the mbed_nano package to test our model.
+#
+#     .. code-block:: bash
+#
+#       %%bash
+#       pip install -q tensorflow tflite pyserial
+#       pip install -q tlcpack-nightly -f https://tlcpack.ai/wheels
+#       apt-get -qq install imagemagick curl
+#
+#       # Install Arduino CLI and library for Nano 33 BLE
+#       curl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh | sh
+#       /content/bin/arduino-cli core update-index
+#       /content/bin/arduino-cli core install arduino:mbed_nano
+#
+# Using the GPU
+# ^^^^^^^^^^^^^
+#
+# This tutorial demonstrates training a neural network, which requires a lot of computing power
+# and will go much faster if you have a GPU. If you are viewing this tutorial on Google Colab, you
+# can enable a GPU by going to **Runtime->Change runtime type** and selecting "GPU" as your hardware
+# accelerator. If you are running locally, you can `follow Tensorflow's guide <https://www.tensorflow.org/guide/gpu>`_ instead.
+#
+# We can test our GPU installation with the following code:
+
+import tensorflow as tf
+
+if not tf.test.gpu_device_name():
+    print("No GPU was detected!")
+    print("Model training will take much longer (~30 minutes instead of ~5)")
+else:
+    print("GPU detected - you're good to go.")
+
+######################################################################
+# Choosing Our Work Dir
+# ^^^^^^^^^^^^^^^^^^^^^
+# We need to pick a directory where our image datasets, trained model, and 
eventual Arduino sketch
+# will all live. If running on Google Colab, we'll save everything in 
``/root`` (aka ``~``) but you'll
+# probably want to store it elsewhere if running locally. Note that this 
variable only affects Python
+# scripts - you'll have to adjust the Bash commands too.
+
+import os
+
+FOLDER = "/root"
+# sphinx_gallery_start_ignore
+import tempfile
+
+FOLDER = tempfile.mkdtemp()
+# sphinx_gallery_end_ignore
+
+######################################################################
+# Downloading the Data
+# --------------------
+# Convolutional neural networks usually learn by looking at many images, along 
with labels telling
+# the network what those images are. To get these images, we'll need a 
publicly available dataset
+# with thousands of images of all sorts of objects and labels of what's in 
each image. We'll also
+# need a bunch of images that **aren't** of cars, as we're trying to 
distinguish these two classes.
+#
+# In this tutorial, we'll create a model to detect if an image contains a 
**car**, but you can use
+# whatever category you like! Just change the source URL below to one 
containing images of another
+# type of object.
+#
+# To get our car images, we'll be downloading the `Stanford Cars dataset 
<http://ai.stanford.edu/~jkrause/cars/car_dataset.html>`_,
+# which contains 16,185 full color images of cars. We'll also need images of 
random things that
+# aren't cars, so we'll use the `COCO 2017 <https://cocodataset.org/#home>`_ 
validation set (it's
+# smaller, and thus faster to download, than the full training set, though training on 
the full data set
+# would yield better results). Note that there are some cars in the COCO 2017 
data set, but it's
+# a small enough fraction not to matter - just keep in mind that this will 
drive down our perceived
+# accuracy slightly.
+#
+# We could use the TensorFlow dataloader utilities, but we'll instead do it 
manually to make sure
+# it's easy to change the datasets being used. We'll end up with the following 
file hierarchy:
+#
+#     .. code-block::
+#
+#         /root
+#         ├── images
+#         │   ├── target
+#         │   │   ├── 000001.jpg
+#         │   │   │ ...
+#         │   │   └── 016185.jpg
+#         │   ├── target.tgz
+#         │   ├── random
+#         │   │   ├── 000000000139.jpg
+#         │   │   │ ...
+#         │   │   └── 000000581781.jpg
+#         │   └── random.zip
+#
+# We should also note that the Stanford Cars training set has 8k images, while the COCO 2017 
validation set is 5k
+# images - it is not a 50/50 split! If we wanted to, we could weight these 
classes differently
+# during training to correct for this, but training will still work if we 
ignore it. It should
+# take about **2 minutes** to download Stanford Cars, while COCO 2017 
validation will take
+# **1 minute**.
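
One way to correct for the imbalance described above is to weight each class inversely to its frequency. The sketch below is not part of the tutorial: it assumes rounded counts of 8,000 target and 5,000 random images, and the helper name `balanced_class_weights` is made up for illustration.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (num_classes * count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


# Assumed, rounded image counts (not measured from the downloaded data).
weights = balanced_class_weights({"target": 8000, "random": 5000})
print(weights)
```

The rarer class gets the larger weight. Keras's `model.fit(class_weight=...)` accepts a dictionary like this, but keyed by integer class index rather than by name.
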
+
+import os
+import shutil
+import urllib.request
+
+# Download datasets
+os.makedirs(f"{FOLDER}/images")
+urllib.request.urlretrieve(
+    "http://ai.stanford.edu/~jkrause/car196/cars_train.tgz", 
f"{FOLDER}/images/target.tgz"
+)
+urllib.request.urlretrieve(
+    "http://images.cocodataset.org/zips/val2017.zip", 
f"{FOLDER}/images/random.zip"
+)
+
+# Extract them and rename their folders
+shutil.unpack_archive(f"{FOLDER}/images/target.tgz", f"{FOLDER}/images")
+shutil.unpack_archive(f"{FOLDER}/images/random.zip", f"{FOLDER}/images")
+shutil.move(f"{FOLDER}/images/cars_train", f"{FOLDER}/images/target")
+shutil.move(f"{FOLDER}/images/val2017", f"{FOLDER}/images/random")
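
If this cell is re-run in a notebook, `os.makedirs` raises because the directory already exists, and both archives are downloaded again. A minimal sketch of a re-runnable variant follows; the helper name `fetch_once` is hypothetical and not part of the PR.

```python
import os
import urllib.request


def fetch_once(url, dest):
    """Download url to dest unless dest already exists; return dest."""
    parent = os.path.dirname(dest) or "."
    os.makedirs(parent, exist_ok=True)  # no error if the folder is already there
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

It could be called as `fetch_once("http://images.cocodataset.org/zips/val2017.zip", f"{FOLDER}/images/random.zip")`. Note that it only checks for the file's presence, so a partially downloaded archive would need to be deleted by hand.
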
+
+######################################################################
+# Loading the Data
+# ----------------
+# Currently, our data is stored on-disk as JPG files of various sizes. To 
train with it, we'll have
+# to load the images into memory, resize them to be 64x64, and convert them to 
raw, uncompressed
+# data. Keras's ``image_dataset_from_directory`` will take care of most of 
this, though it loads
+# images such that each pixel value is a float from 0 to 255.
+#
+# We'll also need to load labels, though Keras will help with this. From our 
subdirectory structure,
+# it knows the images in ``/target`` are one class, and those in ``/random`` 
another. Setting
+# ``label_mode='categorical'`` tells Keras to convert these into **categorical 
labels** - a 2x1 vector
+# that's ``[0, 1]`` for an object of our target class, or ``[1, 0]`` 
for anything else (Keras orders
+# classes alphanumerically by directory name, so ``random`` comes first).
+# We'll also set ``shuffle=True`` to randomize the order of our examples.
+#
+# We will also **batch** the data - grouping samples into clumps to make our 
training go faster.
+# A ``batch_size`` of 32 is a reasonable choice.
+#
+# Lastly, in machine learning we generally want our inputs to be small 
numbers. We'll thus use a
+# ``Rescaling`` layer to change our images such that each pixel is a float 
between ``0.0`` and ``1.0``,
+# instead of ``0`` to ``255``. We need to be careful not to rescale our 
categorical labels though, so
+# we'll use a ``lambda`` function.
+
+IMAGE_SIZE = (64, 64, 3)
+unscaled_dataset = tf.keras.utils.image_dataset_from_directory(
+    f"{FOLDER}/images",
+    batch_size=32,
+    shuffle=True,
+    label_mode="categorical",
+    image_size=IMAGE_SIZE[0:2],
+)
+rescale = tf.keras.layers.Rescaling(scale=1.0 / 255)
+full_dataset = unscaled_dataset.map(lambda im, lbl: (rescale(im), lbl))
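
To make the effect of the ``map`` call concrete, here is a plain-Python sketch with no TensorFlow involved. The function name `rescale_sample` is illustrative only. It scales pixel values into the 0.0 to 1.0 range and passes the label through unchanged.

```python
def rescale_sample(pixels, label, scale=1.0 / 255):
    """Scale every pixel value; return the label untouched."""
    return [p * scale for p in pixels], label


# Three example pixel values and a two-element categorical label.
pixels, label = rescale_sample([0.0, 127.5, 255.0], [0, 1])
print(pixels, label)
```

Applying the scale to the whole `(image, label)` pair instead would also shrink the label values, which is why the tutorial's lambda only rescales the image.
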
+
+######################################################################
+# What's Inside Our Dataset?
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^
+# Before giving this data set to our neural network, we ought to give it a 
quick visual inspection.
+# Does the data look properly transformed? Do the labels seem appropriate? And 
what's our ratio of
+# objects to other stuff? We can display some examples from our datasets using 
``matplotlib``:
+
+import matplotlib.pyplot as plt
+
+num_target_class = len(os.listdir(f"{FOLDER}/images/target/"))
+num_random_class = len(os.listdir(f"{FOLDER}/images/random/"))
+print(f"{FOLDER}/images/target contains {num_target_class} images")
+print(f"{FOLDER}/images/random contains {num_random_class} images")
+
+# Show some samples and their labels
+SAMPLES_TO_SHOW = 10
+plt.figure(figsize=(20, 10))
+for i, (image, label) in enumerate(unscaled_dataset.unbatch()):
+    if i >= SAMPLES_TO_SHOW:
+        break
+    ax = plt.subplot(1, SAMPLES_TO_SHOW, i + 1)
+    plt.imshow(image.numpy().astype("uint8"))
+    plt.title(list(label.numpy()))
+    plt.axis("off")
+
+######################################################################
+# Validating our Accuracy
+# ^^^^^^^^^^^^^^^^^^^^^^^
+# While developing our model, we'll often want to check how accurate it is 
(e.g. to see if it
+# improves during training). How do we do this? We could just train it on 
*all* of the data, and
+# then ask it to classify that same data. However, our model could cheat by 
just memorizing all of
+# the samples, which would make it *appear* to have very high accuracy, but 
perform very badly in
+# reality. In practice, this "memorizing" is called **overfitting**.
+#
+# To prevent this, we will set aside some of the data (we'll use 20%) as a 
**validation set**. Our
+# model will never be trained on validation data - we'll only use it to check 
our model's accuracy.
+
+num_batches = len(full_dataset)
+train_dataset = full_dataset.take(int(num_batches * 0.8))
+validation_dataset = full_dataset.skip(len(train_dataset))
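
Because ``take`` and ``skip`` operate on batches rather than individual images, the 80/20 split happens at batch granularity. A small sketch of the arithmetic, assuming roughly 13,000 images in total (the rounded 8k + 5k figures from earlier). The helper name `split_batches` is not from the tutorial.

```python
import math


def split_batches(num_images, batch_size=32, train_fraction=0.8):
    """Return (train_batches, validation_batches) for a batched dataset."""
    num_batches = math.ceil(num_images / batch_size)  # last batch may be partial
    num_train = int(num_batches * train_fraction)
    return num_train, num_batches - num_train


# 13,000 is an assumed, rounded total, not a measured count.
train_batches, validation_batches = split_batches(13000)
print(train_batches, validation_batches)
```
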
+
+######################################################################
+# Choosing a Model
+# ----------------
+# In the past decade, `convolutional neural networks 
<https://en.wikipedia.org/wiki/Convolutional_neural_network>`_ have been widely
+# adopted for image classification tasks. State-of-the-art models like 
`EfficientNet V2 <https://arxiv.org/abs/2104.00298>`_ are able
+# to perform image classification better than even humans! Unfortunately, 
these models have tens of
+# millions of parameters, and thus won't fit on cheap security camera 
computers.
+#
+# Our applications generally don't need perfect accuracy - 90% is good enough. 
We can thus use the
+# older and smaller MobileNet V1 architecture. But this *still* won't be small 
enough - by default,
+# MobileNet V1 with 224x224 inputs and depth 1.0 takes ~50 MB just to 
**store**. To reduce the size
+# of the model, there are three knobs we can turn. First, we can reduce the 
size of the input images
+# from 224x224 to 96x96 or 64x64, and Keras makes it easy to do this. We can 
also reduce the **depth**

Review Comment:
   perhaps elaborate on "depth" here (could just link somewhere)
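
For instance, assuming "depth" here means the width multiplier from the MobileNet paper (the ``alpha`` argument of ``tf.keras.applications.MobileNet``), a rough sketch like the following could show why it matters. The function name is made up, and biases and batch-norm parameters are ignored.

```python
def separable_conv_params(c_in, c_out, alpha=1.0, kernel=3):
    """Weights in one depthwise-separable block with channels scaled by alpha."""
    c_in, c_out = int(c_in * alpha), int(c_out * alpha)
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1x1 convolution mixing channels
    return depthwise + pointwise


full = separable_conv_params(512, 512, alpha=1.0)
quarter = separable_conv_params(512, 512, alpha=0.25)
print(full, quarter, full / quarter)
```

Since the pointwise term dominates, the parameter count of such a block shrinks roughly with the square of ``alpha``.
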



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
