[beam] branch master updated: ML notebook formatting and text updates (#24437)

damccorm Thu, 01 Dec 2022 06:13:54 -0800

This is an automated email from the ASF dual-hosted git repository.

damccorm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git



The following commit(s) were added to refs/heads/master by this push:
     new 2e71061e7b9 ML notebook formatting and text updates (#24437)
2e71061e7b9 is described below

commit 2e71061e7b9d2383cbea5531215b68e6ec0236cd
Author: Rebecca Szper <[email protected]>
AuthorDate: Thu Dec 1 06:13:40 2022 -0800

    ML notebook formatting and text updates (#24437)
    
    * merged and resolved the conflict
    
    * more copy edits to the ML notebooks
    
    * merged and resolved the conflict
    
    * more copy edits to the ML notebooks
    
    * more copy edits to the ML notebooks
    
    * more copy edits to the ML notebooks
    
    * trying to remove a section that shouldn't have been added back in
    
    * Update examples/notebooks/beam-ml/custom_remote_inference.ipynb
    
    Co-authored-by: Danny McCormick <[email protected]>
    
    * Update examples/notebooks/beam-ml/dataframe_api_preprocessing.ipynb
    
    Co-authored-by: Danny McCormick <[email protected]>
    
    * review updates
    
    Co-authored-by: Danny McCormick <[email protected]>
---
 .../beam-ml/custom_remote_inference.ipynb          | 50 +++++++------
 .../beam-ml/dataframe_api_preprocessing.ipynb      | 82 ++++++++++------------
 .../notebooks/beam-ml/run_custom_inference.ipynb   | 17 ++---
 .../beam-ml/run_inference_multi_model.ipynb        | 74 ++++++++++---------
 .../notebooks/beam-ml/run_inference_pytorch.ipynb  | 32 +++++----
 .../run_inference_pytorch_tensorflow_sklearn.ipynb | 57 +++++++--------
 .../notebooks/beam-ml/run_inference_sklearn.ipynb  | 30 ++++----
 .../beam-ml/run_inference_tensorflow.ipynb         | 42 +++++++----
 8 files changed, 197 insertions(+), 187 deletions(-)

diff --git a/examples/notebooks/beam-ml/custom_remote_inference.ipynb b/examples/notebooks/beam-ml/custom_remote_inference.ipynb
index 036a9d39d4e..ad25849e89e 100644
--- a/examples/notebooks/beam-ml/custom_remote_inference.ipynb
+++ b/examples/notebooks/beam-ml/custom_remote_inference.ipynb
@@ -4,6 +4,7 @@
       "cell_type": "code",
       "execution_count": null,
       "metadata": {
+        "cellView": "form",
         "id": "paYiulysGrwR"
       },
       "outputs": [],
@@ -36,15 +37,16 @@
       "source": [
         "# Remote inference in Apache Beam\n",
         "\n",
+        "This example demonstrates how to implement a custom inference call in 
Apache Beam using the Google Cloud Vision API.\n",
+        "\n",
         "The prefered way to run inference in Apache Beam is by using the 
[RunInference 
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). \n",
-        "The RunInference API enables you to run your models as part of your 
pipeline in a way that is optimized for machine learning inference. \n",
+        "The RunInference API enables you to run models as part of your 
pipeline in a way that is optimized for machine learning inference. \n",
         "To reduce the number of steps that you need to take, RunInference 
supports features like batching. For more infomation about the RunInference 
API, review the [RunInference 
API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference),
 \n",
         "which demonstrates how to implement model inference in PyTorch, 
scikit-learn, and TensorFlow.\n",
         "\n",
         "Currently, the RunInference API doesn't support making remote 
inference calls using the Natural Language API, Cloud Vision API, and so on. 
\n",
-        "Therefore, to use these remote APIs with Apache Beam, you need to 
write custom inference calls.\n",
-        "\n",
-        "This notebook shows how to implement a custom inference call in 
Apache Beam. This example uses the Google Cloud Vision API."
+        "Therefore, to use these remote APIs with Apache Beam, you need to 
write custom inference calls.\n"
+        
       ]
     },
     {
@@ -53,7 +55,7 @@
         "id": "GNbarEZsalS1"
       },
       "source": [
-        "## Use case: run the Cloud Vision API\n",
+        "## Run the Cloud Vision API\n",
         "\n",
         "You can use the Cloud Vision API to retrieve labels that describe an 
image.\n",
         "For example, the following image shows a lion with possible labels."
@@ -75,20 +77,20 @@
       },
       "source": [
         "We want to run the Google Cloud Vision API on a large set of images, 
and Apache Beam is the ideal tool to handle this workflow.\n",
-        "This example notebook demonstates how to retrieve image labels with 
this API on a small set of images.\n",
+        "This example demonstates how to retrieve image labels with this API 
on a small set of images.\n",
         "\n",
-        "The notebook follows these steps to implement this workflow:\n",
+        "The example follows these steps to implement this workflow:\n",
         "* Read the images.\n",
         "* Batch the images together to optimize the model call.\n",
         "* Send the images to an external API to run inference.\n",
-        "* Post-process the results of your API.\n",
+        "* Postprocess the results of your API.\n",
         "\n",
         "**Caution:** Be aware of API quotas and the heavy load you might 
incur on your external API. Verify that your pipeline and API are configured 
correctly for your use case.\n",
         "\n",
         "To optimize the calls to the external API, limit the parallel calls 
to the external remote API by configuring 
[PipelineOptions](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options).\n",
         "In Apache Beam, different runners provide options to handle the 
parallelism, for example:\n",
-        "* With the [Direct 
Runner](https://beam.apache.org/documentation/runners/direct/), use 
`direct_num_workers`.\n",
-        "* With the [Google Cloud Dataflow 
Runner](https://beam.apache.org/documentation/runners/dataflow/), use 
`max_num_workers`.\n",
+        "* With the [Direct 
Runner](https://beam.apache.org/documentation/runners/direct/), use the 
`direct_num_workers` pipeline option.\n",
+        "* With the [Google Cloud Dataflow 
Runner](https://beam.apache.org/documentation/runners/dataflow/), use the 
`max_num_workers` pipeline option.\n",
         "\n",
         "For information about other runners, see the [Beam capability 
matrix](https://beam.apache.org/documentation/runners/capability-matrix/) "
       ]
@@ -99,7 +101,7 @@
         "id": "FAawWOaiIYaS"
       },
       "source": [
-        "## Installation\n",
+        "## Before you begin\n",
         "\n",
         "This section provides installation steps."
       ]
@@ -170,9 +172,11 @@
         "id": "mL4MaHm_XOVd"
       },
       "source": [
-        "## Remote inference on Cloud Vision API\n",
+        "## Run remote inference on Cloud Vision API\n",
+        "\n",
+        "This section demonstates the steps to run remote inference on the 
Cloud Vision API.\n",
         "\n",
-        "This section demonstates the steps to run remote inference on the 
Cloud Vision API."
+        "Download and install Apache Beam and the required modules."
       ]
     },
     {
@@ -199,7 +203,7 @@
         "id": "09k08IYlLmON"
       },
       "source": [
-        "For this example, we use images from the [MSCoco 
dataset](https://cocodataset.org/#explore) as a list of image urls.\n",
+        "This example uses images from the [MSCoco 
dataset](https://cocodataset.org/#explore) as a list of image URLs.\n",
         "This data is used as the pipeline input."
       ]
     },
@@ -234,20 +238,20 @@
         "id": "HLy7VKJhLrmT"
       },
       "source": [
-        "### Custom DoFn\n",
+        "### Create a custom DoFn\n",
         "\n",
         "In order to implement remote inference, create a DoFn class. This 
class sends a batch of images to the Cloud vision API.\n",
         "\n",
         "The custom DoFn makes it possible to initialize the API. In case of a 
custom model, a model can also be loaded in the `setup` function. \n",
         "\n",
-        "The `process` function is the most interesting part. In this function 
we implement the model call and return its results.\n",
+        "The `process` function is the most interesting part. In this 
function, we implement the model call and return its results.\n",
         "\n",
-        "**Caution:** When running remote inference, prepare to encounter, 
identify, and handle failure as gracefully as possible. We recommend using the 
following techniques: \n",
+        "When running remote inference, prepare to encounter, identify, and 
handle failure as gracefully as possible. We recommend using the following 
techniques: \n",
         "\n",
         "* **Exponential backoff:** Retry failed remote calls with 
exponentially growing pauses between retries. Using exponential backoff ensures 
that failures don't lead to an overwhelming number of retries in quick 
succession. \n",
         "\n",
-        "* **Dead letter queues:** Route failed inferences to a separate 
`PCollection` without failing the whole transform. You can continue execution 
without failing the job (batch jobs' default behavior) or retrying indefinitely 
(streaming jobs' default behavior).\n",
-        "You can then run custom pipeline logic on the deadletter queue to log 
the failure, alert, and push the failed message to temporary storage so that it 
can eventually be reprocessed. "
+        "* **Dead-letter queues:** Route failed inferences to a separate 
`PCollection` without failing the whole transform. You can continue execution 
without failing the job (batch jobs' default behavior) or retrying indefinitely 
(streaming jobs' default behavior).\n",
+        "You can then run custom pipeline logic on the dead-letter queue 
(unprocessed messages queue) to log the failure, alert, and push the failed 
message to temporary storage so that it can eventually be reprocessed."
       ]
     },
     {
@@ -277,7 +281,7 @@
         "    image_requests = [vision.AnnotateImageRequest(image=image, 
features=[feature]) for image in images]\n",
         "    batch_image_request = 
vision.BatchAnnotateImagesRequest(requests=image_requests)\n",
         "\n",
-        "    # Send batch request to the remote endpoint.\n",
+        "    # Send the batch request to the remote endpoint.\n",
         "    responses = 
self._client.batch_annotate_images(request=batch_image_request).responses\n",
         "    \n",
         "    return list(zip(image_urls, responses))\n"
@@ -289,7 +293,7 @@
         "id": "lHJuyHhvL0-a"
       },
       "source": [
-        "### Batching\n",
+        "### Manage batching\n",
         "\n",
         "Before we can chain together the pipeline steps, we need to 
understand batching.\n",
         "When running inference with your model, either in Apache Beam or in 
an external API, you can batch your input to increase the efficiency of the 
model execution.\n",
@@ -297,7 +301,7 @@
         "\n",
         "To manage the batching in this pipeline, include a `BatchElements` 
transform to group elements together and form a batch of the desired size.\n",
         "\n",
-        "* If you have a streaming pipeline, consider using 
[GroupIntoBatches](https://beam.apache.org/documentation/transforms/python/aggregation/groupintobatches/)\n",
+        "* If you have a streaming pipeline, consider using 
[GroupIntoBatches](https://beam.apache.org/documentation/transforms/python/aggregation/groupintobatches/),\n",
         "because `BatchElements` doesn't batch items across bundles. 
`GroupIntoBatches` requires choosing a key within which items are batched.\n",
         "\n",
         "* When batching, make sure that the input batch matches the maximum 
payload of the external API.  \n",
@@ -619,7 +623,7 @@
         "id": "7gwn5bF1XaDm"
       },
       "source": [
-        "### Metrics\n",
+        "## Monitor the pipeline\n",
         "\n",
         "Because monitoring can provide insight into the status and health of 
the application, consider monitoring and measuring pipeline performance.\n",
         "For information about the available tracking metrics, see 
[RunInference 
Metrics](https://beam.apache.org/documentation/ml/runinference-metrics/)."
diff --git a/examples/notebooks/beam-ml/dataframe_api_preprocessing.ipynb b/examples/notebooks/beam-ml/dataframe_api_preprocessing.ipynb
index 645d62d32be..e45f1bd2d39 100644
--- a/examples/notebooks/beam-ml/dataframe_api_preprocessing.ipynb
+++ b/examples/notebooks/beam-ml/dataframe_api_preprocessing.ipynb
@@ -38,29 +38,23 @@
         "\n",
         "For rapid execution, Pandas loads all of the data into memory on a 
single machine (one node). This configuration works well when dealing with 
small-scale datasets. However, many projects involve datasets that are too big 
to fit in memory. These use cases generally require parallel data processing 
frameworks, such as Apache Beam.\n",
         "\n",
-        "\n",
-        "## Apache Beam DataFrames\n",
-        "\n",
-        "\n",
-        "Beam DataFrames provide a pandas-like\n",
+        "Beam DataFrames provide a Pandas-like\n",
         "API to declare and define Beam processing pipelines. It provides a 
familiar interface for machine learning practioners to build complex 
data-processing pipelines by only invoking standard pandas commands.\n",
         "\n",
         "To learn more about Apache Beam DataFrames, see the\n",
         "[Beam DataFrames 
overview](https://beam.apache.org/documentation/dsls/dataframes/overview) 
page.\n",
         "\n",
-        "## Goal\n",
-        "The goal of this notebook is to explore a dataset preprocessed with 
the Beam DataFrame API for machine learning model training.\n",
+        "## Overview\n",
+        "The goal of this example is to explore a dataset preprocessed with 
the Beam DataFrame API for machine learning model training.\n",
         "\n",
-        "\n",
-        "## Tutorial outline\n",
-        "\n",
-        "This notebook demonstrates the use of the Apache Beam DataFrames API 
to perform common data exploration as well as the preprocessing steps that are 
necessary to prepare your dataset for machine learning model training and 
inference. These steps include the following:  \n",
+        "This example demonstrates the use of the Apache Beam DataFrames API 
to perform common data exploration as well as the preprocessing steps that are 
necessary to prepare your dataset for machine learning model training and 
inference. This example includes the following steps:  \n",
         "\n",
         "*   Removing unwanted columns.\n",
         "*   One-hot encoding categorical columns.\n",
         "*   Normalizing numerical columns.\n",
         "\n",
-        "\n"
+        "In this example, the first section demonstrates how to build and 
execute a pipeline locally using the interactive runner.\n",
+        "The second section uses a distributed runner to demonstrate how to 
run the pipeline on the full dataset.\n"
       ],
       "metadata": {
         "id": "iFZC1inKuUCy"
@@ -69,9 +63,9 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## Installation\n",
+        "## Install Apache Beam\n",
         "\n",
-        "To explore the elements within a `PCollection`, install Apache Beam 
with the `interactive` component to use the Interactive runner. The latest 
implemented DataFrames API methods invoked in this notebook are available in 
Apache Beam SDK versions 2.43 and later.\n"
+        "To explore the elements within a `PCollection`, install Apache Beam 
with the `interactive` component to use the Interactive runner. The DataFrames 
API methods invoked in this example are available in Apache Beam SDK versions 
2.43 and later.\n"
       ],
       "metadata": {
         "id": "A0f2HJ22D4lt"
@@ -105,8 +99,8 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## Part I : Local exploration with the Interactive Beam runner\n",
-        "Start by using the [Interactive 
Beam](https://beam.apache.org/releases/pydoc/2.20.0/apache_beam.runners.interactive.interactive_beam.html)
 to explore and develop your pipeline.\n",
+        "## Local exploration with the Interactive Beam runner\n",
+        "Use the [Interactive 
Beam](https://beam.apache.org/releases/pydoc/2.20.0/apache_beam.runners.interactive.interactive_beam.html)
 runner to explore and develop your pipeline.\n",
         "This runner allows you to test the code interactively, progressively 
building out the pipeline before deploying it on a distributed runner. \n",
         "\n",
         "\n",
@@ -124,12 +118,12 @@
       "source": [
         "### Load the data\n",
         "\n",
-        "To read CSV files into Dataframes, Pandas has the\n",
+        "To read CSV files into DataFrames, Pandas has the\n",
         
"[`pandas.read_csv`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)\n",
         "function.\n",
         "This notebook uses the Beam\n",
         
"[`beam.dataframe.io.read_csv`](https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.io.html#apache_beam.dataframe.io.read_csv)\n",
-        "function, which emulates `pandas.read_csv`. The main difference is 
that the Beam function returns a deferred Beam DataFrame whereas the Pandas 
function returns a standard DataFrame.\n"
+        "function, which emulates `pandas.read_csv`. The main difference is 
that the Beam function returns a deferred Beam DataFrame, whereas the Pandas 
function returns a standard DataFrame.\n"
       ]
     },
     {
@@ -170,8 +164,8 @@
         "### Preprocess the data\n",
         "\n",
         "This example uses the [NASA - Nearest Earth Objects 
dataset](https://cneos.jpl.nasa.gov/ca/).\n",
-        "This dataset includes information about objects in the outer space. 
Some objects are close enough to Earth to cause harm.\n",
-        "Therefore, this dataset compiles the list of NASA certified asteroids 
that are classified as the nearest earth objects to understand which objects 
pose a risk."
+        "This dataset includes information about objects in outer space. Some 
objects are close enough to Earth to cause harm.\n",
+        "This dataset compiles the list of NASA certified asteroids that are 
classified as the nearest earth objects to understand which objects pose a 
risk."
       ]
     },
     {
@@ -673,7 +667,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "Use the standard pandas command `DataFrame.describe()` to generate 
descriptive statistics for the numerical columns like percentile, mean, std, 
and so on. "
+        "Use the standard pandas command `DataFrame.describe()` to generate 
descriptive statistics for the numerical columns, such as percentile, mean, 
std, and so on. "
       ],
       "metadata": {
         "id": "MGAErO0lAYws"
@@ -1006,16 +1000,16 @@
       "source": [
         "Before running any transformations, verify that all of the columns 
need to be used for model training. Start by looking at the column description 
provided by the [JPL website](https://ssd.jpl.nasa.gov/sbdb_query.cgi):\n",
         "\n",
-        "* **spk_id:** Object primary SPK-ID\n",
-        "* **full_name:** Asteroid name\n",
-        "* **near_earth_object:** Near-earth object flag\n",
+        "* **spk_id:** Object primary SPK-ID.\n",
+        "* **full_name:** Asteroid name.\n",
+        "* **near_earth_object:** Near-earth object flag.\n",
         "* **absolute_magnitude:** The apparent magnitude an object would have 
if it were located at a distance of 10 parsecs.\n",
         "* **diameter:** Object diameter (from equivalent sphere) km unit.\n",
-        "* **albedo:** A measure of the diffuse reflection of solar radiation 
out of the total solar radiation and measured on a scale from 0 to 1.\n",
+        "* **albedo:** A measure of the diffuse reflection of solar radiation 
out of the total solar radiation, measured on a scale from 0 to 1.\n",
         "* **diameter_sigma:** 1-sigma uncertainty in object diameter km 
unit.\n",
-        "* **eccentricity:** A value between 0 and 1 that refers to how flat 
or round the asteroid is  \n",
-        "* **inclination:** The angle with respect to the x-y ecliptic 
plane\n",
-        "* **moid_ld:** Earth Minimum Orbit Intersection Distance au unit\n",
+        "* **eccentricity:** A value between 0 and 1 that refers to how flat 
or round the asteroid is.\n",
+        "* **inclination:** The angle with respect to the x-y ecliptic 
plane.\n",
+        "* **moid_ld:** Earth Minimum Orbit Intersection Distance au unit.\n",
         "* **object_class:** The classification of the asteroid. For a more 
detailed description, see [NASA object 
classifications](https://pdssbn.astro.umd.edu/data_other/objclass.shtml).\n",
         "* **Semi-major axis au Unit:** The length of half of the long axis in 
AU unit.\n",
         "* **hazardous_flag:** Identifies hazardous asteroids."
@@ -1027,7 +1021,7 @@
         "id": "DzYVKbwTp72d"
       },
       "source": [
-        "The **'spk_id'** and **'full_name'** columns are unique for each row. 
You can remove these columns, because they are not needed for model training."
+        "The **spk_id** and **full_name** columns are unique for each row. You 
can remove these columns, because they are not needed for model training."
       ]
     },
     {
@@ -1153,7 +1147,7 @@
         "id": "00MRdFGLwQiD"
       },
       "source": [
-        "Most of the columns do not have missing values. However, the columns 
**'diameter'**, **'albedo'** and **'diameter_sigma'** have many missing values. 
Because these values cannot be measured or derived and aren't needed for 
training the ML model, remove the columns."
+        "Most of the columns do not have missing values. However, the columns 
**diameter**, **albedo**, and **diameter_sigma** have many missing values. 
Because these values cannot be measured or derived and aren't needed for 
training the ML model, remove the columns."
       ]
     },
     {
@@ -1511,7 +1505,7 @@
         "id": "a3PojL3WBqgE"
       },
       "source": [
-        "Next, normalize the numerical columns so that they can be used to 
train a model. To standarize the data, you can subtract the mean and divide by 
the standard deviation. This process is also known as finding the 
[z-score](https://developers.google.com/machine-learning/data-prep/transform/normalization#z-score).\n",
+        "Normalize the numerical columns so that they can be used to train a 
model. To standarize the data, you can subtract the mean and divide by the 
standard deviation. This process is also known as finding the 
[z-score](https://developers.google.com/machine-learning/data-prep/transform/normalization#z-score).\n",
         "This step improves the performance and training stability of the 
model during training and inference.\n"
       ]
     },
@@ -1859,7 +1853,7 @@
         "id": "qdNILsajFvex"
       },
       "source": [
-        "Convert the categorical columns into one-hot encoded variables to use 
them during training.\n"
+        "Next, convert the categorical columns into one-hot encoded variables 
to use during training.\n"
       ]
     },
     {
@@ -2596,7 +2590,7 @@
         "\n",
         "This section combines the previous steps into a full pipeline 
implementation, and then visualizes the preprocessed data.\n",
         "\n",
-        "Note that the only standard Apache Beam method invoked here is the 
`pipeline` instance. The rest of the preprocessing commands are based on native 
Pandas methods that are integrated with the Apache Beam DataFrame API."
+        "Note that the only standard Apache Beam method invoked here is the 
`pipeline` instance. The rest of the preprocessing commands are based on native 
pandas methods that are integrated with the Apache Beam DataFrame API."
       ]
     },
     {
@@ -3339,7 +3333,7 @@
         "id": "xZvJTqa3XKI_"
       },
       "source": [
-        "## Part II : Process the full dataset with the distributed runner\n",
+        "## Process the full dataset with the distributed runner\n",
         "The previous section demonstrates how to build and execute the 
pipeline locally using the interactive runner.\n",
         "This section demonstrates how to run the pipeline on the full dataset 
by switching to a distributed runner. For this example, the pipeline runs on 
[Dataflow](https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline)."
       ]
@@ -3361,7 +3355,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "These steps process the full dataset, `full.csv`, which contains 
approximately one million rows. To materialize the deferred dataframe, these 
steps also write the results to a CSV file instead of using `ib.collect()`.\n",
+        "These steps process the full dataset, `full.csv`, which contains 
approximately one million rows. To materialize the deferred DataFrame, these 
steps also write the results to a CSV file instead of using `ib.collect()`.\n",
         "\n",
         "To switch from an interactive runner to a distributed runner, update 
the pipeline options. The rest of the pipeline steps don't change."
       ],
@@ -3450,12 +3444,10 @@
         "\n",
         "This tutorial demonstrated how to analyze and preprocess a 
large-scale dataset with the Apache Beam DataFrames API. You can now train a 
model on a classification task using the preprocessed dataset.\n",
         "\n",
-        "To learn more about how to get started with classifying structured 
data, see:\n",
-        "\n",
-        "*   [Structred data classification from 
scratch](https://keras.io/examples/structured_data/structured_data_classification_from_scratch/)\n",
+        "To learn more about how to get started with classifying structured 
data, see \n",
+        "[Structured data classification from 
scratch](https://keras.io/examples/structured_data/structured_data_classification_from_scratch/).\n",
         "\n",
-        "To continue learning, find another dataset to use with the Apache 
Beam DataFrames API processing. Think carefully about which features to include 
in your model and how to represent them.\n",
-        "\n"
+        "To continue learning, find another dataset to use with the Apache 
Beam DataFrames API processing. Think carefully about which features to include 
in your model and how to represent them.\n"
       ],
       "metadata": {
         "id": "UOLr6YgOOSVQ"
@@ -3466,11 +3458,11 @@
       "source": [
         "## Resources\n",
         "\n",
-        "* [Beam DataFrames 
overview](https://beam.apache.org/documentation/dsls/dataframes/overview) -- An 
overview of the Apache Beam DataFrames API.\n",
-        "* [Differences from 
pandas](https://beam.apache.org/documentation/dsls/dataframes/differences-from-pandas)
 -- Reviews the differences between Apache Beam DataFrames and Pandas 
DataFrames, as well as some of the workarounds for unsupported operations.\n",
-        "* [10 minutes to 
Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html) -- 
A quickstart guide to the Pandas DataFrames.\n",
-        "* [Pandas DataFrame 
API](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) -- The 
API reference for the Pandas DataFrames.\n",
-        "* [Data preparation and feature training in 
ML](https://developers.google.com/machine-learning/data-prep) -- A guideline 
about data transformation for ML training."
+        "* [Beam DataFrames 
overview](https://beam.apache.org/documentation/dsls/dataframes/overview) - An 
overview of the Apache Beam DataFrames API.\n",
+        "* [Differences from 
pandas](https://beam.apache.org/documentation/dsls/dataframes/differences-from-pandas)
 - Reviews the differences between Apache Beam DataFrames and Pandas 
DataFrames, as well as some of the workarounds for unsupported operations.\n",
+        "* [10 minutes to 
Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html) - A 
quickstart guide to the Pandas DataFrames.\n",
+        "* [Pandas DataFrame 
API](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) - The 
API reference for the Pandas DataFrames.\n",
+        "* [Data preparation and feature training in 
ML](https://developers.google.com/machine-learning/data-prep) - A guideline 
about data transformation for ML training."
       ],
       "metadata": {
         "id": "nG9WXXVcMCe_"
diff --git a/examples/notebooks/beam-ml/run_custom_inference.ipynb b/examples/notebooks/beam-ml/run_custom_inference.ipynb
index 9d57bf9f475..c45405204d2 100644
--- a/examples/notebooks/beam-ml/run_custom_inference.ipynb
+++ b/examples/notebooks/beam-ml/run_custom_inference.ipynb
@@ -5,6 +5,7 @@
       "execution_count": 1,
       "id": "C1rAsD2L-hSO",
       "metadata": {
+        "cellView": "form",
         "id": "C1rAsD2L-hSO"
       },
       "outputs": [],
@@ -41,9 +42,10 @@
         "This notebook demonstrates how to run inference on your custom 
framework using the\n",
         
"[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
 class.\n",
         "\n",
-        "Named-Entity Recognition (NER) is one of the most common tasks for 
natural language processing (NLP). \n",
-        "NLP locates and named entities in unstructured text and classifies 
the entities using pre-defined labels, such as person name, organization, date, 
and so on.\n",
-        "This example illustrates how to use the popular `spaCy` package to 
load an ML model and perform inference in an Apache Beam pipeline using the 
RunInference `PTransform`.\n",
+        "Named-entity recognition (NER) is one of the most common tasks for 
natural language processing (NLP). \n",
+        "NLP locates named entities in unstructured text and classifies the 
entities using pre-defined labels, such as person name, organization, date, and 
so on.\n",
+        "\n",
+        "This example illustrates how to use the popular `spaCy` package to 
load a machine learning (ML) model and perform inference in an Apache Beam 
pipeline using the RunInference `PTransform`.\n",
         "For more information about the RunInference API, see [Machine 
Learning](https://beam.apache.org/documentation/sdks/python-machine-learning) 
in the Apache Beam documentation."
       ]
     },
@@ -58,7 +60,7 @@
         "\n",
         "The RunInference library is available in Apache Beam versions 2.40 
and later.\n",
         "\n",
-        "For this example, you need to install `spaCy` and `pandas`. A small 
NER model (`en_core_web_sm`) is also installed, but you can use any valid 
`spaCy` model."
+        "For this example, you need to install `spaCy` and `pandas`. A small 
NER model, `en_core_web_sm`, is also installed, but you can use any valid 
`spaCy` model."
       ]
     },
     {
@@ -84,7 +86,7 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "## Learn more about `spaCy`\n",
+        "## Learn about `spaCy`\n",
         "\n",
         "To learn more about `spaCy`, create a `spaCy` language object in 
memory using `spaCy`'s trained models.\n",
         "You can install these models as Python packages.\n",
@@ -242,9 +244,9 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "## Create a`ModelHandler` to use `spaCy` for inference\n",
+        "## Create a model handler\n",
         "\n",
-        "This section demonstrates how to create your own `ModelHandler`."
+        "This section demonstrates how to create your own `ModelHandler` so 
that you can use `spaCy` for inference."
       ]
     },
     {
@@ -420,7 +422,6 @@
         "    | \"CreateSentences\" >> beam.Create(text_strings_with_keys)\n",
         "    | \"RunInferenceSpacy\" >> 
RunInference(keyed_spacy_model_handler)\n",
         "    # Generate a schema suitable for conversion to a dataframe using 
Map to Row objects.\n",
-        "    # to a dataframe.\n",
         "    | 'ToRows' >> beam.Map(lambda row: beam.Row(key=row[0], 
text=row[1][0], predictions=row[1][1]))\n",
         "    )"
       ]
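
Note on the 'ToRows' step above: the lambda assumes each keyed result has the shape (key, (text, predictions)). The following is a minimal plain-Python sketch of that unpacking, not Beam code. The `Row` class is a hypothetical stand-in for `beam.Row`, and the sample sentences and entity labels are made up for illustration rather than taken from a model run.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical stand-in for beam.Row, used only for illustration."""
    key: int
    text: str
    predictions: List[Tuple[str, str]]


def to_row(element) -> Row:
    # Each element is assumed to look like (key, (text, predictions)),
    # mirroring row[0], row[1][0], and row[1][1] in the 'ToRows' lambda.
    key, (text, predictions) = element
    return Row(key=key, text=text, predictions=predictions)


# Made-up sample data; the labels are illustrative, not real model output.
keyed_results = [
    (0, ("Ada Lovelace was born in London.",
         [("Ada Lovelace", "PERSON"), ("London", "GPE")])),
    (1, ("No entities here.", [])),
]

rows = [to_row(element) for element in keyed_results]
```

Flat records with named fields like these are the kind of schema that a later dataframe conversion can rely on.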
diff --git a/examples/notebooks/beam-ml/run_inference_multi_model.ipynb 
b/examples/notebooks/beam-ml/run_inference_multi_model.ipynb
index a1e52b23546..cabe60e7a3a 100644
--- a/examples/notebooks/beam-ml/run_inference_multi_model.ipynb
+++ b/examples/notebooks/beam-ml/run_inference_multi_model.ipynb
@@ -71,7 +71,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## Use case: Image captioning with cascade models "
+        "## Image captioning with cascade models"
       ],
       "metadata": {
         "id": "i1uyzlj3s3e_"
@@ -80,12 +80,12 @@
     {
       "cell_type": "markdown",
       "source": [
-        "Image captioning has various applications, such as image indexing for 
information retreival, virtual assistant training, and various natural language 
processing applications.\n",
+        "Image captioning has various applications, such as image indexing for 
information retrieval, virtual assistant training, and natural language 
processing.\n",
         "\n",
         "This example shows how to generate captions on a a large set of 
images. Apache Beam is the ideal tool to handle this workflow. We use two 
models for this task:\n",
         "\n",
-        "* [BLIP](https://github.com/salesforce/BLIP): Used to generate a set 
of candidate captions for a given image. \n",
-        "* [CLIP](https://github.com/openai/CLIP): Used to rank the generated 
captions based on accuracy."
+        "* [BLIP](https://github.com/salesforce/BLIP): Generates a set of 
candidate captions for a given image. \n",
+        "* [CLIP](https://github.com/openai/CLIP): Ranks the generated 
captions based on accuracy."
       ],
       "metadata": {
         "id": "cP1sBhNacS8b"
@@ -106,14 +106,14 @@
         "The steps to build this pipeline are as follows:\n",
         "* Read the images.\n",
         "* Preprocess the images for caption generation for inference with the 
BLIP model.\n",
-        "* Inference with BLIP to generate a list of caption candidates.\n",
+        "* Run inference with BLIP to generate a list of caption 
candidates.\n",
         "* Aggregate the generated captions with their source image.\n",
-        "* Preprocess the aggregated image-caption pair to rank them with 
CLIP.\n",
-        "* Inference with CLIP to generate the caption ranking. \n",
+        "* Preprocess the aggregated image-caption pairs to rank them with 
CLIP.\n",
+        "* Run inference with CLIP to generate the caption ranking. \n",
         "* Print the image names and the captions sorted according to their 
ranking.\n",
         "\n",
         "\n",
-        "The following diagram illustrates the steps in the inference 
pipelines used in this notebook:"
+        "The following diagram illustrates the steps in the inference 
pipelines used in this notebook."
       ],
       "metadata": {
         "id": "lBPfy-bYgLuD"
@@ -284,7 +284,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "### CLIP\n",
+        "### Install CLIP dependencies\n",
         "\n",
         "Download and install the CLIP dependencies."
       ],
@@ -343,7 +343,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "### BLIP\n",
+        "### Install BLIP dependencies\n",
         "\n",
         "Download and install the BLIP dependencies."
       ],
@@ -417,7 +417,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "### I/O helper functions\n",
+        "### Install I/O helper functions\n",
         "\n",
         "Download and install the dependencies for the I/O helper functions."
       ],
@@ -430,7 +430,7 @@
       "source": [
         "class ReadImagesFromUrl(beam.DoFn):\n",
         "  \"\"\"\n",
-        "  Read an image from a given url and return a tuple of the 
images_url\n",
+        "  Read an image from a given URL and return a tuple of the 
images_url\n",
         "  and image data.\n",
         "  \"\"\"\n",
         "  def process(self, element: str) -> Tuple[str, Image.Image]:\n",
@@ -441,7 +441,7 @@
         "\n",
         "class FormatCaptions(beam.DoFn):\n",
         "  \"\"\"\n",
-        "  Print the image name and it's most relevant captions after CLIP 
ranking.\n",
+        "  Print the image name and its most relevant captions after CLIP 
ranking.\n",
         "  \"\"\"\n",
         "  def __init__(self, number_of_top_captions: int):\n",
         "    self._number_of_top_captions = number_of_top_captions\n",
@@ -474,10 +474,10 @@
     {
       "cell_type": "markdown",
       "source": [
-        "Define the preprocessing and postprocessing function for each of the 
models.\n",
+        "Define the preprocessing and postprocessing functions for each of the 
models.\n",
         "\n",
         "To prepare the instance for processing bundles of elements by 
initializing and to cache the processing transform resources, use 
`DoFn.setup()`.\n",
-        "This step avoids unnecessary re-initializations on every invocation 
to the processing method."
+        "This step avoids unnecessary re-initializations on every invocation 
of the processing method."
       ],
       "metadata": {
         "id": "wEViP715fes4"
@@ -486,8 +486,8 @@
     {
       "cell_type": "markdown",
       "source": [
-        "### BLIP\n",
-        "Define the preprocessing and postprocessing function for BLIP."
+        "### Define BLIP functions\n",
+        "Define the preprocessing and postprocessing functions for BLIP."
       ],
       "metadata": {
         "id": "X1UGv6bbyNxY"
@@ -499,7 +499,7 @@
         "class PreprocessBLIPInput(beam.DoFn):\n",
         "\n",
         "  \"\"\"\n",
-        "  Process the raw image input to a format suitable for BLIP 
Inference. The processed\n",
+        "  Process the raw image input to a format suitable for BLIP 
inference. The processed\n",
         "  images are duplicated to the number of desired captions per image. 
\n",
         "\n",
         "  Preprocessing transformation taken from: \n",
@@ -520,7 +520,7 @@
         "\n",
         "  def process(self, element):\n",
         "    image_url, image = element \n",
-        "    # Update this step when this ticket is resolved: 
https://github.com/apache/beam/issues/21863\n",
+        "    # The following lines provide a workaround to turn off 
BatchElements.\n",
         "    preprocessed_img = self._transform(image).unsqueeze(0)\n",
         "    preprocessed_img = 
preprocessed_img.repeat(self._captions_per_image, 1, 1, 1)\n",
         "    # Parse the processed input to a dictionary to a format suitable 
for RunInference.\n",
@@ -546,9 +546,9 @@
     {
       "cell_type": "markdown",
       "source": [
-        "### CLIP \n",
+        "### Define CLIP functions \n",
         "\n",
-        "Define the preprocessing and postprocessing function for CLIP."
+        "Define the preprocessing and postprocessing functions for CLIP."
       ],
       "metadata": {
         "id": "EZHfa1KzWWDI"
@@ -642,8 +642,12 @@
     {
       "cell_type": "markdown",
       "source": [
-        "Note that we use a `KeyedModelHandler` for both models to attach a 
key to the general `ModelHandler`.\n",
-        "The key is used to keep a reference to the image that the inference 
is associated with and is used in the postprocessing steps.\n",
+        "Use a `KeyedModelHandler` for both models to attach a key to the 
general `ModelHandler`.\n",
+        "The key is used for the following purposes:\n",
+        "* To keep a reference to the image that the inference is associated 
with.\n",
+        "* To aggregate transforms of different inputs.\n",
+        "* To run postprocessing steps correctly.\n",
+        "\n",
         "In this example, we use the `image_url` as the key."
       ],
       "metadata": {
@@ -655,13 +659,13 @@
       "source": [
         "class 
PytorchNoBatchModelHandlerKeyedTensor(PytorchModelHandlerKeyedTensor):\n",
         "      \"\"\"Wrapper to PytorchModelHandler to limit batch size to 
1.\n",
-        "    The caption strings generated from BLIP tokenizer may have 
different\n",
-        "    lengths, which doesn't work with torch.stack() in current 
RunInference\n",
-        "    implementation since stack() requires tensors to be the same 
size.\n",
+        "    The caption strings generated from the BLIP tokenizer might have 
different\n",
+        "    lengths. Different length strings don't work with torch.stack() 
in the current RunInference\n",
+        "    implementation, because stack() requires tensors to be the same 
size.\n",
         "    Restricting max_batch_size to 1 means there is only 1 example per 
`batch`\n",
         "    in the run_inference() call.\n",
         "    \"\"\"\n",
-        "    # Update this step when this ticket is resolved: 
https://github.com/apache/beam/issues/21863\n",
+        "      # The following lines provide a workaround to turn off 
BatchElements.\n",
         "      def batch_elements_kwargs(self):\n",
         "          return {'max_batch_size': 1}"
       ],
@@ -683,7 +687,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## BLIP\n",
+        "## Generate captions with BLIP\n",
         "\n",
         "Use BLIP to generate a set of candidate captions for a given image."
       ],
@@ -711,7 +715,7 @@
       "source": [
         "class BLIPWrapper(torch.nn.Module):\n",
         "  \"\"\"\n",
-        "   Wrapper around the BLIP model to overwrite the default \"forward\" 
method with the \"generate\" since BLIP uses the \n",
+        "   Wrapper around the BLIP model to overwrite the default \"forward\" 
method with the \"generate\" method, because BLIP uses the \n",
         "  \"generate\" method to produce the image captions.\n",
         "  \"\"\"\n",
         "  \n",
@@ -725,7 +729,7 @@
         "\n",
         "  def forward(self, inputs: torch.Tensor):\n",
         "    # Squeeze because RunInference adds an extra dimension, which is 
empty.\n",
-        "    # Update this step when this ticket is resolved: 
https://github.com/apache/beam/issues/21863\n",
+        "    # The following lines provide a workaround to turn off 
BatchElements.\n",
         "    inputs = inputs.squeeze(0)\n",
         "    captions = self._model.generate(inputs,\n",
         "                                    sample=True,\n",
@@ -756,7 +760,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## CLIP\n",
+        "## Rank captions with CLIP\n",
         "\n",
         "Use CLIP to rank the generated captions based on the accuracy with 
which they represent the image."
       ],
@@ -771,7 +775,7 @@
         "\n",
         "  def forward(self, **kwargs: Dict[str, torch.Tensor]):\n",
         "    # Squeeze because RunInference adds an extra dimension, which is 
empty.\n",
-        "    # Update this step when this ticket is resolved: 
https://github.com/apache/beam/issues/21863.\n",
+        "    # The following lines provide a workaround to turn off 
BatchElements.\n",
         "    kwargs = {key: tensor.squeeze(0) for key, tensor in 
kwargs.items()}\n",
         "    output = super().forward(**kwargs)\n",
         "    logits = output.logits_per_image\n",
@@ -888,7 +892,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## Initialize pipeline run parameters\n",
+        "## Initialize the pipeline run parameters\n",
         "\n",
         "Specify the number of captions generated per image and the number of 
captions to display with each image."
       ],
@@ -914,7 +918,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## Run pipeline"
+        "## Run the pipeline"
       ],
       "metadata": {
         "id": "5T9Pcdp7oNb8"
@@ -923,7 +927,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "This example uses raw images from the `read_images` pipeline as 
inputs for both models, because each model needs to preprocess the raw images 
differently. They require a different embedding representation for image 
captioning and image-captions pair ranking.\n",
+        "This example uses raw images from the `read_images` pipeline as 
inputs for both models. Each model needs to preprocess the raw images 
differently, because they require a different embedding representation for 
image captioning and for image-captions pair ranking.\n",
         "\n",
         "To aggregate the raw images with the generated caption by their key 
(the image URL), this example uses `CoGroupByKey`. This process produces a 
tuple of image-captions pairs that is then passed to the CLIP transform and 
used for ranking."
       ],
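
Note on the `CoGroupByKey` paragraph above: the grouping it describes can be pictured with a small plain-Python sketch. This is not Beam code. The helper name `co_group_by_key`, the example URLs, and the placeholder image strings are all hypothetical, and the sketch only imitates the idea of collecting the values from several keyed inputs under one shared key.

```python
from collections import defaultdict


def co_group_by_key(**tagged_inputs):
    """Group several lists of (key, value) pairs by key, one list per tag."""
    grouped = defaultdict(lambda: {tag: [] for tag in tagged_inputs})
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            grouped[key][tag].append(value)
    return dict(grouped)


# Hypothetical inputs keyed by image URL. Strings stand in for image data.
images = [
    ("https://example.com/a.jpg", "<image a>"),
    ("https://example.com/b.jpg", "<image b>"),
]
captions = [
    ("https://example.com/a.jpg", "a dog on grass"),
    ("https://example.com/a.jpg", "a puppy playing"),
    ("https://example.com/b.jpg", "a cat on a sofa"),
]

image_caption_groups = co_group_by_key(images=images, captions=captions)
```

Each key ends up with its raw image and all of its candidate captions together, which is the pairing the ranking step needs.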
diff --git a/examples/notebooks/beam-ml/run_inference_pytorch.ipynb 
b/examples/notebooks/beam-ml/run_inference_pytorch.ipynb
index 3afc6bad989..d0a350982f4 100644
--- a/examples/notebooks/beam-ml/run_inference_pytorch.ipynb
+++ b/examples/notebooks/beam-ml/run_inference_pytorch.ipynb
@@ -54,7 +54,7 @@
         "This notebook demonstrates the use of the RunInference transform for 
PyTorch. Apache Beam includes implementations of the 
[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
 class for [users of 
PyTorch](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.pytorch_inference.html).
 For more information about the RunInference API, see [Machine 
Learning](https://beam.apache.or [...]
         "\n",
         "\n",
-        "This notebook illustrates common RunInference patterns,such as:\n",
+        "This notebook illustrates common RunInference patterns, such as:\n",
         "*   Using a database with RunInference.\n",
         "*   Postprocessing results after using RunInference.\n",
         "*   Inference with multiple models in the same pipeline.\n",
@@ -71,7 +71,7 @@
       "source": [
         "## Dependencies\n",
         "\n",
-        "The RunInference library is available in Apache Beam versions 
<b>2.40</b> and later.\n",
+        "The RunInference library is available in Apache Beam versions 2.40 
and later.\n",
         "\n",
         "To use Pytorch RunInference API, you need to install the PyTorch 
module. To install PyTorch, use `pip`:"
       ]
@@ -235,7 +235,7 @@
       },
       "source": [
         "### Train the linear regression mode on 5 times data\n",
-        "Use the following to train your linear regression model on the 5 
times table."
+        "Use the following code to train your linear regression model on the 5 
times table."
       ]
     },
     {
@@ -270,7 +270,7 @@
         "id": "bd106b29-6187-42c1-9743-1666c147b5e3"
       },
       "source": [
-        "Save the model using `torch.save()` and then confirm that the saved 
model file exists."
+        "Save the model using `torch.save()`, and then confirm that the saved 
model file exists."
       ]
     },
     {
@@ -304,6 +304,7 @@
       },
       "source": [
         "### Prepare train and test data for a 10 times model\n",
+        "This example model is a 10 times table.\n",
         "* `x` contains values in the range from 0 to 99.\n",
         "* `y` is a list of 10 * `x`. "
       ]
@@ -404,7 +405,7 @@
       "source": [
         "### Use RunInference within the pipeline\n",
         "\n",
-        "1. Create a PyTorch model handler object by passing required 
arguments such as `state_dict_path`, `model_class`, `model_params` to the 
`PytorchModelHandlerTensor` class.\n",
+        "1. Create a PyTorch model handler object by passing required 
arguments such as `state_dict_path`, `model_class`, and `model_params` to the 
`PytorchModelHandlerTensor` class.\n",
         "2. Pass the `PytorchModelHandlerTensor` object to the RunInference 
transform to perform predictions on unkeyed data."
       ]
     },
@@ -455,8 +456,8 @@
         "id": "9d95e69b-203f-4abb-9abb-360bdf4d769a"
       },
       "source": [
-        "## Pattern 2: Post-process RunInference results.\n",
-        "This pattern demonstrates how to post-process the RunInference 
results.\n",
+        "## Pattern 2: Postprocess RunInference results\n",
+        "This pattern demonstrates how to postprocess the RunInference 
results.\n",
         "\n",
         "Add a `PredictionProcessor` to the pipeline after `RunInference`. 
`PredictionProcessor` processes the output of the `RunInference` transform."
       ]
@@ -529,11 +530,11 @@
         "\n",
         "Modify the pipeline to read from sources like CSV files and 
BigQuery.\n",
         "\n",
-        "In this step we do the following:\n",
+        "In this step, you take the following actions:\n",
         "\n",
         "* To handle keyed data, wrap the `PytorchModelHandlerTensor` object 
around `KeyedModelHandler`.\n",
         "* Add a map transform that converts a table row into `Tuple[str, 
float]`.\n",
-        "* Add a map transform that converts `Tuple[str, float]` from  to 
`Tuple[str, torch.Tensor]`.\n",
+        "* Add a map transform that converts `Tuple[str, float]` to 
`Tuple[str, torch.Tensor]`.\n",
         "* Modify the post-inference processor to output results with the key."
       ]
     },
@@ -564,7 +565,8 @@
         "id": "f22da313-5bf8-4334-865b-bbfafc374e63"
       },
       "source": [
-        "### Create a source with attached key\n"
+        "### Create a source with attached key\n",
+        "This section shows how to create either a BigQuery or a CSV source 
with an attached key."
       ]
     },
     {
@@ -573,7 +575,8 @@
         "id": "c9b0fb49-d605-4f26-931a-57f42b0ad253"
       },
       "source": [
-        "#### Use BigQuery as the source"
+        "#### Use BigQuery as the source",
+        "Follow these steps to use BigQuery as your source."
       ]
     },
     {
@@ -741,7 +744,8 @@
         "id": "53ee7f24-5625-475a-b8cc-9c031591f304"
       },
       "source": [
-        "#### Use a CSV file as the source"
+        "#### Use a CSV file as the source",
+        "Follow these steps to use a CSV file as your source."
       ]
     },
     {
@@ -826,7 +830,7 @@
         "## Pattern 4: Inference with multiple models in the same pipeline\n",
         "This pattern demonstrates how use inference with multiple models in 
the same pipeline.\n",
         "\n",
-        "### Inference with multiple models in parallel\n",
+        "### Multiple models in parallel\n",
         "This section demonstrates how use inference with multiple models in 
parallel."
       ]
     },
@@ -926,7 +930,7 @@
         "id": "e71e6706-5d8d-4322-9def-ac7fb20d4a50"
       },
       "source": [
-        "### Inference with multiple models in sequence\n",
+        "### Multiple models in sequence\n",
         "This section demonstrates how use inference with multiple models in 
sequence.\n",
         "\n",
         "In a sequential pattern, data is sent to one or more models in 
sequence, \n",
diff --git 
a/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb 
b/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb
index 3dac52f9d7a..60f79d63a5b 100644
--- a/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb
+++ b/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb
@@ -17,11 +17,6 @@
   "cells": [
     {
       "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "id": "LzOTNrs_P6Vv"
-      },
-      "outputs": [],
       "source": [
         "# @title ###### Licensed to the Apache Software Foundation (ASF), 
Version 2.0 (the \"License\")\n",
         "\n",
@@ -41,16 +36,13 @@
         "# KIND, either express or implied. See the License for the\n",
         "# specific language governing permissions and limitations\n",
         "# under the License"
-      ]
-    },
-    {
-      "cell_type": "markdown",
+      ],
       "metadata": {
+        "cellView": "form",
         "id": "faayYQYrQzY3"
-      },
-      "source": [
-        "## Use RunInference in Apache Beam"
-      ]
+            },
+      "execution_count": null,
+      "outputs": []
     },
     {
       "cell_type": "markdown",
@@ -58,8 +50,9 @@
         "id": "JjAt1GesQ9sg"
       },
       "source": [
-        "Starting with Apache Beam 2.40.0, you can use Apache Beam with the 
RunInference API to use machine learning (ML) models for local and remote 
inference with batch and streaming pipelines.\n",
-        "The RunInference API leverages Apache Beam concepts, such as the 
BatchElements transform and the Shared class, to support models in your 
pipelines that create transforms optimized for machine learning inferences.\n",
+        "# Use RunInference in Apache Beam\n",
+        "You can use Apache Beam versions 2.40.0 and later with the 
[RunInference 
API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
 for local and remote inference with batch and streaming pipelines.\n",
+        "The RunInference API leverages Apache Beam concepts, such as the 
`BatchElements` transform and the `Shared` class, to support models in your 
pipelines that create transforms optimized for machine learning inference.\n",
         "\n",
         "For more information about the RunInference API, see [Machine 
Learning](https://beam.apache.org/documentation/sdks/python-machine-learning) 
in the Apache Beam documentation."
       ]
@@ -70,13 +63,13 @@
         "id": "A8xNRyZMW1yK"
       },
       "source": [
-        "This notebook demonstrates how to use the RunInference API with three 
popular ML frameworks: PyTorch, TensorFlow, and scikit-learn. The three 
pipelines use a text classification model for generating predictions.\n",
+        "This example demonstrates how to use the RunInference API with three 
popular ML frameworks: PyTorch, TensorFlow, and scikit-learn. The three 
pipelines use a text classification model for generating predictions.\n",
         "\n",
         "Follow these steps to build a pipeline:\n",
         "* Read the images.\n",
         "* If needed, preprocess the text.\n",
-        "* Inference with the PyTorch, TensorFlow, or Scikit-learn model.\n",
-        "* If needed, postprocess the output from RunInference."
+        "* Run inference with the PyTorch, TensorFlow, or Scikit-learn 
model.\n",
+        "* If needed, postprocess the output."
       ]
     },
     {
@@ -126,9 +119,9 @@
         "id": "ObRPUrlEbjHj"
       },
       "source": [
-        "### Model\n",
+        "### Install the model\n",
         "\n",
-        "This example uses a pretrained text classification model, 
[distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english?text=I+like+you.+I+love+you).
 This model is a checkpoint of DistilBERT-base-uncased, fine-tuned on the SST-2 
dataset.\n"
+        "This example uses a pretrained text classification model, 
[distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english?text=I+like+you.+I+love+you).
 This model is a checkpoint of `DistilBERT-base-uncased`, fine-tuned on the 
SST-2 dataset.\n"
       ]
     },
     {
@@ -165,7 +158,7 @@
         "id": "vA1UmbFRb5C-"
       },
       "source": [
-        "### Helper functions\n",
+        "### Install helper functions\n",
         "\n",
         "The model also uses helper functions."
       ]
@@ -231,9 +224,9 @@
         "id": "WYYbQTMWctkW"
       },
       "source": [
-        "### RunInference pipeline\n",
+        "### Run the pipeline\n",
         "\n",
-        "This section demonstrates how to use create and run the RunInference 
pipeline."
+        "This section demonstrates how to create and run the RunInference 
pipeline."
       ]
     },
     {
@@ -797,7 +790,7 @@
         "id": "h2JP7zsqerCT"
       },
       "source": [
-        "### Model"
+        "### Install the model"
       ]
     },
     {
@@ -827,7 +820,7 @@
         "id": "GZ-Ioc8ZfyIT"
       },
       "source": [
-        "### Helper functions\n",
+        "### Install helper functions\n",
         "\n",
         "The model also uses helper functions."
       ]
@@ -874,7 +867,7 @@
         "id": "PZVwI4BbgaAI"
       },
       "source": [
-        "### Prepare the Input\n",
+        "### Prepare the input\n",
         "\n",
         "This section demonstrates how to prepare the input for your model."
       ]
@@ -921,9 +914,9 @@
         "id": "BYkQl_l8gRgo"
       },
       "source": [
-        "### RunInference Pipeline\n",
+        "### Run the pipeline\n",
         "\n",
-        "This section demonstrates how to use create and run the RunInference 
pipeline."
+        "This section demonstrates how to create and run the RunInference 
pipeline."
       ]
     },
     {
@@ -991,7 +984,7 @@
         "id": "6ArL_55kjxkO"
       },
       "source": [
-        "### Install Dependencies\n",
+        "### Install dependencies\n",
         "\n",
         "First, download and install the dependencies."
       ]
@@ -1030,7 +1023,7 @@
         "id": "-7ABKlZvkFHy"
       },
       "source": [
-        "### Model\n",
+        "### Install the model\n",
         "\n",
         "To classify movie reviews as either positive or negative, train and 
save a sentiment analysis pipeline about movie reviews."
       ]
@@ -1059,9 +1052,9 @@
         "id": "KL4Cx8s0mBqn"
       },
       "source": [
-        "### RunInference Pipeline\n",
+        "### Run the pipeline\n",
         "\n",
-        "This section demonstrates how to use create and run the RunInference 
pipeline."
+        "This section demonstrates how to create and run the RunInference 
pipeline."
       ]
     },
     {
diff --git a/examples/notebooks/beam-ml/run_inference_sklearn.ipynb 
b/examples/notebooks/beam-ml/run_inference_sklearn.ipynb
index 9afcccc30f6..c9e151750a3 100644
--- a/examples/notebooks/beam-ml/run_inference_sklearn.ipynb
+++ b/examples/notebooks/beam-ml/run_inference_sklearn.ipynb
@@ -51,21 +51,21 @@
       },
       "source": [
         "# Apache Beam RunInference for scikit-learn\n",
-        "This notebook demonstrates the use of the RunInference transform for 
[scikit-learn](https://scikit-learn.org/) also called sklearn.\n",
+        "This notebook demonstrates the use of the RunInference transform for 
[scikit-learn](https://scikit-learn.org/), also called sklearn.\n",
         "Apache Beam 
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
 has implementations of the 
[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
 class prebuilt for scikit-learn. For more information about the RunInference 
API, see [Machine 
Learning](https://beam.apache.org/documentation/sdks/python-machine [...]
         "\n",
-        "Users can choose a model handler for their input data type:\n",
-        "* The [numpy model 
handler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.sklearn_inference.html#apache_beam.ml.inference.sklearn_inference.SklearnModelHandlerNumpy)\n",
-        "* The [pandas dataframes model 
handler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.sklearn_inference.html#apache_beam.ml.inference.sklearn_inference.SklearnModelHandlerNumpy)\n",
+        "You can choose the appropriate model handler based on your input data 
type:\n",
+        "* [NumPy model 
handler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.sklearn_inference.html#apache_beam.ml.inference.sklearn_inference.SklearnModelHandlerNumpy)\n",
+        "* [Pandas DataFrame model 
handler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.sklearn_inference.html#apache_beam.ml.inference.sklearn_inference.SklearnModelHandlerNumpy)\n",
         "\n",
-        "With RunInference, these ModelHandlers manage batching, 
vectorization, and prediction optimization for your scikit-learn pipeline or 
model.\n",
+        "With RunInference, these model handlers manage batching, 
vectorization, and prediction optimization for your scikit-learn pipeline or 
model.\n",
         "\n",
         "This notebook demonstrates the following common RunInference 
patterns:\n",
         "*   Generate predictions.\n",
         "*   Postprocess results after RunInference.\n",
-        "*   Inference with multiple models in the same pipeline.\n",
+        "*   Run inference with multiple models in the same pipeline.\n",
         "\n",
-        "The linear regression models used in these samples are trained on 
data that correspondes to the 5 and 10 times table; that is,`y = 5x` and `y = 
10x` respectively."
+        "The linear regression models used in these samples are trained on 
data that correspondes to the 5 and 10 times tables; that is,`y = 5x` and `y = 
10x` respectively."
       ]
     },
     {
@@ -75,7 +75,7 @@
         "Complete the following setup steps:\n",
         "1. Install dependencies for Apache Beam.\n",
         "1. Authenticate with Google Cloud.\n",
-        "1. Specify your project and bucket. You need the project and bucket 
to save and load models."
+        "1. Specify your project and bucket. You use the project and bucket to 
save and load models."
       ],
       "metadata": {
         "id": "zzwnMzzgdyPB"
@@ -176,7 +176,7 @@
         "2. Train the linear regression model.\n",
         "3. Save the scikit-learn model using `pickle`.\n",
         "\n",
-        "In this example, we create two models, one with the 5 times model and 
a section with the 10 times model."
+        "In this example, you create two models, one with the 5 times model 
and a second with the 10 times model."
       ]
     },
     {
@@ -214,9 +214,9 @@
         "id": "69008a3d-3d15-4643-828c-b0419b347d01"
       },
       "source": [
-        "### scikit-learn RunInference pipeline\n",
-        "This section demonstrates the following steps:\n",
-        "1. Define the scikit-learn model handler that accepts an `array_like` 
object as input.\n",
+        "### Create a scikit-learn RunInference pipeline\n",
+        "This section demonstrates how to do the following:\n",
+        "1. Define a scikit-learn model handler that accepts an `array_like` 
object as input.\n",
         "2. Read the data from BigQuery.\n",
         "3. Use the scikit-learn trained model and the scikit-learn 
RunInference transform on unkeyed data."
       ]
@@ -360,8 +360,8 @@
         "id": "33e901d6-ed06-4268-8a5f-685d31b5558f"
       },
       "source": [
-        "### Sklearn RunInference on keyed inputs.\n",
-        "This section demonstrates the following steps:\n",
+        "### Use sklearn RunInference on keyed inputs\n",
+        "This section demonstrates how to do the following:\n",
         "1. Wrap the `SklearnModelHandlerNumpy` object around 
`KeyedModelHandler` to handle keyed data.\n",
         "2. Read the data from BigQuery.\n",
         "3. Use the sklearn trained model and the sklearn RunInference 
transform on a keyed data."
@@ -410,7 +410,7 @@
       "source": [
         "## Run multiple models\n",
         "\n",
-        "This pipeline takes two RunInference transforms with different models 
and then combines the output."
+        "This code creates a pipeline that takes two RunInference transforms 
with different models and then combines the output."
       ],
       "metadata": {
         "id": "JQ4zvlwsRK1W"
diff --git a/examples/notebooks/beam-ml/run_inference_tensorflow.ipynb b/examples/notebooks/beam-ml/run_inference_tensorflow.ipynb
index 3e2e9e428ae..81e3bd38cac 100644
--- a/examples/notebooks/beam-ml/run_inference_tensorflow.ipynb
+++ b/examples/notebooks/beam-ml/run_inference_tensorflow.ipynb
@@ -7,8 +7,8 @@
       "collapsed_sections": []
     },
     "kernelspec": {
-      "display_name": "Python 3",
-      "name": "python3"
+      "name": "python3",
+      "display_name": "Python 3"
     },
     "language_info": {
       "name": "python"
@@ -39,6 +39,7 @@
         "# under the License"
       ],
       "metadata": {
+        "cellView": "form",
         "id": "fFjof1NgAJwu"
       },
       "execution_count": null,
@@ -49,11 +50,11 @@
       "source": [
         "# Apache Beam RunInference with TensorFlow\n",
         "This notebook demonstrates the use of the RunInference transform for 
[TensorFlow](https://www.tensorflow.org/).\n",
-        "Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) accepts a ModelHandler generated from [`tfx-bsl`](https://github.com/tensorflow/tfx-bsl) via CreateModelHandler.\n",
+        "Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) accepts a ModelHandler generated from [`tfx-bsl`](https://github.com/tensorflow/tfx-bsl) using `CreateModelHandler`.\n",
         "\n",
-        "The Apache Beam RunInference transform is used for making predictions 
for\n",
+        "The Apache Beam RunInference transform is used to make predictions 
for\n",
         "a variety of machine learning models. In versions 1.10.0 and later of 
`tfx-bsl`, you can\n",
-        "create a TensorFlow ModelHandler for use with Apache Beam. For more 
information about the RunInference API, see [Machine 
Learning](https://beam.apache.org/documentation/sdks/python-machine-learning) 
in the Apache Beam documentation.\n",
+        "create a TensorFlow `ModelHandler` for use with Apache Beam. For more 
information about the RunInference API, see [Machine 
Learning](https://beam.apache.org/documentation/sdks/python-machine-learning) 
in the Apache Beam documentation.\n",
         "\n",
         "This notebook demonstrates the following steps:\n",
         "- Import [`tfx-bsl`](https://github.com/tensorflow/tfx-bsl).\n",
@@ -68,6 +69,9 @@
     {
       "cell_type": "markdown",
       "source": [
+        "## Before you begin\n",
+        "Complete the following setup steps.\n",
+        "\n",
         "First, import `tfx-bsl`."
       ],
       "metadata": {
@@ -123,7 +127,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## Authenticate with Google Cloud\n",
+        "### Authenticate with Google Cloud\n",
         "This notebook relies on saving your model to Google Cloud. To use 
your Google Cloud account, authenticate this notebook."
       ],
       "metadata": {
@@ -145,7 +149,7 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## Import dependencies and set up your bucket\n",
+        "### Import dependencies and set up your bucket\n",
         "Replace `PROJECT_ID` and `BUCKET_NAME` with the ID of your project 
and the name of your bucket.\n",
         "\n",
         "**Important**: If an error occurs, restart your runtime."
@@ -193,12 +197,20 @@
       "source": [
         "## Create and test a simple model\n",
         "\n",
-        "This step creates a model that predicts the 5 times table."
+        "This step creates and tests a model that predicts the 5 times table."
       ],
       "metadata": {
         "id": "YzvZWEv-1oiK"
       }
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Create the model\n",
+        "Create training data and build a linear regression model."
+      ]
+    },
     {
       "cell_type": "code",
       "metadata": {
@@ -296,7 +308,7 @@
       "source": [
         "### Populate the data in a TensorFlow proto\n",
         "\n",
-        "Tensorflow data uses protos. If you are loading from a file, helpers 
exist for this step. Because we are using generated data, this code populates a 
proto."
+        "Tensorflow data uses protos. If you are loading from a file, helpers 
exist for this step. Because this example uses generated data, this code 
populates a proto."
       ],
       "metadata": {
         "id": "dEmleqiH3t71"
@@ -356,7 +368,7 @@
       "source": [
         "### Fit The Model\n",
         "\n",
-        "This example builds a model. Because RunInference requires pretrained 
models, this segment builds a usable model."
+        "This step builds a model. Because RunInference requires pretrained 
models, this segment builds a usable model."
       ],
       "metadata": {
         "id": "G-sAu3cf31f3"
@@ -445,6 +457,7 @@
       "cell_type": "markdown",
       "source": [
         "## Run the Pipeline\n",
+        "Use the following code to run the pipeline.\n",
         "\n",
         "`FormatOutput` demonstrates how to extract values from the output 
protos.\n",
         "\n",
@@ -507,11 +520,10 @@
         "\n",
         "By default, the `ModelHandler` does not expect a key.\n",
         "\n",
-        "If you know that keys are associated with your examples, wrap the 
model handler with `beam.KeyedModelHandler`.\n",
-        "\n",
-        "If you don't know whether keys are associated with your examples, use 
`beam.MaybeKeyedModelHandler`.\n",
+        "* If you know that keys are associated with your examples, wrap the 
model handler with `beam.KeyedModelHandler`.\n",
+        "* If you don't know whether keys are associated with your examples, 
use `beam.MaybeKeyedModelHandler`.\n",
         "\n",
-        "This step also illustrates how to use `tfx-bsl` examples."
+        "In addition to demonstrating how to use a keyed model handler, this 
step demonstrates how to use `tfx-bsl` examples."
       ],
       "metadata": {
         "id": "IXikjkGdHm9n"
@@ -583,4 +595,4 @@
       ]
     }
   ]
-}
+}
\ No newline at end of file
