damccorm commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1022063494
##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -57,12 +57,12 @@
{
"cell_type": "markdown",
"source": [
- "A single machine learning model may not always be the perfect
solution for a give task. Oftentimes, machine learning model tasks involve
aggregating mutliple models together to produce one optimal predictive model
and boost performance. \n",
+ "A single machine learning model might not be the right solution for
your task. Often, machine learning model tasks involve aggregating mutliple
models together to produce one optimal predictive model and to boost
performance. \n",
" \n",
"\n",
- "In this notebook, we will shows you an example on how to implement a
cascade model in Beam using the [RunInference
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The
RunInference API enables you to run your Beam transfroms as part of your
pipeline for optimal machine learning inference in beam. \n",
+ "This notebook shows how to implement a cascade model in Apache Beam
using the [RunInference
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The
RunInference API enables you to run your Beam transfroms as part of your
pipeline for optimal machine learning inference.\n",
Review Comment:
```suggestion
"This notebook shows how to implement a cascade model in Apache Beam
using the [RunInference
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The
RunInference API enables you to run your Beam transforms as part of your
pipeline for optimal machine learning inference.\n",
```
##########
examples/notebooks/beam-ml/run_custom_inference.ipynb:
##########
@@ -37,21 +36,15 @@
"id": "b6f8f3af-744e-4eaa-8a30-6d03e8e4d21e"
},
"source": [
- "# Bring your own Machine Learning (ML) model to Beam RunInference\n",
+ "# Bring your own machine learning (ML) model to Beam RunInference\n",
"\n",
- "<button>\n",
- " <a
href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
- " <img src=\"https://beam.apache.org/images/favicon.ico\"
alt=\"Open the docs\" height=\"16\"/>\n",
- " Beam RunInference\n",
- " </a>\n",
- "</button>\n",
- "\n",
- "In this notebook, we walk through a simple example to show how to
build your own ML model handler using\n",
+ "This notebook demonstrates how to build your own ML model handler
using\n",
"[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler).\n",
Review Comment:
This sentence is kinda odd - maybe we could do something like "This notebook
demonstrates how to run inference on your custom framework using the
[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
class"
##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -190,9 +189,9 @@
{
"cell_type": "markdown",
"source": [
- "## Create and Test a Simple Model\n",
+ "## Create and test a simple model\n",
"\n",
- "This creates a model that predicts the 5 times table."
+ "This steps creates a model that predicts the 5 times table."
Review Comment:
```suggestion
"This step creates a model that predicts the 5 times table."
```
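As an aside for reviewers, a hedged sketch of what such a model could look like (the layer size and training settings here are illustrative, not necessarily the notebook's):
```python
import numpy as np
import tensorflow as tf

# Training data for the 5 times table: x -> 5x.
x = np.arange(10, dtype=np.float32).reshape(-1, 1)
y = 5 * x

# A single linear unit is enough to fit y = 5x.
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=(1,))])
model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss='mean_squared_error')
model.fit(x, y, epochs=500, verbose=0)

# Testing the model: this should print a value close to 100.
print(model.predict(np.array([[20.0]], dtype=np.float32)))
```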
##########
examples/notebooks/beam-ml/run_inference_sklearn.ipynb:
##########
@@ -52,26 +51,18 @@
},
"source": [
"# Apache Beam RunInference for scikit-learn\n",
- "\n",
- "<button>\n",
- " <a
href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
- " <img src=\"https://beam.apache.org/images/favicon.ico\"
alt=\"Open the docs\" height=\"16\"/>\n",
- " Beam RunInference\n",
- " </a>\n",
- "</button>\n",
- "\n",
- "In this notebook, we walk through the use of the RunInference
transform for [scikit-learn](https://scikit-learn.org/) also called sklearn.\n",
- "Beam
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
has implementations of
[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
prebuilt for scikit-learn.\n",
+ "This notebook demonstrates the use of the RunInference transform for
[scikit-learn](https://scikit-learn.org/) also called sklearn.\n",
+ "Apache Beam
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
has implementations of
[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
prebuilt for scikit-learn. For more information about the RunInference API,
see [Machine
Learning](https://beam.apache.org/documentation/sdks/python-machine-learning)
in the Apache Beam documentation.\n",
Review Comment:
```suggestion
"Apache Beam
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
has implementations of the
[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
class prebuilt for scikit-learn. For more information about the RunInference
API, see [Machine
Learning](https://beam.apache.org/documentation/sdks/python-machine-learning)
in the Apache Beam documentation.\n",
```
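For anyone skimming the review, end-to-end usage of the prebuilt handler looks roughly like the sketch below; the toy model, pickle path, and inputs are all illustrative:
```python
import pickle

import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import (
    ModelFileType, SklearnModelHandlerNumpy)
from sklearn.linear_model import LinearRegression

# Train and pickle a toy model so the sketch is self-contained.
model = LinearRegression().fit(np.array([[1], [2], [3]]), np.array([5, 10, 15]))
with open('/tmp/five_times_model.pkl', 'wb') as f:
  pickle.dump(model, f)

handler = SklearnModelHandlerNumpy(
    model_uri='/tmp/five_times_model.pkl',
    model_file_type=ModelFileType.PICKLE)

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | beam.Create([np.array([4]), np.array([7])])
      | RunInference(handler)
      | beam.Map(print))
```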
##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -248,7 +247,9 @@
{
"cell_type": "markdown",
"source": [
- "### Test the Model\n"
+ "### Test the model\n",
+ "\n",
+ "This steps tests the model that you created."
Review Comment:
```suggestion
"This step tests the model that you created."
```
##########
examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb:
##########
@@ -946,9 +977,9 @@
"source": [
"input_strings_file = 'input_strings.tfrecord'\n",
"\n",
- "# Preprocess the input as RunInference is expecting a serialized
tf.example as an input\n",
- "# Write the processed input to a file \n",
- "# One can also do it as a pipeline step by using beam.Map() \n",
+ "# Because RunInferene is expecting a serialized tf.example as an
input, preprocess the input.\n",
Review Comment:
```suggestion
"# Because RunInference is expecting a serialized tf.example as an
input, preprocess the input.\n",
```
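For context on this cell, the preprocessing it describes might look like the sketch below; the feature name `x` and the example strings are assumptions on my part:
```python
import tensorflow as tf

input_strings_file = 'input_strings.tfrecord'
input_strings = ['hello', 'world']

# Serialize each string into a tf.train.Example and write it to a file.
with tf.io.TFRecordWriter(input_strings_file) as writer:
  for text in input_strings:
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                'x': tf.train.Feature(
                    bytes_list=tf.train.BytesList(
                        value=[text.encode('utf-8')]))
            }))
    writer.write(example.SerializeToString())
```
As the original comment notes, the `SerializeToString()` call could also run inside the pipeline via `beam.Map()`.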
##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -34,13 +34,17 @@
"id": "0UGzzndTBPWQ"
},
"source": [
- "# Remote inference in Beam\n",
+ "# Remote inference in Apache Beam\n",
"\n",
- "The prefered way of running inference in Beam is by using the
[RunInference
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The
RunInference API enables you to run your models as part of your pipeline in a
way that is optimized for machine learning inference. It supports features such
as batching, so that you do not need to take care of it yourself. For more info
on the RunInference API you can check out the [RunInference
notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb),
which demonstrates how you can implement model inference in pytorch,
scikit-learn and tensorflow.\n",
+ "The prefered way to run inference in Apache Beam is by using the
[RunInference
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). \n",
+ "The RunInference API enables you to run your models as part of your
pipeline in a way that is optimized for machine learning inference. \n",
+ "To reduce the number of steps that you need to take, RunInference
supports features like batching. For more infomation about the RunInference
API, review the [RunInference
notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb),
\n",
+ "which demonstrates how to implement model inference in PyTorch,
scikit-learn, and TensorFlow.\n",
"\n",
- "As of now, RunInference API doesn't support making remote inference
calls (e.g. Natural Language API, Cloud Vision API and others). Therefore, in
order to use these remote APIs with Beam, one needs to write custom inference
call. \n",
+ "Currently, the RunInference API doesn't support making remote
inference calls using the Natural Language API, Cloud Vision API, and so on.
\n",
+ "Therefore, to use these remote APIs with Apache Beam, you need to
write custom inference calls.\n",
"\n",
- "This notebook shows how you can implement such a custom inference
call in Beam. We are using Cloud Vision API for demonstration. \n"
+ "This notebook shows how to implement a custom inference call in
Apache Beam. This example uses Cloud Vision API."
Review Comment:
```suggestion
"This notebook shows how to implement a custom inference call in
Apache Beam. This example uses the Google Cloud Vision API."
```
I actually think not having `the` is valid based on
https://cloud.google.com/vision, but we prefix with `the` elsewhere and it
feels more natural to me. Consistency is the bigger thing though, IMO.
Same thing with `Google`: we use it elsewhere, so using it here too is
probably less confusing.
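For reviewers following along, the custom inference call amounts to a `DoFn` that wraps the Vision client. A minimal sketch, with error handling omitted and label detection chosen just as an example feature:
```python
import apache_beam as beam
from google.cloud import vision


class RemoteLabelDoFn(beam.DoFn):
  """Calls the Cloud Vision label-detection endpoint per element."""

  def setup(self):
    # Create the client once per worker, not once per element.
    self._client = vision.ImageAnnotatorClient()

  def process(self, image_bytes: bytes):
    response = self._client.label_detection(
        image=vision.Image(content=image_bytes))
    yield [label.description for label in response.label_annotations]
```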
##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -164,7 +164,9 @@
{
"cell_type": "markdown",
"source": [
- "The RunInference library is available in Apache Beam version **2.40**
or later. "
+ "This section shows how to demonstrate the dependencies for this
example.\n",
Review Comment:
```suggestion
"This section shows how to install the dependencies for this
example.\n",
```
##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -279,15 +291,19 @@
"source": [
"### Batching\n",
"\n",
- "Before we can chain all the different steps together in a pipeline,
there is one more thing we need to understand: batching. When running inference
with your model (both in Beam itself or in an external API), you can batch your
input together to allow for more efficient execution of your model. When using
a custom DoFn, you need to take care of the batching yourself, in contrast with
the RunInference API which takes care of this for you.\n",
+ "Before we can chain together the pipeline steps, we need to
understand batching.\n",
+ "When running inference with your model, either in Apache Beam or in
an external API, you can batch your input to increase the efficiency of the
model execution.\n",
+ "When using a custom DoFn, as in this example, you need to manage the
batching.\n",
"\n",
- "In order to achieve this in our pipeline: we will introduce one more
step in our pipeline, a `BatchElements` transform that will group elements
together to form a batch of the desired size.\n",
+ "To manage the batching in this pipeline, include a `BatchElements`
transform to group elements together and form a batch of the desired size.\n",
"\n",
- "⚠️ If you have a streaming pipeline, you may considering using
[GroupIntoBatches](https://beam.apache.org/documentation/transforms/python/aggregation/groupintobatches/)
as `BatchElements` doesn't batch things across bundles. `GroupIntoBatches`
requires choosing a key within which things are batched.\n",
+ "**Caution:**\n",
Review Comment:
I don't think this really merits a caution - it is an aside/alternate
approach, but nothing bad will happen if you use BatchElements (it will just
produce small batch sizes)
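To make the alternative concrete for anyone reading the thread, a sketch of both options (the batch sizes are arbitrary):
```python
import apache_beam as beam

with beam.Pipeline() as pipeline:
  elements = pipeline | beam.Create(range(10))

  # BatchElements needs no key, but batches never cross bundle
  # boundaries, so small streaming bundles produce small batches.
  _ = (elements
       | beam.BatchElements(min_batch_size=2, max_batch_size=4)
       | 'PrintBatches' >> beam.Map(print))

  # GroupIntoBatches batches per key and can batch across bundles in
  # streaming; here a single dummy key groups everything together.
  _ = (elements
       | 'AddKey' >> beam.Map(lambda x: ('all', x))
       | beam.GroupIntoBatches(4)
       | 'PrintKeyedBatches' >> beam.Map(print))
```
In a batch pipeline, `BatchElements` is usually enough; `GroupIntoBatches` mainly pays off in streaming.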
##########
examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb:
##########
@@ -1001,7 +1034,7 @@
"saved_model_spec =
model_spec_pb2.SavedModelSpec(model_path=model_dir)\n",
"inference_spec_type =
model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)\n",
"\n",
- "#A Beam IO that reads a file of serialized tf.Examples\n",
+ "#A Beam I/O that reads a file of serialized tf.Examples\n",
Review Comment:
```suggestion
"# A Beam I/O that reads a file of serialized tf.Examples\n",
```
##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -103,17 +103,17 @@
{
"cell_type": "markdown",
"source": [
- "The steps needed to build this pipeline can be summarized as
follows:\n",
+ "The steps to build this pipeline are as follows:\n",
"* Read the images.\n",
"* Preprocess the images for caption generation for inference with the
BLIP model.\n",
- "* Inference with BLIP to generate a list of caption candidates . \n",
+ "* Inference with BLIP to generate a list of caption candidates.\n",
"* Aggregate the generated captions with their source image.\n",
"* Preprocess the aggregated image-caption pair to rank them with
CLIP.\n",
- "* Inference wih CLIP to generated the caption ranking. \n",
- "* Print the image names and the captions sorted according to their
ranking\n",
+ "* Inference with CLIP to generated the caption ranking. \n",
Review Comment:
```suggestion
"* Inference with CLIP to generate the caption ranking. \n",
```
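For reference, the cascade in this list boils down to a two-stage pipeline. Everything in the sketch below is a stand-in: the two functions return canned values so the shape is runnable without the actual BLIP and CLIP models:
```python
import apache_beam as beam


def generate_captions(image_name):
  # Stand-in for the BLIP stage: produce caption candidates per image.
  return (image_name, ['a photo of a dog', 'a dog on the grass'])


def rank_captions(image_and_captions):
  # Stand-in for the CLIP stage: order the candidates by score.
  image_name, captions = image_and_captions
  return (image_name, sorted(captions))


with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'ReadImages' >> beam.Create(['image_1.jpg', 'image_2.jpg'])
      | 'BlipStage' >> beam.Map(generate_captions)
      | 'ClipStage' >> beam.Map(rank_captions)
      | 'Print' >> beam.Map(print))
```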
##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -46,33 +49,29 @@
"cell_type": "markdown",
"source": [
"# Apache Beam RunInference with TensorFlow\n",
- "\n",
- "<button>\n",
- " <a
href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
- " <img src=\"https://beam.apache.org/images/favicon.ico\"
alt=\"Open the docs\" height=\"16\"/>\n",
- " Beam RunInference\n",
- " </a>\n",
- "</button>\n",
+ "This notebook demonstrates the use of the RunInference transform for
[TensorFlow](https://www.tensorflow.org/).\n",
+ "Beam
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
accepts ModelHandler generated from
[`tfx-bsl`](https://github.com/tensorflow/tfx-bsl) via CreateModelHandler.\n",
Review Comment:
```suggestion
"Beam
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
accepts a ModelHandler generated from
[`tfx-bsl`](https://github.com/tensorflow/tfx-bsl) via CreateModelHandler.\n",
```
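For anyone reviewing, the handoff described in this sentence looks roughly like the sketch below; the SavedModel path is a placeholder:
```python
from tfx_bsl.public.beam.run_inference import CreateModelHandler
from tfx_bsl.public.proto import model_spec_pb2

# Point the inference spec at an exported SavedModel (placeholder path).
saved_model_spec = model_spec_pb2.SavedModelSpec(model_path='/tmp/saved_model')
inference_spec_type = model_spec_pb2.InferenceSpecType(
    saved_model_spec=saved_model_spec)

model_handler = CreateModelHandler(inference_spec_type)
# Serialized tf.Examples then flow through RunInference(model_handler).
```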