damccorm commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1022063494
##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -57,12 +57,12 @@
{
"cell_type": "markdown",
"source": [
- "A single machine learning model may not always be the perfect
solution for a give task. Oftentimes, machine learning model tasks involve
aggregating mutliple models together to produce one optimal predictive model
and boost performance. \n",
+ "A single machine learning model might not be the right solution for
your task. Often, machine learning model tasks involve aggregating mutliple
models together to produce one optimal predictive model and to boost
performance. \n",
" \n",
"\n",
- "In this notebook, we will shows you an example on how to implement a
cascade model in Beam using the [RunInference
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The
RunInference API enables you to run your Beam transfroms as part of your
pipeline for optimal machine learning inference in beam. \n",
+ "This notebook shows how to implement a cascade model in Apache Beam
using the [RunInference
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The
RunInference API enables you to run your Beam transfroms as part of your
pipeline for optimal machine learning inference.\n",
Review Comment:
```suggestion
"This notebook shows how to implement a cascade model in Apache Beam
using the [RunInference
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The
RunInference API enables you to run your Beam transforms as part of your
pipeline for optimal machine learning inference.\n",
```
##########
examples/notebooks/beam-ml/run_custom_inference.ipynb:
##########
@@ -37,21 +36,15 @@
"id": "b6f8f3af-744e-4eaa-8a30-6d03e8e4d21e"
},
"source": [
- "# Bring your own Machine Learning (ML) model to Beam RunInference\n",
+ "# Bring your own machine learning (ML) model to Beam RunInference\n",
"\n",
- "<button>\n",
- " <a
href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
- " <img src=\"https://beam.apache.org/images/favicon.ico\"
alt=\"Open the docs\" height=\"16\"/>\n",
- " Beam RunInference\n",
- " </a>\n",
- "</button>\n",
- "\n",
- "In this notebook, we walk through a simple example to show how to
build your own ML model handler using\n",
+ "This notebook demonstrates how to build your own ML model handler
using\n",
"[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler).\n",
Review Comment:
This sentence is kinda odd - maybe we could do something like "This notebook
demonstrates how to run inference on your custom framework using the
[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
class"
##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -190,9 +189,9 @@
{
"cell_type": "markdown",
"source": [
- "## Create and Test a Simple Model\n",
+ "## Create and test a simple model\n",
"\n",
- "This creates a model that predicts the 5 times table."
+ "This steps creates a model that predicts the 5 times table."
Review Comment:
```suggestion
"This step creates a model that predicts the 5 times table."
```
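As an aside for reviewers, a hedged sketch of what such a model could look like (the layer size and training settings here are illustrative, not necessarily the notebook's):
```python
import numpy as np
import tensorflow as tf

# Training data for the 5 times table: x -> 5x.
x = np.arange(10, dtype=np.float32).reshape(-1, 1)
y = 5 * x

# A single linear unit is enough to fit y = 5x.
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=(1,))])
model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss='mean_squared_error')
model.fit(x, y, epochs=500, verbose=0)

# Testing the model: this should print a value close to 100.
print(model.predict(np.array([[20.0]], dtype=np.float32)))
```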
##########
examples/notebooks/beam-ml/run_inference_sklearn.ipynb:
##########
@@ -52,26 +51,18 @@
},
"source": [
"# Apache Beam RunInference for scikit-learn\n",
- "\n",
- "<button>\n",
- " <a
href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
- " <img src=\"https://beam.apache.org/images/favicon.ico\"
alt=\"Open the docs\" height=\"16\"/>\n",
- " Beam RunInference\n",
- " </a>\n",
- "</button>\n",
- "\n",
- "In this notebook, we walk through the use of the RunInference
transform for [scikit-learn](https://scikit-learn.org/) also called sklearn.\n",
- "Beam
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
has implementations of
[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
prebuilt for scikit-learn.\n",
+ "This notebook demonstrates the use of the RunInference transform for
[scikit-learn](https://scikit-learn.org/) also called sklearn.\n",
+ "Apache Beam
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
has implementations of
[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
prebuilt for scikit-learn. For more information about the RunInference API,
see [Machine
Learning](https://beam.apache.org/documentation/sdks/python-machine-learning)
in the Apache Beam documentation.\n",
Review Comment:
```suggestion
"Apache Beam
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
has implementations of the
[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
class prebuilt for scikit-learn. For more information about the RunInference
API, see [Machine
Learning](https://beam.apache.org/documentation/sdks/python-machine-learning)
in the Apache Beam documentation.\n",
```
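For anyone skimming the review, end-to-end usage of the prebuilt handler looks roughly like the sketch below; the toy model, pickle path, and inputs are all illustrative:
```python
import pickle

import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import (
    ModelFileType, SklearnModelHandlerNumpy)
from sklearn.linear_model import LinearRegression

# Train and pickle a toy model so the sketch is self-contained.
model = LinearRegression().fit(np.array([[1], [2], [3]]), np.array([5, 10, 15]))
with open('/tmp/five_times_model.pkl', 'wb') as f:
  pickle.dump(model, f)

handler = SklearnModelHandlerNumpy(
    model_uri='/tmp/five_times_model.pkl',
    model_file_type=ModelFileType.PICKLE)

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | beam.Create([np.array([4]), np.array([7])])
      | RunInference(handler)
      | beam.Map(print))
```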
##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -248,7 +247,9 @@
{
"cell_type": "markdown",
"source": [
- "### Test the Model\n"
+ "### Test the model\n",
+ "\n",
+ "This steps tests the model that you created."
Review Comment:
```suggestion
"This step tests the model that you created."
```
##########
examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb:
##########
@@ -946,9 +977,9 @@
"source": [
"input_strings_file = 'input_strings.tfrecord'\n",
"\n",
- "# Preprocess the input as RunInference is expecting a serialized
tf.example as an input\n",
- "# Write the processed input to a file \n",
- "# One can also do it as a pipeline step by using beam.Map() \n",
+ "# Because RunInferene is expecting a serialized tf.example as an
input, preprocess the input.\n",
Review Comment:
```suggestion
"# Because RunInference is expecting a serialized tf.example as an
input, preprocess the input.\n",
```
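For context on this cell, the preprocessing it describes might look like the sketch below; the feature name `x` and the example strings are assumptions on my part:
```python
import tensorflow as tf

input_strings_file = 'input_strings.tfrecord'
input_strings = ['hello', 'world']

# Serialize each string into a tf.train.Example and write it to a file.
with tf.io.TFRecordWriter(input_strings_file) as writer:
  for text in input_strings:
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                'x': tf.train.Feature(
                    bytes_list=tf.train.BytesList(
                        value=[text.encode('utf-8')]))
            }))
    writer.write(example.SerializeToString())
```
As the original comment notes, the `SerializeToString()` call could also run inside the pipeline via `beam.Map()`.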
##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -34,13 +34,17 @@
"id": "0UGzzndTBPWQ"
},
"source": [
- "# Remote inference in Beam\n",
+ "# Remote inference in Apache Beam\n",
"\n",
- "The prefered way of running inference in Beam is by using the
[RunInference
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The
RunInference API enables you to run your models as part of your pipeline in a
way that is optimized for machine learning inference. It supports features such
as batching, so that you do not need to take care of it yourself. For more info
on the RunInference API you can check out the [RunInference
notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb),
which demonstrates how you can implement model inference in pytorch,
scikit-learn and tensorflow.\n",
+ "The prefered way to run inference in Apache Beam is by using the
[RunInference
API](https://beam.apache.org/documentation/sdks/python-machine-learning/). \n",
+ "The RunInference API enables you to run your models as part of your
pipeline in a way that is optimized for machine learning inference. \n",
+ "To reduce the number of steps that you need to take, RunInference
supports features like batching. For more infomation about the RunInference
API, review the [RunInference
notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb),
\n",
+ "which demonstrates how to implement model inference in PyTorch,
scikit-learn, and TensorFlow.\n",
"\n",
- "As of now, RunInference API doesn't support making remote inference
calls (e.g. Natural Language API, Cloud Vision API and others). Therefore, in
order to use these remote APIs with Beam, one needs to write custom inference
call. \n",
+ "Currently, the RunInference API doesn't support making remote
inference calls using the Natural Language API, Cloud Vision API, and so on.
\n",
+ "Therefore, to use these remote APIs with Apache Beam, you need to
write custom inference calls.\n",
"\n",
- "This notebook shows how you can implement such a custom inference
call in Beam. We are using Cloud Vision API for demonstration. \n"
+ "This notebook shows how to implement a custom inference call in
Apache Beam. This example uses Cloud Vision API."
Review Comment:
```suggestion
"This notebook shows how to implement a custom inference call in
Apache Beam. This example uses the Google Cloud Vision API."
```
I actually think not having `the` is valid based on
https://cloud.google.com/vision, but we prefix with `the` elsewhere and it
feels more natural to me. Consistency is the bigger thing though, IMO.
Same thing with `Google`: we use it elsewhere, so using it here too is
probably less confusing.
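For reviewers following along, the custom inference call amounts to a `DoFn` that wraps the Vision client. A minimal sketch, with error handling omitted and label detection chosen just as an example feature:
```python
import apache_beam as beam
from google.cloud import vision


class RemoteLabelDoFn(beam.DoFn):
  """Calls the Cloud Vision label-detection endpoint per element."""

  def setup(self):
    # Create the client once per worker, not once per element.
    self._client = vision.ImageAnnotatorClient()

  def process(self, image_bytes: bytes):
    response = self._client.label_detection(
        image=vision.Image(content=image_bytes))
    yield [label.description for label in response.label_annotations]
```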
##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -164,7 +164,9 @@
{
"cell_type": "markdown",
"source": [
- "The RunInference library is available in Apache Beam version **2.40**
or later. "
+ "This section shows how to demonstrate the dependencies for this
example.\n",
Review Comment:
```suggestion
"This section shows how to install the dependencies for this
example.\n",
```
##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -279,15 +291,19 @@
"source": [
"### Batching\n",
"\n",
- "Before we can chain all the different steps together in a pipeline,
there is one more thing we need to understand: batching. When running inference
with your model (both in Beam itself or in an external API), you can batch your
input together to allow for more efficient execution of your model. When using
a custom DoFn, you need to take care of the batching yourself, in contrast with
the RunInference API which takes care of this for you.\n",
+ "Before we can chain together the pipeline steps, we need to
understand batching.\n",
+ "When running inference with your model, either in Apache Beam or in
an external API, you can batch your input to increase the efficiency of the
model execution.\n",
+ "When using a custom DoFn, as in this example, you need to manage the
batching.\n",
"\n",
- "In order to achieve this in our pipeline: we will introduce one more
step in our pipeline, a `BatchElements` transform that will group elements
together to form a batch of the desired size.\n",
+ "To manage the batching in this pipeline, include a `BatchElements`
transform to group elements together and form a batch of the desired size.\n",
"\n",
- "⚠️ If you have a streaming pipeline, you may considering using
[GroupIntoBatches](https://beam.apache.org/documentation/transforms/python/aggregation/groupintobatches/)
as `BatchElements` doesn't batch things across bundles. `GroupIntoBatches`
requires choosing a key within which things are batched.\n",
+ "**Caution:**\n",
Review Comment:
I don't think this really merits a caution - it is an aside/alternate
approach, but nothing bad will happen if you use BatchElements (it will just
produce small batch sizes)
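To make the alternative concrete for anyone reading the thread, a sketch of both options (the batch sizes are arbitrary):
```python
import apache_beam as beam

with beam.Pipeline() as pipeline:
  elements = pipeline | beam.Create(range(10))

  # BatchElements needs no key, but batches never cross bundle
  # boundaries, so small streaming bundles produce small batches.
  _ = (elements
       | beam.BatchElements(min_batch_size=2, max_batch_size=4)
       | 'PrintBatches' >> beam.Map(print))

  # GroupIntoBatches batches per key and can batch across bundles in
  # streaming; here a single dummy key groups everything together.
  _ = (elements
       | 'AddKey' >> beam.Map(lambda x: ('all', x))
       | beam.GroupIntoBatches(4)
       | 'PrintKeyedBatches' >> beam.Map(print))
```
In a batch pipeline, `BatchElements` is usually enough; `GroupIntoBatches` mainly pays off in streaming.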
##########
examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb:
##########
@@ -1001,7 +1034,7 @@
"saved_model_spec =
model_spec_pb2.SavedModelSpec(model_path=model_dir)\n",
"inference_spec_type =
model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)\n",
"\n",
- "#A Beam IO that reads a file of serialized tf.Examples\n",
+ "#A Beam I/O that reads a file of serialized tf.Examples\n",
Review Comment:
```suggestion
"# A Beam I/O that reads a file of serialized tf.Examples\n",
```
##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -103,17 +103,17 @@
{
"cell_type": "markdown",
"source": [
- "The steps needed to build this pipeline can be summarized as
follows:\n",
+ "The steps to build this pipeline are as follows:\n",
"* Read the images.\n",
"* Preprocess the images for caption generation for inference with the
BLIP model.\n",
- "* Inference with BLIP to generate a list of caption candidates . \n",
+ "* Inference with BLIP to generate a list of caption candidates.\n",
"* Aggregate the generated captions with their source image.\n",
"* Preprocess the aggregated image-caption pair to rank them with
CLIP.\n",
- "* Inference wih CLIP to generated the caption ranking. \n",
- "* Print the image names and the captions sorted according to their
ranking\n",
+ "* Inference with CLIP to generated the caption ranking. \n",
Review Comment:
```suggestion
"* Inference with CLIP to generate the caption ranking. \n",
```
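For reference, the cascade in this list boils down to a two-stage pipeline. Everything in the sketch below is a stand-in: the two functions return canned values so the shape is runnable without the actual BLIP and CLIP models:
```python
import apache_beam as beam


def generate_captions(image_name):
  # Stand-in for the BLIP stage: produce caption candidates per image.
  return (image_name, ['a photo of a dog', 'a dog on the grass'])


def rank_captions(image_and_captions):
  # Stand-in for the CLIP stage: order the candidates by score.
  image_name, captions = image_and_captions
  return (image_name, sorted(captions))


with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'ReadImages' >> beam.Create(['image_1.jpg', 'image_2.jpg'])
      | 'BlipStage' >> beam.Map(generate_captions)
      | 'ClipStage' >> beam.Map(rank_captions)
      | 'Print' >> beam.Map(print))
```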
##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -46,33 +49,29 @@
"cell_type": "markdown",
"source": [
"# Apache Beam RunInference with TensorFlow\n",
- "\n",
- "<button>\n",
- " <a
href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
- " <img src=\"https://beam.apache.org/images/favicon.ico\"
alt=\"Open the docs\" height=\"16\"/>\n",
- " Beam RunInference\n",
- " </a>\n",
- "</button>\n",
+ "This notebook demonstrates the use of the RunInference transform for
[TensorFlow](https://www.tensorflow.org/).\n",
+ "Beam
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
accepts ModelHandler generated from
[`tfx-bsl`](https://github.com/tensorflow/tfx-bsl) via CreateModelHandler.\n",
Review Comment:
```suggestion
"Beam
[RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
accepts a ModelHandler generated from
[`tfx-bsl`](https://github.com/tensorflow/tfx-bsl) via CreateModelHandler.\n",
```
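For anyone reviewing, the handoff described in this sentence looks roughly like the sketch below; the SavedModel path is a placeholder:
```python
from tfx_bsl.public.beam.run_inference import CreateModelHandler
from tfx_bsl.public.proto import model_spec_pb2

# Point the inference spec at an exported SavedModel (placeholder path).
saved_model_spec = model_spec_pb2.SavedModelSpec(model_path='/tmp/saved_model')
inference_spec_type = model_spec_pb2.InferenceSpecType(
    saved_model_spec=saved_model_spec)

model_handler = CreateModelHandler(inference_spec_type)
# Serialized tf.Examples then flow through RunInference(model_handler).
```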