[GitHub] [beam] damccorm commented on a diff in pull request #25947: Add documentation for the auto model updates

via GitHub Fri, 24 Mar 2023 14:02:02 -0700


damccorm commented on code in PR #25947:
URL: https://github.com/apache/beam/pull/25947#discussion_r1148040679



##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -0,0 +1,138 @@
+---
+title: "Auto Model Updates in RunInference Transforms using SideInputs"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference 
Transform
+
+The pipeline in this example uses 
[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/)
 PTransform with a `side input` PCollection that emits `ModelMetadata` to run 
inferences on images using open source Tensorflow models trained on `imagenet`.

Review Comment:
   I'd add a couple of sentences along the lines of "Side inputs can be used to do 
live updates of your model while your pipeline is still running. You can either 
use one of Beam's built-in side inputs or configure a custom one to define how 
and when you'd like to update."
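
As a rough illustration of what "watching a file pattern" amounts to, here is a minimal sketch in plain Python. It is not Beam's implementation and does not use `WatchFilePattern`; the function name `latest_matching_model`, the bucket paths, and the integer timestamps are all hypothetical. It only shows the idea of picking the newest file that matches a glob pattern:

```python
from fnmatch import fnmatch


def latest_matching_model(files, file_pattern):
    """Return the path of the newest file matching file_pattern, or None.

    `files` is a list of (path, last_modified_timestamp) tuples.
    Hypothetical helper for illustration only, not a Beam API.
    """
    matches = [(ts, path) for path, ts in files if fnmatch(path, file_pattern)]
    if not matches:
        return None
    # Tuples compare on the timestamp first, so max() picks the newest match.
    return max(matches)[1]


# Made-up listing of a model directory: (path, last-modified timestamp).
files = [
    ("gs://bucket/models/resnet50.h5", 100),
    ("gs://bucket/models/resnet101.h5", 250),
    ("gs://bucket/models/notes.txt", 300),
]

print(latest_matching_model(files, "gs://bucket/models/*.h5"))
# prints gs://bucket/models/resnet101.h5
```

The newer `notes.txt` is ignored because it does not match the pattern, which is the behavior a reader would expect from the `.h5` glob described in the doc.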



##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -0,0 +1,138 @@
+---
+title: "Auto Model Updates in RunInference Transforms using SideInputs"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference 
Transform
+
+The pipeline in this example uses 
[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/)
 PTransform with a `side input` PCollection that emits `ModelMetadata` to run 
inferences on images using open source Tensorflow models trained on `imagenet`.
+
+In this example, we will use `WatchFilePattern` as a side input. 
`WatchFilePattern` is used to watch for the file updates matching the 
`file_pattern`
+based on timestamps and emits the latest 
[ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/),
 which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need 
for stopping the beam pipeline.
+
+**Note**: Slowly-updating side input pattern is non-deterministic.
+
+### Setting up source
+
+We will use PubSub topic as a source to read the image names. 
+ * PubSub topic emits a `UTF-8` encoded model path that will be used read and 
preprocess images for running the inference.
+
+### Models for image segmentation
+
+For the purpose of this example, use models saved in 
[HDF5](https://www.tensorflow.org/tutorials/keras/save_and_load#hdf5_format) 
format. Initially, pass a model to the Tensorflow ModelHandler for predictions 
until there is an update via side input. 
+After a while, upload a model that matches the `file_pattern` to the GCS 
bucket. The bucket path will be used a glob pattern and is passed to the 
`WatchFilePattern`.

Review Comment:
   I'd phrase these as "First we will do X. Then we will do Y which will have Z 
effect." This currently reads as if you're instructing the user to do this now, 
but you have more detailed instructions below.



##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -0,0 +1,138 @@
+---
+title: "Auto Model Updates in RunInference Transforms using SideInputs"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference 
Transform
+
+The pipeline in this example uses 
[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/)
 PTransform with a `side input` PCollection that emits `ModelMetadata` to run 
inferences on images using open source Tensorflow models trained on `imagenet`.
+
+In this example, we will use `WatchFilePattern` as a side input. 
`WatchFilePattern` is used to watch for the file updates matching the 
`file_pattern`
+based on timestamps and emits the latest 
[ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/),
 which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need 
for stopping the beam pipeline.
+
+**Note**: Slowly-updating side input pattern is non-deterministic.
+
+### Setting up source
+
+We will use PubSub topic as a source to read the image names. 
+ * PubSub topic emits a `UTF-8` encoded model path that will be used read and 
preprocess images for running the inference.
+
+### Models for image segmentation
+
+For the purpose of this example, use models saved in 
[HDF5](https://www.tensorflow.org/tutorials/keras/save_and_load#hdf5_format) 
format. Initially, pass a model to the Tensorflow ModelHandler for predictions 
until there is an update via side input. 
+After a while, upload a model that matches the `file_pattern` to the GCS 
bucket. The bucket path will be used a glob pattern and is passed to the 
`WatchFilePattern`.
+Once there is an update, the RunInference PTransform will update the 
`model_uri` to use the latest model for inferences.
+
+### ModelHandler used for inference
+
+For the ModelHandler, we will be using 
[TFModelHandlerTensor](https://github.com/apache/beam/blob/186973b110d82838fb8e5ba27f0225a67c336591/sdks/python/apache_beam/ml/inference/tensorflow_inference.py#L184).
+```python
+from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerTensor
+tf_model_handler = TFModelHandlerTensor(model_uri='gs://<your-bucket>/<model_path.h5>')
+``` 
+
+### Pre-processing image for inference
+The PubSub topic emits an image path. We need to read and preprocess the image 
to use it for RunInference. `read_image` function is used to read the image for 
inference.
+
+```python
+import io
+from PIL import Image
+from apache_beam.io.filesystems import FileSystems
+import numpy
+import tensorflow as tf
+
+def read_image(image_file_name):
+  with FileSystems().open(image_file_name, 'r') as file:
+    data = Image.open(io.BytesIO(file.read())).convert('RGB')  
+  img = data.resize((224, 224))
+  img = numpy.array(img) / 255.0
+  img_tensor = tf.cast(tf.convert_to_tensor(img[...]), dtype=tf.float32)
+  return img_tensor
+```
+
+Now, let's jump into the pipeline code.
+
+**Steps**:
+1. Get the image names from the PubSub topic.
+2. Read and pre-process the images using `read_image` function.
+3. Pass the images to the `RunInference` PTransform. RunInference takes 
`model_handler` and `model_metadata_pcoll`.
+   1. For the `model_handler`, `TFModelHandlerTensor` is used.
+   2. The `model_metadata_pcoll` is a [side 
input](https://beam.apache.org/documentation/programming-guide/#side-inputs) 
PCollection to the RunInference PTransform. This is used to update the models 
in the `model_handler` without needing to stop the beam pipeline. 
+      1. The `WatchFilePattern` is used as side input, which is used to watch 
a glob pattern matching `.h5` files. We use 
[HDF5](https://www.tensorflow.org/tutorials/keras/save_and_load#hdf5_format) 
standard to load the models.

Review Comment:
   I would rewrite these as sentences after the steps rather than substeps. So 
something like:
   
   ```
   1. Get the image names from the PubSub topic.
   2. Read and pre-process the images using `read_image` function.
   3. Pass the images to the `RunInference` PTransform. RunInference takes 
`model_handler` and `model_metadata_pcoll`.
   
   We will use `TFModelHandlerTensor` for our `model_handler`, ...
   ```



Review Comment:
   Also, be careful in general about switching back and forth between 
first-person active verbs and passive verbs (e.g. `We use` vs `This is used`). 
Pick one (ideally active verbs) and stick with it.



##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -0,0 +1,138 @@
+---
+title: "Auto Model Updates in RunInference Transforms using SideInputs"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference 
Transform
+
+The pipeline in this example uses 
[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/)
 PTransform with a `side input` PCollection that emits `ModelMetadata` to run 
inferences on images using open source Tensorflow models trained on `imagenet`.

Review Comment:
   It might also be worth defining side inputs here.



##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing 
permissions and
 limitations under the License.
 -->
 
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference 
Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference 
Transform
 
-The pipeline in this example uses RunInference PTransform with a `side input` 
PCollection that emits `ModelMetadata` to run inferences on images using open 
source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses 
[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/)
 PTransform with a `side input` PCollection that emits `ModelMetadata` to run 
inferences on images using open source Tensorflow models trained on `imagenet`.
 
 In this example, we will use `WatchFilePattern` as a side input. 
`WatchFilePattern` is used to watch for the file updates matching the 
`file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for 
stopping
-the beam pipeline.
+based on timestamps and emits the latest 
[ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/),
 which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need 
for stopping the beam pipeline.
 
 **Note**: Slowly-updating side input pattern is non-deterministic.

Review Comment:
   I'd also add more info here on the possible behavior gaps you can observe 
(e.g., updates may not happen immediately).
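
To make that gap concrete, here is a minimal sketch in plain Python (not Beam code). It assumes a watcher that polls at fixed intervals starting at t = 0; the function name `model_used_for`, the 120-second interval, and the timestamps are made up for illustration, and a real pipeline may add further delay while the side input propagates to workers:

```python
def model_used_for(element_time, uploads, poll_interval, initial_model):
    """Return the model an element arriving at element_time would see.

    uploads: list of (upload_time, model_name) tuples.
    The hypothetical watcher polls at t = 0, poll_interval, 2 * poll_interval, ...
    """
    # Time of the most recent poll at or before the element arrives.
    last_poll = (element_time // poll_interval) * poll_interval
    # Only uploads that had already happened at that poll are visible.
    visible = [(t, name) for t, name in uploads if t <= last_poll]
    return max(visible)[1] if visible else initial_model


# A new model is uploaded at t = 130; polls happen at t = 0, 120, 240.
uploads = [(130, "model_v2.h5")]
for t in (100, 150, 239, 240):
    print(t, model_used_for(t, uploads, poll_interval=120,
                            initial_model="model_v1.h5"))
# prints:
# 100 model_v1.h5
# 150 model_v1.h5
# 239 model_v1.h5
# 240 model_v2.h5
```

Elements arriving at t = 150 and t = 239 still get the old model even though the new one was uploaded at t = 130, which is the kind of delay the note could call out.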


