rszper commented on code in PR #25947: URL: https://github.com/apache/beam/pull/25947#discussion_r1148027333
##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.

Review Comment:
```suggestion
The pipeline in this example uses a [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) `PTransform` with a side input `PCollection` that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for stopping
-the beam pipeline.
+based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need for stopping the beam pipeline.

Review Comment:
```suggestion
the RunInference `PTransform` to dynamically update the model without stopping the Beam pipeline.
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform

Review Comment:
```suggestion
# Use slowly-updating side input patterns to auto-update models
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -128,5 +121,15 @@ class PostProcessor(beam.DoFn):
 predicted_class_name = imagenet_labels[predicted_class]
 return predicted_class_name.title(), element.model_id
-(inference_pcoll | "PostProcessor" >> PostProcessor())
+post_processor_pcoll = (inference_pcoll | "PostProcessor" >> PostProcessor())
 ```
+
+### Run the pipeline
+```python
+result = pipeline.run().wait_until_finish()
+```
+Once the pipeline is run with initial settings, upload a model matching the `file_pattern` to GCS bucket. After some time, you will see that your pipeline starts to use the updated model instead of the initial model.
+**Note**: `model_name` of the `ModelMetaData` object will be attached as prefix to the [metrics](https://beam.apache.org/documentation/ml/runinference-metrics/) calculated by the RunInference PTransform

Review Comment:
```suggestion
**Note**: The `model_name` of the `ModelMetaData` object is attached as prefix to the [metrics](https://beam.apache.org/documentation/ml/runinference-metrics/) calculated by the RunInference `PTransform`.
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -128,5 +121,15 @@ class PostProcessor(beam.DoFn):
 predicted_class_name = imagenet_labels[predicted_class]
 return predicted_class_name.title(), element.model_id
-(inference_pcoll | "PostProcessor" >> PostProcessor())
+post_processor_pcoll = (inference_pcoll | "PostProcessor" >> PostProcessor())
 ```
+
+### Run the pipeline
+```python
+result = pipeline.run().wait_until_finish()
+```
+Once the pipeline is run with initial settings, upload a model matching the `file_pattern` to GCS bucket. After some time, you will see that your pipeline starts to use the updated model instead of the initial model.
+**Note**: `model_name` of the `ModelMetaData` object will be attached as prefix to the [metrics](https://beam.apache.org/documentation/ml/runinference-metrics/) calculated by the RunInference PTransform
+
+## Final remarks
+Use this example as a pattern on how to use side inputs with RunInference PTransform to auto update the models without the need to stop the pipeline. A similar example for PyTorch can be found on [GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_image_classification_with_side_inputs.py).

Review Comment:
```suggestion
Use this example as a pattern when using side inputs with the RunInference `PTransform` to auto-update the models without stopping the pipeline. You can see a similar example for PyTorch on [GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_image_classification_with_side_inputs.py).
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for stopping
-the beam pipeline.
+based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in

Review Comment:
```suggestion
based on timestamps. It emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for stopping
-the beam pipeline.
+based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need for stopping the beam pipeline.
 **Note**: Slowly-updating side input pattern is non-deterministic.
-You can find the code used in this example in the [Beam repository] (link).
-
-## Setting up source.
+### Setting up source.
 We will use PubSub topic as a source to read the image names.

Review Comment:
```suggestion
To read the image names, use a Pub/Sub topic as the source.
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`

Review Comment:
```suggestion
This example uses `WatchFilePattern` as a side input.
`WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for stopping
-the beam pipeline.
+based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need for stopping the beam pipeline.
 **Note**: Slowly-updating side input pattern is non-deterministic.

Review Comment:
```suggestion
**Note**: Slowly-updating side input patterns are non-deterministic.
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for stopping
-the beam pipeline.
+based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need for stopping the beam pipeline.
 **Note**: Slowly-updating side input pattern is non-deterministic.
-You can find the code used in this example in the [Beam repository] (link).
-
-## Setting up source.
+### Setting up source.

Review Comment:
```suggestion
## Set up the source
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for stopping
-the beam pipeline.
+based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need for stopping the beam pipeline.
 **Note**: Slowly-updating side input pattern is non-deterministic.
-You can find the code used in this example in the [Beam repository] (link).
-
-## Setting up source.
+### Setting up source.
 We will use PubSub topic as a source to read the image names.
 * PubSub topic emits a `UTF-8` encoded model path that will be used read and preprocess images for running the inference.
-## Models for image segmentation
+### Models for image segmentation
-We will use `resnet_v2_101` for initial predictions. After a while, we will upload a `resnet_v2_152` to the GCS bucket. The bucket path will be used a glob pattern and is passed to the WatchFilePattern.
-
+We will use `resnet_v2_101` for initial predictions. After a while, upload a model that matches the `file_pattern` to the GCS bucket. The bucket path will be used a glob pattern and is passed to the WatchFilePattern.

Review Comment:
```suggestion
For initial predictions, use `resnet_v2_101`. Upload a model that matches the `file_pattern` to the Google Cloud Storage bucket. The bucket path is used as a glob pattern and is passed to `WatchFilePattern`.
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for stopping
-the beam pipeline.
+based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need for stopping the beam pipeline.
 **Note**: Slowly-updating side input pattern is non-deterministic.
-You can find the code used in this example in the [Beam repository] (link).
-
-## Setting up source.
+### Setting up source.
 We will use PubSub topic as a source to read the image names.
 * PubSub topic emits a `UTF-8` encoded model path that will be used read and preprocess images for running the inference.

Review Comment:
```suggestion
* The Pub/Sub topic emits a `UTF-8` encoded model path that is used to read and preprocess images to run the inference.
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for stopping
-the beam pipeline.
+based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need for stopping the beam pipeline.
 **Note**: Slowly-updating side input pattern is non-deterministic.
-You can find the code used in this example in the [Beam repository] (link).
-
-## Setting up source.
+### Setting up source.
 We will use PubSub topic as a source to read the image names.
 * PubSub topic emits a `UTF-8` encoded model path that will be used read and preprocess images for running the inference.
-## Models for image segmentation
+### Models for image segmentation

Review Comment:
```suggestion
## Models for image segmentation
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for stopping
-the beam pipeline.
+based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need for stopping the beam pipeline.
 **Note**: Slowly-updating side input pattern is non-deterministic.
-You can find the code used in this example in the [Beam repository] (link).
-
-## Setting up source.
+### Setting up source.
 We will use PubSub topic as a source to read the image names.
 * PubSub topic emits a `UTF-8` encoded model path that will be used read and preprocess images for running the inference.
-## Models for image segmentation
+### Models for image segmentation
-We will use `resnet_v2_101` for initial predictions. After a while, we will upload a `resnet_v2_152` to the GCS bucket. The bucket path will be used a glob pattern and is passed to the WatchFilePattern.
-
+We will use `resnet_v2_101` for initial predictions. After a while, upload a model that matches the `file_pattern` to the GCS bucket. The bucket path will be used a glob pattern and is passed to the WatchFilePattern.
+Once there is an update, the RunInference PTransform will update the `model_uri` to use the latest model/file.

Review Comment:
```suggestion
After the update, the RunInference `PTransform` updates the `model_uri` to use the latest model and file.
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -15,28 +15,25 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Use Slowly-Updating Side Input Pattern to Update Models in RunInference Transform
+# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
-The pipeline in this example uses RunInference PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 In this example, we will use `WatchFilePattern` as a side input.
 `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest `ModelMetadata`, which is used in
-`RunInference` PTransform for the dynamic model updates without the need for stopping
-the beam pipeline.
+based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
+`RunInference` PTransform for the dynamic auto model updates without the need for stopping the beam pipeline.
 **Note**: Slowly-updating side input pattern is non-deterministic.
-You can find the code used in this example in the [Beam repository] (link).
-
-## Setting up source.
+### Setting up source.
 We will use PubSub topic as a source to read the image names.
 * PubSub topic emits a `UTF-8` encoded model path that will be used read and preprocess images for running the inference.
-## Models for image segmentation
+### Models for image segmentation
-We will use `resnet_v2_101` for initial predictions. After a while, we will upload a `resnet_v2_152` to the GCS bucket. The bucket path will be used a glob pattern and is passed to the WatchFilePattern.
-
+We will use `resnet_v2_101` for initial predictions. After a while, upload a model that matches the `file_pattern` to the GCS bucket. The bucket path will be used a glob pattern and is passed to the WatchFilePattern.
+Once there is an update, the RunInference PTransform will update the `model_uri` to use the latest model/file.
 ### ModelHandler used for Predictions.

Review Comment:
```suggestion
## ModelHandler for predictions
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -128,5 +121,15 @@ class PostProcessor(beam.DoFn):
 predicted_class_name = imagenet_labels[predicted_class]
 return predicted_class_name.title(), element.model_id
-(inference_pcoll | "PostProcessor" >> PostProcessor())
+post_processor_pcoll = (inference_pcoll | "PostProcessor" >> PostProcessor())
 ```
+
+### Run the pipeline
+```python
+result = pipeline.run().wait_until_finish()
+```
+Once the pipeline is run with initial settings, upload a model matching the `file_pattern` to GCS bucket. After some time, you will see that your pipeline starts to use the updated model instead of the initial model.

Review Comment:
```suggestion
After you run the pipeline with the initial settings, upload a model matching the `file_pattern` to the Google Cloud Storage bucket. Your pipeline will use the updated model instead of the initial model.
```

##########
website/www/site/content/en/documentation/ml/side-input-updates.md:
##########
@@ -128,5 +121,15 @@ class PostProcessor(beam.DoFn):
 predicted_class_name = imagenet_labels[predicted_class]
 return predicted_class_name.title(), element.model_id
-(inference_pcoll | "PostProcessor" >> PostProcessor())
+post_processor_pcoll = (inference_pcoll | "PostProcessor" >> PostProcessor())
 ```
+
+### Run the pipeline

Review Comment:
```suggestion
## Run the pipeline
```
