rszper commented on code in PR #25947:
URL: https://github.com/apache/beam/pull/25947#discussion_r1149770971
########## website/www/site/content/en/documentation/ml/side-input-updates.md: ##########
@@ -15,22 +15,22 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
 
-# Use Slowly-Updating Side Input Pattern to Auto Update Models in RunInference Transform
+# Use slowly-updating side input patterns to auto-update models
 
-The pipeline in this example uses [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) PTransform with a `side input` PCollection that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
+The pipeline in this example uses a [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) `PTransform` with a side input `PCollection` that emits `ModelMetadata` to run inferences on images using open source Tensorflow models trained on `imagenet`.
 
-In this example, we will use `WatchFilePattern` as a side input. `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
-based on timestamps and emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
-`RunInference` PTransform for the dynamic auto model updates without the need for stopping the beam pipeline.
+This example uses `WatchFilePattern` as a side input. `WatchFilePattern` is used to watch for the file updates matching the `file_pattern`
+based on timestamps. It emits the latest [ModelMetadata](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/), which is used in
+the RunInference `PTransform` to dynamically update the model without stopping the Beam pipeline.
 
-**Note**: Slowly-updating side input pattern is non-deterministic.
+**Note**: Slowly-updating side input patterns are non-deterministic.
 
 ### Setting up source
 
-We will use PubSub topic as a source to read the image names.
- * PubSub topic emits a `UTF-8` encoded model path that will be used read and preprocess images for running the inference.
+To read the image names, use a Pub/Sub topic as the source.
+ * The Pub/Sub topic emits a `UTF-8` encoded model path that is used to read and preprocess images to run the inference.
 
-### Models for image segmentation
+## Models for image segmentation
 
 For the purpose of this example, use models saved in [HDF5](https://www.tensorflow.org/tutorials/keras/save_and_load#hdf5_format) format. Initially, pass a model to the Tensorflow ModelHandler for predictions until there is an update via side input. After a while, upload a model that matches the `file_pattern` to the GCS bucket. The bucket path will be used a glob pattern and is passed to the `WatchFilePattern`.

Review Comment:
GCS bucket should be Google Cloud Storage bucket. Also, the second sentence has a grammatical error. I think it should be: "The bucket path will be used as a glob pattern and is passed to the `WatchFilePattern`."
