[GitHub] [beam] rszper commented on a diff in pull request #25947: Add documentation for the auto model updates

via GitHub Thu, 23 Mar 2023 14:43:17 -0700


rszper commented on code in PR #25947:
URL: https://github.com/apache/beam/pull/25947#discussion_r1146891464



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` 
documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline 
that uses ML models, see the
 [Example RunInference API 
pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference)
 on GitHub.
 
+## Slowly updating side input pattern to update models used by ModelHandler

Review Comment:
   Is slowly updating a type of side input pattern? If so, I think it should be 
hyphenated:
   
   Slowly-updating side input pattern



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` 
documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline 
that uses ML models, see the
 [Example RunInference API 
pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference)
 on GitHub.
 
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of 
[ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata),
 which is a `NamedTuple` containing the `model_id` and `model_name`,
+to update the models used by the ModelHandler in the RunInference PTransform 
without the need of stopping the pipeline for the model updates.
+  * `model_id`: Unique identifier for the model. This can be a file path or a 
URL where the model can be accessed. It is used to load the model for inference.
+  * `model_name`: Human-readable name for the model. This can be used to 
identify the model in the metrics generated by the RunInference transform.
+
+**Note**: The side input PCollection must follow 
[AsSingleton](https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html?highlight=assingleton#apache_beam.pvalue.AsSingleton)
 view or the pipeline will result in error.
+
+**Note**: If the main PCollection emits inputs and side input has yet to 
receive inputs, the main PCollection will get buffered until there is

Review Comment:
   ```suggestion
   **Note**: If the main PCollection emits inputs and a side input has yet to 
receive inputs, the main PCollection is buffered until there is
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` 
documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline 
that uses ML models, see the
 [Example RunInference API 
pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference)
 on GitHub.
 
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of 
[ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata),
 which is a `NamedTuple` containing the `model_id` and `model_name`,
+to update the models used by the ModelHandler in the RunInference PTransform 
without the need of stopping the pipeline for the model updates.
+  * `model_id`: Unique identifier for the model. This can be a file path or a 
URL where the model can be accessed. It is used to load the model for inference.
+  * `model_name`: Human-readable name for the model. This can be used to 
identify the model in the metrics generated by the RunInference transform.
+
+**Note**: The side input PCollection must follow 
[AsSingleton](https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html?highlight=assingleton#apache_beam.pvalue.AsSingleton)
 view or the pipeline will result in error.
+
+**Note**: If the main PCollection emits inputs and side input has yet to 
receive inputs, the main PCollection will get buffered until there is

Review Comment:
   We generally don't want two notes in a row. Does this content need to be in 
a note? Can it just be text in the file? It both sections must be notes, could 
they be combined into one note?



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` 
documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline 
that uses ML models, see the
 [Example RunInference API 
pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference)
 on GitHub.
 
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of 
[ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata),
 which is a `NamedTuple` containing the `model_id` and `model_name`,
+to update the models used by the ModelHandler in the RunInference PTransform 
without the need of stopping the pipeline for the model updates.
+  * `model_id`: Unique identifier for the model. This can be a file path or a 
URL where the model can be accessed. It is used to load the model for inference.
+  * `model_name`: Human-readable name for the model. This can be used to 
identify the model in the metrics generated by the RunInference transform.
+
+**Note**: The side input PCollection must follow 
[AsSingleton](https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html?highlight=assingleton#apache_beam.pvalue.AsSingleton)
 view or the pipeline will result in error.
+
+**Note**: If the main PCollection emits inputs and side input has yet to 
receive inputs, the main PCollection will get buffered until there is
+            an update to the side input. This could happen with Global 
windowed side inputs with data driven triggers such as `AfterCount`, 
`AfterProcessingTime`. So until there is an update to the side input, emit the 
default/initial model id that is used to pass the respective `ModelHandler` as 
side input..

Review Comment:
   ```suggestion
               an update to the side input. This could happen with Global 
windowed side inputs with data driven triggers such as `AfterCount`, 
`AfterProcessingTime`. Until the side input is updated, emit the default or 
initial model ID that is used to pass the respective `ModelHandler` as a side 
input.
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` 
documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline 
that uses ML models, see the
 [Example RunInference API 
pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference)
 on GitHub.
 
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of 
[ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata),
 which is a `NamedTuple` containing the `model_id` and `model_name`,

Review Comment:
   ```suggestion
   To update models used by the model handler in the RunInference PTransform 
without stopping the pipeline, use a 
[`ModelMetadata`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata)
 side input, which is a `NamedTuple` containing the `model_id` and `model_name`.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] rszper commented on a diff in pull request #25947: Add documentation for the auto model updates

Reply via email to