damccorm commented on code in PR #25947:
URL: https://github.com/apache/beam/pull/25947#discussion_r1146708146
##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline that uses ML models, see the [Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of [ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata), which is a `NamedTuple` containing the `model_id` and `model_name`,
+to update the models used by the ModelHandler in the RunInference PTransform without the need of stopping the pipeline for the model updates.
+ * `model_id`: Unique identifier for the model. This can be a file path or a URL where the model can be accessed. It is used to load the model for inference.

Review Comment:
   `It is used to load the model for inference.` - could you make it explicit that this should be in the same format that the given framework expects for a saved model file (e.g. state_dict for PyTorch)?

##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline that uses ML models, see the [Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of [ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata), which is a `NamedTuple` containing the `model_id` and `model_name`,
+to update the models used by the ModelHandler in the RunInference PTransform without the need of stopping the pipeline for the model updates.
+ * `model_id`: Unique identifier for the model. This can be a file path or a URL where the model can be accessed. It is used to load the model for inference.
+ * `model_name`: Human-readable name for the model. This can be used to identify the model in the metrics generated by the RunInference transform.
+
+**Note**: The side input PCollection must follow [AsSingleton](https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html?highlight=assingleton#apache_beam.pvalue.AsSingleton) view or the pipeline will result in error.

Review Comment:
   Also, maybe you could link to the Watch Transform that you created itself as a pattern to follow?

##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline that uses ML models, see the [Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of [ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata), which is a `NamedTuple` containing the `model_id` and `model_name`,
+to update the models used by the ModelHandler in the RunInference PTransform without the need of stopping the pipeline for the model updates.
+ * `model_id`: Unique identifier for the model. This can be a file path or a URL where the model can be accessed. It is used to load the model for inference.
+ * `model_name`: Human-readable name for the model. This can be used to identify the model in the metrics generated by the RunInference transform.
+
+**Note**: The side input PCollection must follow [AsSingleton](https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html?highlight=assingleton#apache_beam.pvalue.AsSingleton) view or the pipeline will result in error.
+
+**Note**: If the main PCollection emits inputs and side input has yet to receive inputs, the main PCollection will get buffered until there is
+ an update to the side input. This could happen with Global windowed side inputs with data driven triggers such as `AfterCount`, `AfterProcessingTime`. So until there is an update to the side input, emit the default/initial model id that is used to pass the respective `ModelHandler` as side input..

Review Comment:
   ```suggestion
    an update to the side input. This could happen with Global windowed side inputs with data driven triggers such as `AfterCount`, `AfterProcessingTime`. So until there is an update to the side input, emit the default/initial model id that is used to pass the respective `ModelHandler` as side input.
   ```
   nit

##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline that uses ML models, see the [Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of [ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata), which is a `NamedTuple` containing the `model_id` and `model_name`,
+to update the models used by the ModelHandler in the RunInference PTransform without the need of stopping the pipeline for the model updates.
+ * `model_id`: Unique identifier for the model. This can be a file path or a URL where the model can be accessed. It is used to load the model for inference.
+ * `model_name`: Human-readable name for the model. This can be used to identify the model in the metrics generated by the RunInference transform.
+
+**Note**: The side input PCollection must follow [AsSingleton](https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html?highlight=assingleton#apache_beam.pvalue.AsSingleton) view or the pipeline will result in error.

Review Comment:
   This is probably hard for the average developer to understand - could we lead with the WatchTransform pattern (ideally with a small code snippet) and then include the generalized ModelMetadata approach as something like "if just using a file watch transform doesn't work for you"

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
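The review asks for a small snippet of the pattern. As a hedged, framework-free sketch (not the PR's actual code): the side-input value mirrors the shape of `apache_beam.ml.inference.base.ModelMetadata`, and the "emit the default/initial model id until the side input updates" behavior from the second note can be expressed as a plain fallback function. The bucket paths and model names below are hypothetical.

```python
from typing import NamedTuple, Optional

# Mirrors the shape of apache_beam.ml.inference.base.ModelMetadata
# (a NamedTuple with model_id and model_name), restated here so the
# sketch runs without Beam installed.
class ModelMetadata(NamedTuple):
    # Path or URL of the saved model, in the format the framework
    # expects for a saved model file (e.g. a state_dict for PyTorch).
    model_id: str
    # Human-readable name, used to identify the model in the metrics
    # generated by the RunInference transform.
    model_name: str

# Hypothetical default/initial model, used until the side input
# emits its first update.
DEFAULT_MODEL = ModelMetadata(
    model_id="gs://example-bucket/models/v1.pth", model_name="v1")

def resolve_model(update: Optional[ModelMetadata]) -> ModelMetadata:
    """Fall back to the default model while no side-input update exists."""
    return update if update is not None else DEFAULT_MODEL

print(resolve_model(None).model_name)  # before any side-input update: v1
print(resolve_model(
    ModelMetadata("gs://example-bucket/models/v2.pth", "v2")).model_name)  # v2
```

In a real pipeline, the `ModelMetadata` PCollection would be wired into RunInference as a singleton side input (for example via `beam.pvalue.AsSingleton`, per the first note in the quoted diff) rather than resolved by a plain function.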
