damccorm commented on code in PR #30351:
URL: https://github.com/apache/beam/pull/30351#discussion_r1518265761
##########
learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md:
##########
@@ -0,0 +1,47 @@
+Prompt:
+How can I use custom remote inference calls in my Apache Beam pipeline?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline
is by leveraging the RunInference API provided by the Apache Beam Python SDK.
This feature allows you to seamlessly incorporate ML models into your pipeline
or execute remote inference calls.
+
+When developing custom inference calls, consider the following factors:
+* **API quotas**: heavy loads may lead to reaching API quota limits. You can
use `PipelineOptions` to specify the maximum number of parallel API calls. Use
`direct_num_workers` for the Direct Runner or `max_num_workers` for the Google
Cloud Dataflow Runner. Refer to the Beam Capability Matrix for information
about other runners.
+* **Error handling**: it is essential to handle errors in case of API call
failures. Consider implementing exponential backoff for retries or utilizing
dead-letter queues for failed API calls.
+* **Monitoring**: incorporate monitoring and performance metrics to track the
performance of your inference calls and the health of your pipeline.
+* **Batching**: batching can be used to send multiple inputs in a single API
call for improved efficiency.
+
+To execute external API calls with the `RunInference` transform, you will need
to create a `beam.DoFn` in the form of a custom model handler:
+
+```python
+class CustomModelHandler(ModelHandler):
+ """DoFn that accepts a batch of inputs and sends that batch to the remote
API for inference"""
+
+ def load_model(self):
+ """Initiate the Custom remote API client."""
+ client = ... # Initialize the client
+ return client
+
+ def run_inference(self, batch, model, inference):
+
+ # Prepare a batch request for all inputs in the batch.
+ inputs = ... # Process inputs from the batch
+ input_requests = ... # Prepare input requests for the model
+ batch_request = ... # Prepare batch request for the model
+
+ # Send the batch request to the remote endpoint.
+ responses = model.(request=batch_request).responses
Review Comment:
I don't think this makes sense. Could we update it to something like:
```
responses = client.remote_call(request=batch_request).responses
```
and rename `model` to `client` above?
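For example, the reworked method could read roughly like the sketch below
(just a sketch - `remote_call` stands in for whatever method the custom
client actually exposes):
```python
def run_inference(self, batch, client, inference_args=None):

    # Prepare a batch request for all inputs in the batch.
    inputs = ...          # Process inputs from the batch
    input_requests = ...  # Prepare input requests for the model
    batch_request = ...   # Prepare a batch request for the model

    # Send the batch request to the remote endpoint via the client.
    responses = client.remote_call(request=batch_request).responses

    return ...            # Wrap each response in a PredictionResult
```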
##########
learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,22 @@
+Prompt:
+How can I use Apache Beam in my AI/ML pipeline?
+
+Response:
+Apache Beam is an open-source, unified programming model for defining batch
and streaming data-parallel processing pipelines. Each Apache Beam pipeline is
represented as a Directed Acyclic Graph (DAG) of data processing operations
constructed using the Apache Beam SDK in one of the supported languages. To
execute a pipeline, you need to deploy it to one of the supported Beam runners.
+
+You can use Apache Beam for various tasks within your AI/ML pipeline,
including data validation, preprocessing, model validation, and model
deployment and inference.
+
+Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow
for reading and writing data from and to various data sources and sinks, as
well as performing data validation.
+
+For data preprocessing, Apache Beam provides the `MLTransform` class. This
feature allows you to ensure data consistency by applying the same
preprocessing steps for both training and inference.
+
+Additionally, Apache Beam allows integration with pre-trained models from
PyTorch, Scikit-learn, and TensorFlow.
+
+To execute machine learning inference tasks, Apache Beam provides the
RunInference API. `RunInference` is a `PTransform` that supports the usage of
any custom models. For an example of running inference on a custom model loaded
with spaCy, refer to the Apache Beam GitHub repository.
+
+The `RunInference` transform efficiently handles models of any size, making it
suitable for large language models (LLMs) and other complex architectures. You
can find an example of deploying and performing inference on large language
models (LLMs) in the 'RunInference' section of the Apache Beam documentation.
+
+For more information on implementing AI/ML pipelines using Apache Beam, see
the 'Get started with AI/ML pipelines' section in the Apache Beam documentation.
+
+
+
Review Comment:
```suggestion
For more information on implementing AI/ML pipelines using Apache Beam, see
the 'Get started with AI/ML pipelines' section in the Apache Beam documentation.
```
Nit: remove trailing whitespace
##########
learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md:
##########
@@ -0,0 +1,50 @@
+Prompt:
+How can I integrate BigQuery ML with Apache Beam?
+
+Response:
+BigQuery ML allows you to develop and execute ML models using GoogleSQL
queries. You can seamlessly integrate these models into your Apache Beam
pipeline using TFX Basic Shared Libraries (tfx_bsl) and perform inference on
them using the RunInference API.
+
+For an example of training a basic BigQuery ML model, refer to the BigQuery
documentation.
+
+Once you have trained your model, you will need to export it. Here is an
example BigQuery command to export a model to a Google Cloud Storage bucket:
+
+```
+bq extract -m <model name> gs://<cloud storage path>
+```
+
+To incorporate your BigQuery ML model into an Apache Beam pipeline using
`tfx_bsl`, it must be saved in the TensorFlow SavedModel format. You will need
to download the model to your local directory to perform local predictions:
+
+```python
+import apache_beam
+import tensorflow as tf
+from google.protobuf import text_format
+from tensorflow.python.framework import tensor_util
+from tfx_bsl.beam import run_inference
+from tfx_bsl.public.beam import RunInference
+from tfx_bsl.public.proto import model_spec_pb2
+
+inputs = tf.train.Example(...)
+
+model_path = <path to the directory where the model is stored>
+
+def extract_prediction(response):
+ # Extract the prediction from the response depending on the signature of
the model
+
+with beam.Pipeline() as p:
+ res = (
+ p
+ | beam.Create([inputs])
+ | RunInference(
+ model_spec_pb2.InferenceSpecType(
+ saved_model_spec=model_spec_pb2.SavedModelSpec(
+ model_path=model_path,
+ signature_name=['serving_default'])))
+ | beam.ParDo(extract_prediction)
+ )
+```
+
+This example uses the `RunInference` transform from the `tfx_bsl` library,
directing it to the local directory where the model is stored. The transform
takes `tf.train.Example` type elements as inputs and produces
`tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as
outputs. Depending on the signature of your model, you can extract values from
the output. For the prediction output format of exported models for each model
type, refer to the 'Prediction output format' section in the BigQuery
documentation.
+
+
+
+
Review Comment:
```suggestion
This example uses the `RunInference` transform from the `tfx_bsl` library,
directing it to the local directory where the model is stored. The transform
takes `tf.train.Example` type elements as inputs and produces
`tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as
outputs. Depending on the signature of your model, you can extract values from
the output. For the prediction output format of exported models for each model
type, refer to the 'Prediction output format' section in the BigQuery
documentation.
```
Spacing nit
##########
learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md:
##########
@@ -0,0 +1,37 @@
+Prompt:
+How can I run inference on a trained ML model using Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models
within your pipelines through the RunInference API. This functionality is
available in the Python SDK versions 2.40.0 and later. The Java SDK versions
2.41.0 and later also support the API through Apache Beam’s Multi-language
Pipelines framework. The `RunInference` transform performs inference on a
`PCollection` of examples using an ML model and outputs a `PCollection`
containing both the input examples and the corresponding output predictions.
+
+Key features of the RunInference API include:
+* Support for both batch and streaming inference.
+* Centralized model management for efficient memory and bandwidth usage.
+* Compatibility with multiple model frameworks and model hubs.
+* Automatic model refreshing to ensure the latest model version is used.
+* GPU support for model inference.
+
+The RunInference API supports a variety of frameworks and model hubs,
including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow
Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate
custom model frameworks.
+
+To import a model into your Apache Beam pipeline, you will need to configure
the `ModelHandler` object, which wraps the underlying model and allows you to
set necessary environment variables for inference.
+
+Here is an example of importing a PyTorch model handler for use in your
pipeline:
+
+```python
+from apache_beam.ml.inference.pytorch_inference import
PytorchModelHandlerTensor
+from apache_beam.ml.inference.base import RunInference
+
+ model_handler = PytorchModelHandlerTensor(
+ # Model handler setup
+ )
+
+with pipeline as p:
+ predictions = p | 'Read' >> beam.ReadFromSource('a_source')
+ | 'RunInference' >> RunInference(model_handler)
Review Comment:
```suggestion
predictions = p | 'Read' >> beam.ReadFromSource('a_source')
| 'RunInference' >> RunInference(model_handler)
```
Spacing nit
##########
learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md:
##########
@@ -0,0 +1,56 @@
+Prompt:
+Ho can I use Apache Beam to run inference on Large Language Models (LLMs)?
Review Comment:
```suggestion
How can I use Apache Beam to run inference on Large Language Models (LLMs)?
```
##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,23 @@
+Prompt:
+How can I use Apache Beam in my AI/ML pipeline?
+
+Response:
+Apache Beam is an open-source, unified programming model for defining batch
and streaming data-parallel processing pipelines. Each Apache Beam pipeline is
represented as a Directed Acyclic Graph (DAG) of data processing operations
constructed using the Apache Beam SDK in one of the [supported
languages](https://beam.apache.org/documentation/sdks/java/). To execute a
pipeline, you need to deploy it to one of the supported [Beam
runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+
+You can use Apache Beam for various tasks within your AI/ML pipeline,
including data validation, preprocessing, model validation, and model
deployment and inference.
+
+Apache Beam offers a rich set of [I/O
connectors](https://beam.apache.org/documentation/io/connectors/) and
[transforms](https://beam.apache.org/documentation/transforms/python/) that
allow for reading and writing data from and to various data sources and sinks,
as well as performing data validation.
+
+For data preprocessing, Apache Beam provides the
[MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) class.
This feature allows you to ensure data consistency by applying the same
preprocessing steps for both training and inference.
+
+Additionally, Apache Beam allows integration with pre-trained models from
[PyTorch](https://pytorch.org/),
[Scikit-learn](https://scikit-learn.org/stable/), and
[TensorFlow](https://www.tensorflow.org/).
+
+To execute machine learning inference tasks, Apache Beam provides the
RunInference API.
+[`RunInference`](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/)
is a [`PTransform`](https://beam.apache.org/documentation/basics/#ptransform)
that supports the usage of any custom models. For an example of running
inference on a custom model loaded with [spaCy](https://spacy.io/), refer to
the [Bring your own ML model to Beam
RunInference](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb)
example in the Apache Beam GitHub repository.
+
+The `RunInference` transform efficiently handles models of any size, making it
suitable for large language models (LLMs) and other complex architectures. You
can find an example of deploying and performing inference on large language
models (LLMs) in the
[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/)
section of the Apache Beam documentation.
+
+For more information on implementing AI/ML pipelines using Apache Beam, see
the [Get started with AI/ML
pipelines](https://beam.apache.org/documentation/ml/overview/) section in the
Apache Beam documentation.
+
+
+
Review Comment:
```suggestion
For more information on implementing AI/ML pipelines using Apache Beam, see
the [Get started with AI/ML
pipelines](https://beam.apache.org/documentation/ml/overview/) section in the
Apache Beam documentation.
```
Spacing nit
##########
learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md:
##########
@@ -0,0 +1,56 @@
+Prompt:
+Ho can I use Apache Beam to run inference on Large Language Models (LLMs)?
+
+Response:
Review Comment:
I think we probably need to rework this response - it mostly just describes
how to run inference generally and doesn't have much content specific to
large models. I see it was largely pulled from
https://beam.apache.org/documentation/ml/large-language-modeling/, which
needs to be updated as well; I opened https://github.com/apache/beam/pull/30585
for that - could you pull the info from that PR in here?
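For reference, the reworked response could surface the memory-saving guidance
from that page, e.g. roughly along these lines (just a sketch - the path,
model class, and example inputs are placeholders, and it assumes the
`large_model` option described in the large model inference docs):
```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor

# Sketch: configure the handler so a single copy of the large model is
# shared across worker processes instead of one copy per process.
model_handler = PytorchModelHandlerTensor(
    state_dict_path="gs://<bucket>/<model weights>",  # placeholder path
    model_class=...,     # class wrapping the LLM architecture
    model_params={...},  # constructor arguments for that class
    large_model=True,    # share one model copy across processes on a worker
)

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create([...])  # placeholder example inputs
        | RunInference(model_handler)
    )
```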
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]