dariabezkorovaina commented on code in PR #30351:
URL: https://github.com/apache/beam/pull/30351#discussion_r1513284149


##########
learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with TFX?
+

Review Comment:
   ```suggestion
   ```



##########
learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,25 @@
+Prompt:
+How can I use Apache Beam in my AI/ML pipeline?
+
+Response:
+Apache Beam is an open source, unified programming model for defining both 
batch and streaming data-parallel processing pipelines.Each Apache Beam 
pipeline is a Directed Acyclic Graph (DAG) of data processing operations 
constructed using Apache Beam SDK in one of the supported languages.
+
+To run a pipeline you need to deploy it to one of the supported Beam runners.
+
+You can use Apache Beam for various tasks within your AI/ML pipeline, 
including data validation, preprocessing, model validation, and model 
deployment and inference.
+
+Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow 
for reading and writing data from and to various data sources and sinks, as 
well as performing data validation.
+
+For data preprocessing, Apache Beam provides the `MLTransform` class. This 
feature allows you to ensure data consistency by applying the same 
preprocessing steps for both training and inference.
+
+Additionally, Apache Beam allows integration with pre-trained models from 
`PyTorch`, `Scikit-learn`, and `TensorFlow`.
+
+To execute machine learning inference tasks, Apache Beam provides the 
RunInference API.
+`RunInference` is a `PTransform` that supports the usage of any custom models. 
For an example of running inference on a custom model loaded with `spaCy`, 
refer to examples in the Apache Beam GitHub repository.

Review Comment:
   ```suggestion
   To execute machine learning inference tasks, Apache Beam provides the 
RunInference API. `RunInference` is a `PTransform` that supports the usage of 
any custom models. For an example of running inference on a custom model loaded 
with spaCy, refer to the Apache Beam GitHub repository.
   ```
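
The quoted file mentions the `MLTransform` class for preprocessing, but no `MLTransform` code appears anywhere in this thread. A rough sketch of how it can be applied with the Beam Python SDK (the column name, input data, and artifact location below are illustrative assumptions, not taken from the PR):

```python
import tempfile

import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.tft import ComputeAndApplyVocabulary

# Artifacts written here can be re-read at inference time so that training
# and serving apply identical preprocessing steps.
artifact_location = tempfile.mkdtemp()

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | beam.Create([{"feature": ["apache", "beam", "ml"]}])
        | MLTransform(write_artifact_location=artifact_location).with_transform(
            ComputeAndApplyVocabulary(columns=["feature"]))
        | beam.Map(print)
    )
```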



##########
learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,25 @@
+Prompt:
+How can I use Apache Beam in my AI/ML pipeline?
+
+Response:
+Apache Beam is an open source, unified programming model for defining both 
batch and streaming data-parallel processing pipelines.Each Apache Beam 
pipeline is a Directed Acyclic Graph (DAG) of data processing operations 
constructed using Apache Beam SDK in one of the supported languages.
+
+To run a pipeline you need to deploy it to one of the supported Beam runners.

Review Comment:
   ```suggestion
   Apache Beam is an open-source, unified programming model for defining batch 
and streaming data-parallel processing pipelines. Each Apache Beam pipeline is 
represented as a Directed Acyclic Graph (DAG) of data processing operations 
constructed using the Apache Beam SDK in one of the supported languages. To 
execute a pipeline, you need to deploy it to one of the supported Beam runners.
   ```



##########
learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md:
##########
@@ -0,0 +1,39 @@
+Prompt:
+How can I run inference on a trained ML model using Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s Multi-language 
Pipelines framework. The `RunInference` transform performs inference on a 
`PCollection` of examples using an ML model and outputs a `PCollection` 
containing both the input examples and the corresponding output predictions.
+
+Key features of the RunInference API include:
+* Support for both batch and streaming inference.
+* Centralized model management for efficient memory and bandwidth usage.
+* Compatibility with multiple model frameworks and model hubs.
+* Automatic model refreshing to ensure the latest model version is used.
+* GPU support for model inference.
+
+The RunInference API supports a variety of frameworks and model hubs, 
including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow 
Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate 
custom model frameworks.
+
+To import a model into your Apache Beam pipeline, you'll need to configure the 
`ModelHandler` object, which wraps the underlying model and allows you to set 
necessary environment variables for inference.

Review Comment:
   ```suggestion
   To import a model into your Apache Beam pipeline, you will need to configure 
the `ModelHandler` object, which wraps the underlying model and allows you to 
set necessary environment variables for inference.
   ```



##########
learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md:
##########
@@ -0,0 +1,38 @@
+Prompt:
+What metrics are available for monitoring the performance of ML model 
inference in Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s Multi-language 
Pipelines framework.
+
+The RunInference API provides built-in support for monitoring the performance 
of ML model inference. Here is a list of commonly used metrics for inference 
benchmarking:
+
+* `model_byte_size`: size of the memory footprint of the model load and 
initialization.
+* `load_model_latency_milli_secs`: time taken to load and initialize the model.
+* `num_inferences`: total number of elements passed to `run_inference()`.
+* `inference_batch_latency_micro_secs`: time taken to perform inference across 
all batches of examples.
+* `inference_request_batch_byte_size`: size of the batch in bytes.
+* `inference_request_batch_size`: number of examples in the batch.
+* `failed_batches_counter`: number of failed batches.
+
+You can derive additional metrics from the ones listed above. Example:
+Total time taken for inference = `num_inferences` x 
`inference_batch_latency_micro_secs_MEAN`
+
+Here is a simplified example of how to use the RunInference API to perform 
inference on a language model (LM):
+
+```python
+  with beam.Pipeline(options=pipeline_options) as pipeline:
+    _ = (
+        pipeline
+        | "Create inputs" >> beam.Create(<INPUTS>>)
+        | "Tokenize" >> beam.ParDo(Tokenize(<TOKENIZER_NAME>))
+        | "Inference" >> RunInference(<MODEL_HANDLER>))
+```
+After running the pipeline, you can query the metrics using the following code:

Review Comment:
   ```suggestion
   
   After running the pipeline, you can query the metrics using the following 
code:
   ```
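
The quoted hunk ends before the metrics-querying snippet itself. A minimal sketch of what such a query could look like with the Beam Python SDK (assuming the pipeline from the snippet above and the `pipeline_options` it uses; the metric name comes from the list quoted earlier):

```python
import apache_beam as beam
from apache_beam.metrics.metric import MetricsFilter

# Build and run the pipeline without a `with` block so the result stays in scope.
pipeline = beam.Pipeline(options=pipeline_options)
# ... construct the RunInference pipeline on `pipeline` as in the snippet above ...

result = pipeline.run()
result.wait_until_finish()

# query() returns a dict with "counters", "distributions", and "gauges" entries;
# each result carries the metric key plus its committed and attempted values.
filtered = result.metrics().query(MetricsFilter().with_name("num_inferences"))
for counter in filtered["counters"]:
    print(counter)
```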



##########
learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md:
##########
@@ -0,0 +1,56 @@
+Prompt:
+Ho can I use Apache Beam to run inference on Large Language Models (LLMs)?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the RunInference API provided by the Apache Beam SDK. This 
feature allows you to seamlessly incorporate ML models into your pipeline or 
execute remote inference calls.
+
+You can use the `RunInference` transform with large models as long they fit 
into memory.

Review Comment:
   ```suggestion
   You can use the `RunInference` transform with large models as long as they 
fit into memory.
   ```



##########
learning/prompts/documentation-lookup/42_ai_custom_inference.md:
##########
@@ -0,0 +1,47 @@
+Prompt:
+How can I use custom remote inference calls in my Apache Beam pipeline?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the [RunInference 
API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference)
 provided by the Apache Beam SDK. This feature allows you to seamlessly 
incorporate ML models into your pipeline or execute remote inference calls.

Review Comment:
   ```suggestion
   The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the [RunInference 
API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference)
 provided by the Apache Beam Python SDK. This feature allows you to seamlessly 
incorporate ML models into your pipeline or execute remote inference calls.
   ```



##########
learning/prompts/documentation-lookup/43_ai_llm_inference.md:
##########
@@ -0,0 +1,56 @@
+Prompt:
+Ho can I use Apache Beam to run inference on Large Language Models (LLMs)?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the [RunInference 
API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference)
 provided by the Apache Beam SDK. This feature allows you to seamlessly 
incorporate ML models into your pipeline or execute remote inference calls.
+
+You can use the `RunInference` transform with large models as long they fit 
into memory.
+
+The typical workflow for conducting inference on Large Language Models (LLMs) 
within an Apache Beam pipeline involves the following steps:
+1. Read the input text data from a source such as a file or a Pub/Sub topic.
+2. Encode the text into tokens understandable by the LLM model, typically 
using a tokenizer.
+3. Use the `RunInference` transform to obtain predictions from the model.
+4. Decode the predictions into human-readable text.
+
+Here is an example demonstrating how to leverage the RunInference API with 
LLMs in an Apache Beam pipeline:
+
+```python
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+
+with beam.Pipeline(options=pipeline_options) as pipeline:
+    _ = (
+        pipeline
+        | "CreateInputs" >> beam.Create(task_sentences)
+        | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+        | "RunInference" >> RunInference(model_handler=model_handler)
+        | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
+    )
+```
+
+In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible 
for encoding and decoding the input and output data using a tokenizer. The 
`model_handler` is a `ModelHandler` object that wraps the underlying model and 
allows you to configure environment variables required to load the model.
+
+Here is an example of how to configure a Pytorch model handler for use in your 
pipeline:

Review Comment:
   ```suggestion
   Here is an example of how to configure a PyTorch model handler for use in 
your pipeline:
   ```
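
The `Preprocess` and `Postprocess` classes referenced in the quoted example are not shown in the hunk. A rough sketch of what they could look like with a Hugging Face tokenizer (the `t5-small` checkpoint is only an illustrative assumption):

```python
import apache_beam as beam
from transformers import AutoTokenizer


class Preprocess(beam.DoFn):
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def process(self, element: str):
        # Encode the input sentence into token IDs the model understands.
        yield self._tokenizer(element, return_tensors="pt").input_ids[0]


class Postprocess(beam.DoFn):
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def process(self, element):
        # `element` is a PredictionResult; decode its inference back to text.
        yield self._tokenizer.decode(element.inference, skip_special_tokens=True)


tokenizer = AutoTokenizer.from_pretrained("t5-small")
```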



##########
learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,25 @@
+Prompt:
+How can I use Apache Beam in my AI/ML pipeline?
+
+Response:
+Apache Beam is an open source, unified programming model for defining both 
batch and streaming data-parallel processing pipelines.Each Apache Beam 
pipeline is a Directed Acyclic Graph (DAG) of data processing operations 
constructed using Apache Beam SDK in one of the supported languages.
+
+To run a pipeline you need to deploy it to one of the supported Beam runners.
+
+You can use Apache Beam for various tasks within your AI/ML pipeline, 
including data validation, preprocessing, model validation, and model 
deployment and inference.
+
+Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow 
for reading and writing data from and to various data sources and sinks, as 
well as performing data validation.
+
+For data preprocessing, Apache Beam provides the `MLTransform` class. This 
feature allows you to ensure data consistency by applying the same 
preprocessing steps for both training and inference.
+
+Additionally, Apache Beam allows integration with pre-trained models from 
`PyTorch`, `Scikit-learn`, and `TensorFlow`.
+
+To execute machine learning inference tasks, Apache Beam provides the 
RunInference API.
+`RunInference` is a `PTransform` that supports the usage of any custom models. 
For an example of running inference on a custom model loaded with `spaCy`, 
refer to examples in the Apache Beam GitHub repository.
+
+The `RunInference` transform efficiently handles models of any size, making it 
suitable for large language models (LLMs) and other complex architectures. You 
can find an example of deploying and performing inference on large language 
models (LLMs) in the RunInference section of the Apache Beam documentation.

Review Comment:
   ```suggestion
   The `RunInference` transform efficiently handles models of any size, making 
it suitable for large language models (LLMs) and other complex architectures. 
You can find an example of deploying and performing inference on large language 
models (LLMs) in the 'RunInference' section of the Apache Beam documentation.
   ```



##########
learning/prompts/documentation-lookup/43_ai_llm_inference.md:
##########
@@ -0,0 +1,56 @@
+Prompt:
+Ho can I use Apache Beam to run inference on Large Language Models (LLMs)?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the [RunInference 
API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference)
 provided by the Apache Beam SDK. This feature allows you to seamlessly 
incorporate ML models into your pipeline or execute remote inference calls.
+
+You can use the `RunInference` transform with large models as long they fit 
into memory.
+
+The typical workflow for conducting inference on Large Language Models (LLMs) 
within an Apache Beam pipeline involves the following steps:
+1. Read the input text data from a source such as a file or a Pub/Sub topic.
+2. Encode the text into tokens understandable by the LLM model, typically 
using a tokenizer.
+3. Use the `RunInference` transform to obtain predictions from the model.
+4. Decode the predictions into human-readable text.
+
+Here is an example demonstrating how to leverage the RunInference API with 
LLMs in an Apache Beam pipeline:
+
+```python
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+
+with beam.Pipeline(options=pipeline_options) as pipeline:
+    _ = (
+        pipeline
+        | "CreateInputs" >> beam.Create(task_sentences)
+        | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+        | "RunInference" >> RunInference(model_handler=model_handler)
+        | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
+    )
+```
+
+In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible 
for encoding and decoding the input and output data using a tokenizer. The 
`model_handler` is a `ModelHandler` object that wraps the underlying model and 
allows you to configure environment variables required to load the model.
+
+Here is an example of how to configure a Pytorch model handler for use in your 
pipeline:
+
+```python
+from apache_beam.ml.inference.pytorch_inference import 
PytorchModelHandlerTensor
+model_handler = PytorchModelHandlerTensor(
+    state_dict_path=known_args.model_state_dict_path,
+    model_class=T5ForConditionalGeneration,
+    model_params={
+        "config": AutoConfig.from_pretrained(known_args.model_name)
+    },
+    device="cpu",
+    inference_fn=gen_fn)
+```
+
+Each specific model handler has its own configuration parameters. For example, 
the `PytorchModelHandlerTensor` requires the following parameters:
+* `state_dict_path`: the path to the saved model state dictionary.
+* `model_class`: the class of the Pytorch model that defines model structure.

Review Comment:
   ```suggestion
   * `model_class`: the class of the PyTorch model that defines model structure.
   ```



##########
learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md:
##########
@@ -0,0 +1,39 @@
+Prompt:
+How can I run inference on a trained ML model using Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s Multi-language 
Pipelines framework. The `RunInference` transform performs inference on a 
`PCollection` of examples using an ML model and outputs a `PCollection` 
containing both the input examples and the corresponding output predictions.
+
+Key features of the RunInference API include:
+* Support for both batch and streaming inference.
+* Centralized model management for efficient memory and bandwidth usage.
+* Compatibility with multiple model frameworks and model hubs.
+* Automatic model refreshing to ensure the latest model version is used.
+* GPU support for model inference.
+
+The RunInference API supports a variety of frameworks and model hubs, 
including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow 
Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate 
custom model frameworks.
+
+To import a model into your Apache Beam pipeline, you'll need to configure the 
`ModelHandler` object, which wraps the underlying model and allows you to set 
necessary environment variables for inference.
+
+Following is an example importing a `Pytorch` model handler to use in your 
pipeline:
+
+```python
+from apache_beam.ml.inference.pytorch_inference import 
PytorchModelHandlerTensor
+from apache_beam.ml.inference.base import RunInference
+
+  model_handler = PytorchModelHandlerTensor(
+    # Model handler setup
+  )
+
+with pipeline as p:
+    predictions = p |  'Read' >> beam.ReadFromSource('a_source')
+                    | 'RunInference' >> RunInference(model_handler)
+```
+
+For comprehensive end-to-end examples of inference with supported model 
frameworks and model hubs, refer to the Apache Beam GitHub repository.
+

Review Comment:
   Combined all references to the GitHub repository into the final paragraph.



##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,24 @@
+Prompt:
+How can I use Apache Beam in my AI/ML pipeline?
+
+Response:
+Apache Beam is an open source, unified programming model for defining both 
batch and streaming data-parallel processing pipelines.Each Apache Beam 
pipeline is a Directed Acyclic Graph (DAG) of data processing operations 
constructed using Apache Beam SDK in one of the [supported 
languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam 
runners](https://beam.apache.org/documentation/runners/capability-matrix/).

Review Comment:
   ```suggestion
   Apache Beam is an open-source, unified programming model for defining batch 
and streaming data-parallel processing pipelines. Each Apache Beam pipeline is 
represented as a Directed Acyclic Graph (DAG) of data processing operations 
constructed using the Apache Beam SDK in one of the [supported 
languages](https://beam.apache.org/documentation/sdks/java/). To execute a 
pipeline, you need to deploy it to one of the supported [Beam 
runners](https://beam.apache.org/documentation/runners/capability-matrix/).
   ```



##########
learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md:
##########
@@ -0,0 +1,42 @@
+Prompt:
+How can I make sure my Apache Beam pipeline uses the latest version of the ML 
model?
+
+Response:
+Production ML workflows often involve updating the ML model with new data. You 
can ensure that your Apache Beam pipeline uses the latest version of the ML 
model by leveraging the RunInference API alongside Apache Beam's side inputs 
feature. Side inputs are additional inputs that can be provided to a ‘ParDo’ 
transform, in addition to the main input `PCollection`.
+
+The `RunInference` transform accepts an optional input parameter 
`model_metadata_pcoll`, which is a side input `PCollection` emitting 
`ModelMetadata`. `ModelMetadata` is a `NamedTuple` object containing `model_id` 
and `model_name`, used to load the model for inference and identify it in the 
metrics generated by the `RunInference` transform. The URL or path to the model 
should be compatible with the respective `ModelHandler` requirements.
+
+If the main collection emits inputs before the `model_metadata_pcoll` side 
input is available, the main `PCollection` will be buffered until the 
`model_metadata_pcoll` side input is emitted.
+
+For more information on `ModelMetadata`, refer to the Apache Beam Python SDK 
documentation.
+
+A common approach to model updates in production is to use a 
`FileWatchPattern` as a side input:
+
+```python
+import apache_beam as beam
+from apache_beam.ml.inference.utils import WatchFilePattern
+from apache_beam.ml.inference.base import RunInference
+
+tf_model_handler = ... # model handler for the model
+
+with beam.Pipeline() as pipeline:
+
+  file_pattern = '<path_to_model_file>'
+
+  side_input_pcoll = (
+    pipeline
+    | "FilePatternUpdates" >> WatchFilePattern(file_pattern=file_pattern))
+
+  main_input_pcoll = ... # main input PCollection
+
+  inference_pcoll = (
+    main_input_pcoll
+    | "RunInference" >> RunInference(
+    model_handler=model_handler,
+    model_metadata_pcoll=side_input_pcoll))
+```
+
+In the provided example, the `model_metadata_pcoll` parameter expects a 
`PCollection` of `ModelMetadata` compatible with the `AsSingleton` marker. 
Given that the pipeline employs the `WatchFilePattern` class as a side input, 
it automatically manages windowing and encapsulates the output into 
`ModelMetadata`.
+
+For more information, refer to the `WatchFilePattern` section in the Apache 
Beam documentation.

Review Comment:
   ```suggestion
   For more information, refer to the section 'Use `WatchFilePattern` to 
auto-update ML models in RunInference' in the Apache Beam documentation.
   ```



##########
learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with TFX?
+

Review Comment:
   ```suggestion
   ```



##########
learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with TFX?
+
+
+Response:
+You can use Apache Beam for various tasks within your machine learning 
project, including data validation, preprocessing, model validation, and model 
deployment and inference.
+
+However, a comprehensive ML workflow also involves additional steps like data 
exploration, feature engineering, and model training. Additionally, ML 
workflows require metadata and artifact tracking for reproducibility and 
auditability. Popular tools for building and managing ML workflows include 
[Kubeflow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), 
and [Apache Airflow](https://airflow.apache.org/).
+
+TFX is an end-to-end platform designed for deploying production ML pipelines. 
TFX pipelines consist of a sequence of components that together form an ML 
pipeline. Each TFX pipeline DAG requires its own orchestrator. The natively 
supported orchestrators include [Kubeflow 
Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Apache 
Airflow](https://airflow.apache.org/), and Apache Beam itself.
+
+To integrate an Apache Beam pipeline into a TFX pipeline, you need to follow 
these steps:
+1. Create TFX components by specifying interfaces and containerizing the 
component implementation.
+2. Create TFX pipeline by connecting these components and specifying how 
inputs and outputs are exchanged between them.

Review Comment:
   ```suggestion
   2. Create the TFX pipeline by connecting these components and specifying how 
inputs and outputs are exchanged between them.
   ```
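
The quoted steps stop before any orchestration code. A very rough sketch of step 2, using Apache Beam itself as the TFX orchestrator (standard TFX APIs; the component, bucket, and pipeline names are placeholder assumptions, not from the PR):

```python
from tfx.components import CsvExampleGen
from tfx.orchestration import pipeline as tfx_pipeline
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner

# A single standard component stands in for the custom components from step 1.
example_gen = CsvExampleGen(input_base="gs://<bucket>/data")

my_pipeline = tfx_pipeline.Pipeline(
    pipeline_name="beam_orchestrated_pipeline",
    pipeline_root="gs://<bucket>/pipeline_root",  # where TFX stores artifacts
    components=[example_gen],
)

# Apache Beam acts as the orchestrator that executes the TFX pipeline DAG.
BeamDagRunner().run(my_pipeline)
```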



##########
learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with TFX?
+
+
+Response:
+You can use Apache Beam for various tasks within your machine learning 
project, including data validation, preprocessing, model validation, and model 
deployment and inference.
+
+However, a comprehensive ML workflow also involves additional steps like data 
exploration, feature engineering, and model training. Additionally, ML 
workflows require metadata and artifact tracking for reproducibility and 
auditability. Popular tools for building and managing ML workflows include 
Kubeflow, TFX, and Apache Airflow.
+
+TFX is an end-to-end platform designed for deploying production ML pipelines. 
TFX pipelines consist of a sequence of components that together form an ML 
pipeline. Each TFX pipeline DAG requires its own orchestrator. The natively 
supported orchestrators include Kubeflow Pipelines, Apache Airflow, and Apache 
Beam itself.
+
+To integrate an Apache Beam pipeline into a TFX pipeline, you need to follow 
these steps:
+1. Create TFX components by specifying interfaces and containerizing the 
component implementation.
+2. Create TFX pipeline by connecting these components and specifying how 
inputs and outputs are exchanged between them.

Review Comment:
   ```suggestion
   2. Create the TFX pipeline by connecting these components and specifying how 
inputs and outputs are exchanged between them.
   ```



##########
learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md:
##########
@@ -0,0 +1,39 @@
+Prompt:
+How can I run inference on a trained ML model using Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s Multi-language 
Pipelines framework. The `RunInference` transform performs inference on a 
`PCollection` of examples using an ML model and outputs a `PCollection` 
containing both the input examples and the corresponding output predictions.
+
+Key features of the RunInference API include:
+* Support for both batch and streaming inference.
+* Centralized model management for efficient memory and bandwidth usage.
+* Compatibility with multiple model frameworks and model hubs.
+* Automatic model refreshing to ensure the latest model version is used.
+* GPU support for model inference.
+
+The RunInference API supports a variety of frameworks and model hubs, 
including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow 
Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate 
custom model frameworks.
+
+To import a model into your Apache Beam pipeline, you'll need to configure the 
`ModelHandler` object, which wraps the underlying model and allows you to set 
necessary environment variables for inference.
+
+Following is an example importing a `Pytorch` model handler to use in your 
pipeline:
+
+```python
+from apache_beam.ml.inference.pytorch_inference import 
PytorchModelHandlerTensor
+from apache_beam.ml.inference.base import RunInference
+
+  model_handler = PytorchModelHandlerTensor(
+    # Model handler setup
+  )
+
+with pipeline as p:
+    predictions = p |  'Read' >> beam.ReadFromSource('a_source')
+                    | 'RunInference' >> RunInference(model_handler)
+```
+
+For comprehensive end-to-end examples of inference with supported model 
frameworks and model hubs, refer to the Apache Beam GitHub repository.
+
+If you need to run inference on a model that isn't explicitly supported, you 
can create your own `ModelHandler` or `KeyedModelHandler` with custom logic to 
load and use your model. For an example of running inference on a custom model 
loaded with spaCy, refer to the examples in the Apache Beam GitHub repository.
+
+For recommended patterns and best practices when leveraging Apache Beam for 
inference tasks, see the RunInference Patterns section in the official 
documentation.
+
+For an example of using the RunInference API in the Java SDK, see the example 
multi-language pipelines in the Apache Beam GitHub repository. Additionally, 
for an illustration of a composite Python transform integrating the 
RunInference API with preprocessing and postprocessing from a Beam Java SDK 
pipeline, you can refer to the Apache Beam official documentation.

Review Comment:
   ```suggestion
   For comprehensive end-to-end examples of inference with supported model 
frameworks and model hubs, including examples of using the RunInference API in 
the Java SDK, see the Apache Beam GitHub repository. Additionally, for an 
illustration of a composite Python transform integrating the RunInference API 
with preprocessing and postprocessing from a Beam Java SDK pipeline, you can 
refer to the Apache Beam official documentation.
   ```



##########
learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md:
##########
@@ -0,0 +1,51 @@
+Prompt:
+How can I integrate BigQuery ML with Apache Beam?
+
+Response:
+BigQuery ML allows you to develop and execute ML models using GoogleSQL 
queries. You can seamlessly integrate these models into your Apache Beam 
pipeline using TFX Basic Shared Libraries (tfx_bsl) and perform inference on 
them using the RunInference API.
+
+For an example of training a basic BigQuery ML model, refer to the BigQuery 
documentation.
+
+Once you have trained your model, you will need to export it. Here is an 
example BigQuery command to export a model to a Google Cloud Storage bucket:
+
+```
+bq extract -m <model name> gs://<cloud storage path>
+```
+
+To incorporate your BigQuery ML model into an Apache Beam pipeline using 
`tfx_bsl`, it must be saved in the TensorFlow SavedModel format. You will need 
to download the model to your local directory to perform local predictions:
+
+```python
+import apache_beam
+import tensorflow as tf
+from google.protobuf import text_format
+from tensorflow.python.framework import tensor_util
+from tfx_bsl.beam import run_inference
+from tfx_bsl.public.beam import RunInference
+from tfx_bsl.public.proto import model_spec_pb2
+
+
+inputs = tf.train.Example(...)
+
+model_path = <path to the directory where the model is stored>
+
+def extract_prediction(response):
+    # Extract the prediction from the response depending on the signature of 
the model
+
+with beam.Pipeline() as p:
+    res = (
+        p
+        | beam.Create([inputs])
+        | RunInference(
+            model_spec_pb2.InferenceSpecType(
+                saved_model_spec=model_spec_pb2.SavedModelSpec(
+                    model_path=model_path,
+                    signature_name=['serving_default'])))
+        | beam.ParDo(extract_prediction)
+    )
+```
+
+This example uses the `RunInference` transform from the `tfx_bsl` library,  
directing it to the local directory where the model is stored. The transform 
takes `tf.train.Example` type elements as inputs and produces 
`tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as 
outputs. Depending on the signature of your model, you can extract values from 
the output. For the prediction output format of exported models for each model 
type, refer to the Prediction output format section in the BigQuery 
documentation.

Review Comment:
   ```suggestion
   This example uses the `RunInference` transform from the `tfx_bsl` library, 
directing it to the local directory where the model is stored. The transform 
takes `tf.train.Example` type elements as inputs and produces 
`tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as 
outputs. Depending on the signature of your model, you can extract values from 
the output. For the prediction output format of exported models for each model 
type, refer to the 'Prediction output format' section in the BigQuery 
documentation.
   ```
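
The `extract_prediction` function in the quoted snippet is only a stub. A hedged sketch of one way to pull values out of the `PredictionLog` output; the output tensor name `output_0` depends on the exported model's signature and is assumed here:

```python
from tensorflow.python.framework import tensor_util


def extract_prediction(response):
    # `response` is a tensorflow_serving PredictionLog; for a SavedModel served
    # through the default signature, predictions live under predict_log.response.
    output_tensor = response.predict_log.response.outputs["output_0"]
    # Convert the TensorProto to a NumPy array for downstream processing.
    yield tensor_util.MakeNdarray(output_tensor)
```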



##########
learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md:
##########
@@ -0,0 +1,39 @@
+Prompt:
+How can I run inference on a trained ML model using Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s Multi-language 
Pipelines framework. The `RunInference` transform performs inference on a 
`PCollection` of examples using an ML model and outputs a `PCollection` 
containing both the input examples and the corresponding output predictions.
+
+Key features of the RunInference API include:
+* Support for both batch and streaming inference.
+* Centralized model management for efficient memory and bandwidth usage.
+* Compatibility with multiple model frameworks and model hubs.
+* Automatic model refreshing to ensure the latest model version is used.
+* GPU support for model inference.
+
+The RunInference API supports a variety of frameworks and model hubs, 
including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow 
Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate 
custom model frameworks.
+
+To import a model into your Apache Beam pipeline, you'll need to configure the 
`ModelHandler` object, which wraps the underlying model and allows you to set 
necessary environment variables for inference.
+
+Following is an example importing a `Pytorch` model handler to use in your 
pipeline:

Review Comment:
   ```suggestion
   Here is an example of importing a PyTorch model handler for use in your 
pipeline:
   ```



##########
learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md:
##########
@@ -0,0 +1,47 @@
+Prompt:
+How can I use custom remote inference calls in my Apache Beam pipeline?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the `RunInference API` provided by the Apache Beam SDK. This 
feature allows you to seamlessly incorporate ML models into your pipeline or 
execute remote inference calls.
+
+When developing custom inference calls, consider the following factors:
+* **API quotas**: heavy loads may lead to reaching API quota limits. You can 
use `PipelineOptions` to specify the maximum number of parallel API calls. Use 
`direct_num_workers` for the Direct Runner or `max_num_workers` for the Google 
Cloud Dataflow Runner. Refer to the Beam Capability Matrix for information 
about other runners.
+* **Error handling**: it is essential to handle errors in case of API call 
failures. Consider implementing exponential backoff for retries or utilizing 
dead-letter queues for failed API calls.
+* **Monitoring**: incorporate monitoring and performance metrics to track the 
performance of your inference calls and the health of your pipeline.
+* **Batching**: batching can be used to send multiple inputs in a single API 
call for improved efficiency.
+
+To execute external API calls with the `RunInference` transform, you'll need 
to create a `beam.DoFn` in the form of a custom model handler:

Review Comment:
   ```suggestion
   To execute external API calls with the `RunInference` transform, you will 
need to create a `beam.DoFn` in the form of a custom model handler:
   ```
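
The handler code itself is not included in the quoted hunk. A minimal sketch of a custom handler issuing remote calls, written against the `ModelHandler` base class from `apache_beam.ml.inference.base` rather than a raw `DoFn`; the remote endpoint is simulated by a stub function and all names are illustrative, not from the PR:

```python
from typing import Any, Dict, Iterable, Optional, Sequence

import apache_beam as beam
from apache_beam.ml.inference.base import ModelHandler, PredictionResult, RunInference


def call_remote_api(batch):
    # Stand-in for a real remote inference endpoint (hypothetical).
    return [len(example) for example in batch]


class RemoteApiModelHandler(ModelHandler[str, PredictionResult, None]):
    """Runs inference by calling a remote endpoint for each batch of examples."""

    def load_model(self) -> None:
        # No local model to load; a reusable API client could be created here.
        return None

    def run_inference(
        self,
        batch: Sequence[str],
        model: None,
        inference_args: Optional[Dict[str, Any]] = None,
    ) -> Iterable[PredictionResult]:
        # Batching the examples reduces the number of remote API requests.
        responses = call_remote_api(batch)
        return [PredictionResult(x, y) for x, y in zip(batch, responses)]


with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create(["first example", "second example"])
        | RunInference(RemoteApiModelHandler())
        | beam.Map(print)
    )
```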



##########
learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md:
##########
@@ -0,0 +1,47 @@
+Prompt:
+How can I use custom remote inference calls in my Apache Beam pipeline?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the `RunInference API` provided by the Apache Beam SDK. This 
feature allows you to seamlessly incorporate ML models into your pipeline or 
execute remote inference calls.

Review Comment:
   ```suggestion
   The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the RunInference API provided by the Apache Beam Python SDK. 
This feature allows you to seamlessly incorporate ML models into your pipeline 
or execute remote inference calls.
   ```



##########
learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md:
##########
@@ -0,0 +1,38 @@
+Prompt:
+What metrics are available for monitoring the performance of ML model 
inference in Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s Multi-language 
Pipelines framework.
+
+The RunInference API provides built-in support for monitoring the performance 
of ML model inference. Here is a list of commonly used metrics for inference 
benchmarking:
+

Review Comment:
   ```suggestion
   ```



##########
learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md:
##########
@@ -0,0 +1,39 @@
+Prompt:
+How can I run inference on a trained ML model using Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s Multi-language 
Pipelines framework. The `RunInference` transform performs inference on a 
`PCollection` of examples using an ML model and outputs a `PCollection` 
containing both the input examples and the corresponding output predictions.
+
+Key features of the RunInference API include:
+* Support for both batch and streaming inference.
+* Centralized model management for efficient memory and bandwidth usage.
+* Compatibility with multiple model frameworks and model hubs.
+* Automatic model refreshing to ensure the latest model version is used.
+* GPU support for model inference.
+
+The RunInference API supports a variety of frameworks and model hubs, 
including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow 
Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate 
custom model frameworks.
+
+To import a model into your Apache Beam pipeline, you'll need to configure the 
`ModelHandler` object, which wraps the underlying model and allows you to set 
necessary environment variables for inference.
+
+Following is an example importing a `Pytorch` model handler to use in your 
pipeline:
+
+```python
+from apache_beam.ml.inference.pytorch_inference import 
PytorchModelHandlerTensor
+from apache_beam.ml.inference.base import RunInference
+
+  model_handler = PytorchModelHandlerTensor(
+    # Model handler setup
+  )
+
+with pipeline as p:
+    predictions = p |  'Read' >> beam.ReadFromSource('a_source')
+                    | 'RunInference' >> RunInference(model_handler)
+```
+
+For comprehensive end-to-end examples of inference with supported model 
frameworks and model hubs, refer to the Apache Beam GitHub repository.
+

Review Comment:
   ```suggestion
   ```



##########
learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md:
##########
@@ -0,0 +1,51 @@
+Prompt:
+How can I integrate BigQuery ML with Apache Beam?
+
+Response:
+BigQuery ML allows you to develop and execute ML models using GoogleSQL 
queries. You can seamlessly integrate these models into your Apache Beam 
pipeline using TFX Basic Shared Libraries (tfx_bsl) and perform inference on 
them using the RunInference API.
+
+For an example of training a basic BigQuery ML model, refer to the BigQuery 
documentation.
+
+Once you have trained your model, you will need to export it. Here is an 
example BigQuery command to export a model to a Google Cloud Storage bucket:
+
+```
+bq extract -m <model name> gs://<cloud storage path>
+```
+
+To incorporate your BigQuery ML model into an Apache Beam pipeline using 
`tfx_bsl`, it must be saved in the TensorFlow SavedModel format. You will need 
to download the model to your local directory to perform local predictions:
+
+```python
+import apache_beam
+import tensorflow as tf
+from google.protobuf import text_format
+from tensorflow.python.framework import tensor_util
+from tfx_bsl.beam import run_inference
+from tfx_bsl.public.beam import RunInference
+from tfx_bsl.public.proto import model_spec_pb2
+
+

Review Comment:
   ```suggestion
   ```



##########
learning/prompts/documentation-lookup/42_ai_custom_inference.md:
##########
@@ -0,0 +1,47 @@
+Prompt:
+How can I use custom remote inference calls in my Apache Beam pipeline?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the [RunInference 
API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference)
 provided by the Apache Beam SDK. This feature allows you to seamlessly 
incorporate ML models into your pipeline or execute remote inference calls.
+
+When developing custom inference calls, consider the following factors:
+* **API quotas**: heavy loads may lead to reaching API quota limits. You can 
use 
[`PipelineOptions`](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options)
 to specify the maximum number of parallel API calls. Use `direct_num_workers` 
for the [Direct Runner](https://beam.apache.org/documentation/runners/direct/) 
or `max_num_workers` for the [Google Cloud Dataflow 
Runner](https://beam.apache.org/documentation/runners/dataflow/). Refer to the 
[Beam Capability 
Matrix](https://beam.apache.org/documentation/runners/capability-matrix/) for 
information about other runners.
+* **Error handling**: it is essential to handle errors in case of API call 
failures. Consider implementing exponential backoff for retries or utilizing 
dead-letter queues for failed API calls.
+* **Monitoring**: incorporate monitoring and performance metrics to track the 
performance of your inference calls and the health of your pipeline.
+* **Batching**: batching can be used to send multiple inputs in a single API 
call for improved efficiency.
+
+To execute external API calls with the `RunInference` transform, you'll need 
to create a `beam.DoFn` in the form of a custom model handler:

Review Comment:
   ```suggestion
   To execute external API calls with the `RunInference` transform, you will 
need to create a `beam.DoFn` in the form of a custom model handler:
   ```



##########
learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md:
##########
@@ -0,0 +1,56 @@
+Prompt:
+Ho can I use Apache Beam to run inference on Large Language Models (LLMs)?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the RunInference API provided by the Apache Beam SDK. This 
feature allows you to seamlessly incorporate ML models into your pipeline or 
execute remote inference calls.

Review Comment:
   ```suggestion
   The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the RunInference API provided by the Apache Beam Python SDK. 
This feature allows you to seamlessly incorporate ML models into your pipeline 
or execute remote inference calls.
   ```



##########
learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,25 @@
+Prompt:
+How can I use Apache Beam in my AI/ML pipeline?
+
+Response:
+Apache Beam is an open source, unified programming model for defining both 
batch and streaming data-parallel processing pipelines.Each Apache Beam 
pipeline is a Directed Acyclic Graph (DAG) of data processing operations 
constructed using Apache Beam SDK in one of the supported languages.
+
+To run a pipeline you need to deploy it to one of the supported Beam runners.
+
+You can use Apache Beam for various tasks within your AI/ML pipeline, 
including data validation, preprocessing, model validation, and model 
deployment and inference.
+
+Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow 
for reading and writing data from and to various data sources and sinks, as 
well as performing data validation.
+
+For data preprocessing, Apache Beam provides the `MLTransform` class. This 
feature allows you to ensure data consistency by applying the same 
preprocessing steps for both training and inference.
+
+Additionally, Apache Beam allows integration with pre-trained models from 
`PyTorch`, `Scikit-learn`, and `TensorFlow`.

Review Comment:
   ```suggestion
   Additionally, Apache Beam allows integration with pre-trained models from 
PyTorch, Scikit-learn, and TensorFlow.
   ```



##########
learning/prompts/documentation-lookup/43_ai_llm_inference.md:
##########
@@ -0,0 +1,56 @@
+Prompt:
+Ho can I use Apache Beam to run inference on Large Language Models (LLMs)?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the [RunInference 
API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference)
 provided by the Apache Beam SDK. This feature allows you to seamlessly 
incorporate ML models into your pipeline or execute remote inference calls.

Review Comment:
   ```suggestion
   The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the [RunInference 
API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference)
 provided by the Apache Beam Python SDK. This feature allows you to seamlessly 
incorporate ML models into your pipeline or execute remote inference calls.
   ```



##########
learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md:
##########
@@ -0,0 +1,42 @@
+Prompt:
+How can I make sure my Apache Beam pipeline uses the latest version of the ML 
model?
+
+Response:
+Production ML workflows often involve updating the ML model with new data. You 
can ensure that your Apache Beam pipeline uses the latest version of the ML 
model by leveraging the RunInference API alongside Apache Beam's side inputs 
feature. Side inputs are additional inputs that can be provided to a ‘ParDo’ 
transform, in addition to the main input `PCollection`.

Review Comment:
   ```suggestion
   Production ML workflows often involve updating the ML model with new data. 
You can ensure that your Apache Beam pipeline uses the latest version of the ML 
model by leveraging the RunInference API alongside Apache Beam's side inputs 
feature. Side inputs are additional inputs that can be provided to a `ParDo` 
transform, in addition to the main input `PCollection`.
   ```



##########
learning/prompts/documentation-lookup/43_ai_llm_inference.md:
##########
@@ -0,0 +1,56 @@
+Prompt:
+Ho can I use Apache Beam to run inference on Large Language Models (LLMs)?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the [RunInference 
API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference)
 provided by the Apache Beam SDK. This feature allows you to seamlessly 
incorporate ML models into your pipeline or execute remote inference calls.
+
+You can use the `RunInference` transform with large models as long they fit 
into memory.

Review Comment:
   ```suggestion
   You can use the `RunInference` transform with large models as long as they 
fit into memory.
   ```



##########
learning/prompts/documentation-lookup/38_ai_runinference.md:
##########
@@ -0,0 +1,39 @@
+Prompt:
+How can I run inference on a trained ML model using Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s [Multi-language 
Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines)
 framework. The `RunInference` transform performs inference on a 
[`PCollection`](https://beam.apache.org/documentation/programming-guide/#pcollections)
 of examples using an ML model and outputs a `PCollection` containing both the 
input examples and the corresponding output predictions.
+
+Key features of the RunInference API include:
+* Support for both batch and streaming inference.
+* Centralized model management for efficient memory and bandwidth usage.
+* Compatibility with multiple model frameworks and model hubs.
+* Automatic model refreshing to ensure the latest model version is used.
+* GPU support for model inference.
+
+The RunInference API supports a variety of frameworks and model hubs, 
including [TensorFlow](https://www.tensorflow.org/), 
[PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/), 
[XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), 
[TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex 
AI](https://cloud.google.com/vertex-ai), 
[TensorRT](https://developer.nvidia.com/tensorrt), and 
[ONNX](https://onnx.ai/). Additionally, you can easily integrate custom model 
frameworks.
+
+To import a model into your Apache Beam pipeline, you'll need to configure the 
[`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
 object, which wraps the underlying model and allows you to set necessary 
environment variables for inference.
+
+Following is an example importing a `Pytorch` model handler to use in your 
pipeline:

Review Comment:
   ```suggestion
   Here is an example of importing a PyTorch model handler for use in your 
pipeline:
   ```



##########
learning/prompts/documentation-lookup/39_ai_runinference_metrics.md:
##########
@@ -0,0 +1,38 @@
+Prompt:
+What metrics are available for monitoring the performance of ML model 
inference in Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s [Multi-language 
Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines)
 framework.
+
+The RunInference API provides built-in support for monitoring the performance 
of ML model inference. Here is a list of commonly used metrics for inference 
benchmarking:
+
+* `model_byte_size`: size of the memory footprint of the model load and 
initialization.
+* `load_model_latency_milli_secs`: time taken to load and initialize the model.
+* `num_inferences`: total number of elements passed to `run_inference()`.
+* `inference_batch_latency_micro_secs`: time taken to perform inference across 
all batches of examples.
+* `inference_request_batch_byte_size`: size of the batch in bytes.
+* `inference_request_batch_size`: number of examples in the batch.
+* `failed_batches_counter`: number of failed batches.
+
+You can derive additional metrics from the ones listed above. Example:
+Total time taken for inference = `num_inferences` x 
`inference_batch_latency_micro_secs_MEAN`
+
+Here is a simplified example of how to use the RunInference API to perform 
inference on a language model (LM):
+
+```python
+  with beam.Pipeline(options=pipeline_options) as pipeline:
+    _ = (
+        pipeline
+        | "Create inputs" >> beam.Create(<INPUTS>>)
+        | "Tokenize" >> beam.ParDo(Tokenize(<TOKENIZER_NAME>))
+        | "Inference" >> RunInference(<MODEL_HANDLER>))
+```
+After running the pipeline, you can query the metrics using the following code:

Review Comment:
   ```suggestion
   
   After running the pipeline, you can query the metrics using the following 
code:
   ```



##########
learning/prompts/documentation-lookup/41_ai_model_refresh.md:
##########
@@ -0,0 +1,42 @@
+Prompt:
+How can I make sure my Apache Beam pipeline uses the latest version of the ML 
model?
+
+Response:
+Production ML workflows often involve updating the ML model with new data. You 
can ensure that your Apache Beam pipeline uses the latest version of the ML 
model by leveraging the RunInference API alongside Apache Beam's [side 
inputs](https://beam.apache.org/documentation/programming-guide/#side-inputs) 
feature. Side inputs are additional inputs that can be provided to a 
[‘ParDo’](https://beam.apache.org/documentation/programming-guide/#pardo) 
transform, in addition to the main input 
[`PCollection`](https://beam.apache.org/documentation/basics/#pcollection).

Review Comment:
   ```suggestion
   Production ML workflows often involve updating the ML model with new data. 
You can ensure that your Apache Beam pipeline uses the latest version of the ML 
model by leveraging the RunInference API alongside Apache Beam's [side 
inputs](https://beam.apache.org/documentation/programming-guide/#side-inputs) 
feature. Side inputs are additional inputs that can be provided to a 
[`ParDo`](https://beam.apache.org/documentation/programming-guide/#pardo) 
transform, in addition to the main input 
[`PCollection`](https://beam.apache.org/documentation/basics/#pcollection).
   ```



##########
learning/prompts/documentation-lookup/40_ai_bigquery_ml.md:
##########
@@ -0,0 +1,51 @@
+Prompt:
+How can I integrate BigQuery ML with Apache Beam?
+
+Response:
+[BigQuery ML](https://cloud.google.com/bigquery/docs/bqml-introduction) allows 
you to develop and execute ML models using 
[GoogleSQL](https://cloud.google.com/bigquery/docs/introduction-sql) queries. 
You can seamlessly integrate these models into your Apache Beam pipeline using 
[TFX Basic Shared Libraries (tfx_bsl)](https://github.com/tensorflow/tfx-bsl) 
and perform inference on them using the RunInference API.
+
+For an example of training a basic BigQuery ML model, refer to the [BigQuery 
documentation](https://cloud.google.com/bigquery/docs/bqml-introduction).
+
+Once you have trained your model, you will need to export it. Here is an 
example BigQuery command to export a model to a [Google Cloud 
Storage](https://cloud.google.com/storage/docs/creating-buckets) bucket:
+
+```
+bq extract -m <model name> gs://<cloud storage path>
+```
+
+To incorporate your BigQuery ML model into an Apache Beam pipeline using 
`tfx_bsl`, it must be saved in the [TensorFlow 
SavedModel](https://www.tensorflow.org/guide/saved_model) format. You will need 
to download the model to your local directory to perform local predictions:
+
+```python
+import apache_beam
+import tensorflow as tf
+from google.protobuf import text_format
+from tensorflow.python.framework import tensor_util
+from tfx_bsl.beam import run_inference
+from tfx_bsl.public.beam import RunInference
+from tfx_bsl.public.proto import model_spec_pb2
+

Review Comment:
   ```suggestion
   ```



##########
learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md:
##########
@@ -0,0 +1,56 @@
+Prompt:
+Ho can I use Apache Beam to run inference on Large Language Models (LLMs)?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline 
is by leveraging the RunInference API provided by the Apache Beam SDK. This 
feature allows you to seamlessly incorporate ML models into your pipeline or 
execute remote inference calls.
+
+You can use the `RunInference` transform with large models as long they fit 
into memory.
+
+The typical workflow for conducting inference on Large Language Models (LLMs) 
within an Apache Beam pipeline involves the following steps:
+1. Read the input text data from a source such as a file or a Pub/Sub topic.
+2. Encode the text into tokens understandable by the LLM model, typically 
using a tokenizer.
+3. Use the `RunInference` transform to obtain predictions from the model.
+4. Decode the predictions into human-readable text.
+
+Here is an example demonstrating how to leverage the RunInference API with 
LLMs in an Apache Beam pipeline:
+
+```python
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+
+with beam.Pipeline(options=pipeline_options) as pipeline:
+    _ = (
+        pipeline
+        | "CreateInputs" >> beam.Create(task_sentences)
+        | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+        | "RunInference" >> RunInference(model_handler=model_handler)
+        | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
+    )
+```
+
+In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible 
for encoding and decoding the input and output data using a tokenizer. The 
`model_handler` is a `ModelHandler` object that wraps the underlying model and 
allows you to configure environment variables required to load the model.
+
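+To make this more concrete, here is a minimal sketch of what `Preprocess` and `Postprocess` could look like, assuming a Hugging Face tokenizer and the `PredictionResult` objects produced by `RunInference`; the details are illustrative rather than the exact code from the Beam examples:
+
+```python
+import apache_beam as beam
+
+class Preprocess(beam.DoFn):
+    """Encodes raw text into the token tensors the model expects."""
+    def __init__(self, tokenizer):
+        self._tokenizer = tokenizer
+
+    def process(self, text):
+        # return_tensors="pt" yields PyTorch tensors; emit the single encoded sequence.
+        yield self._tokenizer(text, return_tensors="pt").input_ids[0]
+
+class Postprocess(beam.DoFn):
+    """Decodes model predictions back into human-readable text."""
+    def __init__(self, tokenizer):
+        self._tokenizer = tokenizer
+
+    def process(self, result):
+        # result is a PredictionResult with .example (the input) and .inference (the output).
+        yield self._tokenizer.decode(result.inference, skip_special_tokens=True)
+```
+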
+Here is an example of how to configure a Pytorch model handler for use in your 
pipeline:
+
+```python
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from transformers import AutoConfig, T5ForConditionalGeneration
+
+# known_args holds the parsed pipeline arguments; gen_fn is a custom inference function.
+model_handler = PytorchModelHandlerTensor(
+    state_dict_path=known_args.model_state_dict_path,
+    model_class=T5ForConditionalGeneration,
+    model_params={
+        "config": AutoConfig.from_pretrained(known_args.model_name)
+    },
+    device="cpu",
+    inference_fn=gen_fn)
+```
+
+Each specific model handler has its own configuration parameters. For example, 
the `PytorchModelHandlerTensor` requires the following parameters:
+* `state_dict_path`: the path to the saved model state dictionary.
+* `model_class`: the class of the Pytorch model that defines model structure.

Review Comment:
   ```suggestion
   * `model_class`: the class of the PyTorch model that defines model structure.
   ```



##########
learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md:
##########
@@ -0,0 +1,56 @@
+Prompt:
+How can I use Apache Beam to run inference on Large Language Models (LLMs)?
+
+Response:
+The optimal approach for conducting inference within an Apache Beam pipeline is to leverage the RunInference API provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls.
+
+You can use the `RunInference` transform with large models as long as they fit into memory.
+
+The typical workflow for conducting inference on Large Language Models (LLMs) 
within an Apache Beam pipeline involves the following steps:
+1. Read the input text data from a source such as a file or a Pub/Sub topic.
+2. Encode the text into tokens understandable by the LLM, typically using a tokenizer.
+3. Use the `RunInference` transform to obtain predictions from the model.
+4. Decode the predictions into human-readable text.
+
+Here is an example demonstrating how to leverage the RunInference API with 
LLMs in an Apache Beam pipeline:
+
+```python
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+
+with beam.Pipeline(options=pipeline_options) as pipeline:
+    _ = (
+        pipeline
+        | "CreateInputs" >> beam.Create(task_sentences)
+        | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+        | "RunInference" >> RunInference(model_handler=model_handler)
+        | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
+    )
+```
+
+In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible 
for encoding and decoding the input and output data using a tokenizer. The 
`model_handler` is a `ModelHandler` object that wraps the underlying model and 
allows you to configure environment variables required to load the model.
+
+Here is an example of how to configure a Pytorch model handler for use in your 
pipeline:

Review Comment:
   ```suggestion
   Here is an example of how to configure a PyTorch model handler for use in 
your pipeline:
   ```



##########
learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,25 @@
+Prompt:
+How can I use Apache Beam in my AI/ML pipeline?
+
+Response:
+Apache Beam is an open source, unified programming model for defining both 
batch and streaming data-parallel processing pipelines.Each Apache Beam 
pipeline is a Directed Acyclic Graph (DAG) of data processing operations 
constructed using Apache Beam SDK in one of the supported languages.
+
+To run a pipeline you need to deploy it to one of the supported Beam runners.
+
+You can use Apache Beam for various tasks within your AI/ML pipeline, 
including data validation, preprocessing, model validation, and model 
deployment and inference.
+
+Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow 
for reading and writing data from and to various data sources and sinks, as 
well as performing data validation.
+
+For data preprocessing, Apache Beam provides the `MLTransform` class. This 
feature allows you to ensure data consistency by applying the same 
preprocessing steps for both training and inference.
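+
+As an illustration, a minimal `MLTransform` sketch (assuming `tensorflow_transform` is installed; the artifact path and column name are placeholders) could look like this:
+
+```python
+import apache_beam as beam
+from apache_beam.ml.transforms.base import MLTransform
+from apache_beam.ml.transforms.tft import ComputeAndApplyVocabulary
+
+# Artifacts such as the computed vocabulary are written here and reused at inference time.
+artifact_location = "/tmp/ml_transform_artifacts"
+
+with beam.Pipeline() as p:
+    _ = (
+        p
+        | beam.Create([{"terms": ["apache", "beam"]}, {"terms": ["beam"]}])
+        | MLTransform(write_artifact_location=artifact_location).with_transform(
+            ComputeAndApplyVocabulary(columns=["terms"]))
+        | beam.Map(print))
+```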
+
+Additionally, Apache Beam allows integration with pre-trained models from 
`PyTorch`, `Scikit-learn`, and `TensorFlow`.
+
+To execute machine learning inference tasks, Apache Beam provides the 
RunInference API.
+`RunInference` is a `PTransform` that supports the usage of any custom models. 
For an example of running inference on a custom model loaded with `spaCy`, 
refer to examples in the Apache Beam GitHub repository.
+
+The `RunInference` transform can efficiently handle large models, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on LLMs in the RunInference section of the Apache Beam documentation.
+
+For more information on implementing AI/ML pipelines using Apache Beam, see 
the `Get started with AI/ML pipelines` section in the Apache Beam documentation.

Review Comment:
   ```suggestion
   For more information on implementing AI/ML pipelines using Apache Beam, see 
the 'Get started with AI/ML pipelines' section in the Apache Beam documentation.
   ```



##########
learning/prompts/documentation-lookup/38_ai_runinference.md:
##########
@@ -0,0 +1,39 @@
+Prompt:
+How can I run inference on a trained ML model using Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s [Multi-language 
Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines)
 framework. The `RunInference` transform performs inference on a 
[`PCollection`](https://beam.apache.org/documentation/programming-guide/#pcollections)
 of examples using an ML model and outputs a `PCollection` containing both the 
input examples and the corresponding output predictions.
+
+Key features of the RunInference API include:
+* Support for both batch and streaming inference.
+* Centralized model management for efficient memory and bandwidth usage.
+* Compatibility with multiple model frameworks and model hubs.
+* Automatic model refreshing to ensure the latest model version is used.
+* GPU support for model inference.
+
+The RunInference API supports a variety of frameworks and model hubs, 
including [TensorFlow](https://www.tensorflow.org/), 
[PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/), 
[XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), 
[TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex 
AI](https://cloud.google.com/vertex-ai), 
[TensorRT](https://developer.nvidia.com/tensorrt), and 
[ONNX](https://onnx.ai/). Additionally, you can easily integrate custom model 
frameworks.
+
+To import a model into your Apache Beam pipeline, you'll need to configure the 
[`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
 object, which wraps the underlying model and allows you to set necessary 
environment variables for inference.

Review Comment:
   ```suggestion
   To import a model into your Apache Beam pipeline, you will need to configure 
the 
[`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler)
 object, which wraps the underlying model and allows you to set necessary 
environment variables for inference.
   ```



##########
learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md:
##########
@@ -0,0 +1,39 @@
+Prompt:
+How can I run inference on a trained ML model using Apache Beam?
+
+Response:
+Apache Beam enables efficient inference on both local and remote ML models 
within your pipelines through the RunInference API. This functionality is 
available in the Python SDK versions 2.40.0 and later. The Java SDK versions 
2.41.0 and later also support the API through Apache Beam’s Multi-language 
Pipelines framework. The `RunInference` transform performs inference on a 
`PCollection` of examples using an ML model and outputs a `PCollection` 
containing both the input examples and the corresponding output predictions.
+
+Key features of the RunInference API include:
+* Support for both batch and streaming inference.
+* Centralized model management for efficient memory and bandwidth usage.
+* Compatibility with multiple model frameworks and model hubs.
+* Automatic model refreshing to ensure the latest model version is used.
+* GPU support for model inference.
+
+The RunInference API supports a variety of frameworks and model hubs, 
including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow 
Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate 
custom model frameworks.
+
+To import a model into your Apache Beam pipeline, you'll need to configure the 
`ModelHandler` object, which wraps the underlying model and allows you to set 
necessary environment variables for inference.
+
+The following is an example of importing a PyTorch model handler to use in your pipeline:
+
+```python
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+model_handler = PytorchModelHandlerTensor(
+    # Model handler setup (state_dict_path, model_class, and so on)
+)
+
+with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'Read' >> beam.ReadFromSource('a_source')  # placeholder for a concrete I/O transform
+        | 'RunInference' >> RunInference(model_handler))
+```
+
+For comprehensive end-to-end examples of inference with supported model 
frameworks and model hubs, refer to the Apache Beam GitHub repository.
+
+If you need to run inference on a model that isn't explicitly supported, you 
can create your own `ModelHandler` or `KeyedModelHandler` with custom logic to 
load and use your model. For an example of running inference on a custom model 
loaded with spaCy, refer to the examples in the Apache Beam GitHub repository.
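+
+As a rough sketch (not the exact code from the Beam examples), a custom handler only needs to define how to load the model and how to run a batch through it; the spaCy model name and the entity-extraction logic below are placeholders:
+
+```python
+from apache_beam.ml.inference.base import ModelHandler, PredictionResult
+
+class SpacyModelHandler(ModelHandler):
+    """Loads a spaCy pipeline and runs it over a batch of strings."""
+    def __init__(self, model_name="en_core_web_sm"):
+        self._model_name = model_name
+
+    def load_model(self):
+        import spacy  # assumes spaCy and the model package are installed
+        return spacy.load(self._model_name)
+
+    def run_inference(self, batch, model, inference_args=None):
+        for text in batch:
+            doc = model(text)
+            # Pair each input with the named entities found in it.
+            yield PredictionResult(text, [(ent.text, ent.label_) for ent in doc.ents])
+```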
+
+For recommended patterns and best practices when leveraging Apache Beam for 
inference tasks, see the RunInference Patterns section in the official 
documentation.

Review Comment:
   ```suggestion
   For recommended patterns and best practices when leveraging Apache Beam for 
inference tasks, see the 'RunInference Patterns' section in the official 
documentation.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
