dariabezkorovaina commented on code in PR #30351:
URL: https://github.com/apache/beam/pull/30351#discussion_r1513284149
########## learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md: ########## @@ -0,0 +1,21 @@ +Prompt: +How can I orchestrate Apache Beam pipelines with TFX? + Review Comment: ```suggestion ``` ########## learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md: ########## @@ -0,0 +1,25 @@ +Prompt: +How can I use Apache Beam in my AI/ML pipeline? + +Response: +Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the supported languages. + +To run a pipeline you need to deploy it to one of the supported Beam runners. + +You can use Apache Beam for various tasks within your AI/ML pipeline, including data validation, preprocessing, model validation, and model deployment and inference. + +Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow for reading and writing data from and to various data sources and sinks, as well as performing data validation. + +For data preprocessing, Apache Beam provides the `MLTransform` class. This feature allows you to ensure data consistency by applying the same preprocessing steps for both training and inference. + +Additionally, Apache Beam allows integration with pre-trained models from `PyTorch`, `Scikit-learn`, and `TensorFlow`. + +To execute machine learning inference tasks, Apache Beam provides the RunInference API. +`RunInference` is a `PTransform` that supports the usage of any custom models. For an example of running inference on a custom model loaded with `spaCy`, refer to examples in the Apache Beam GitHub repository. Review Comment: ```suggestion To execute machine learning inference tasks, Apache Beam provides the RunInference API. `RunInference` is a `PTransform` that supports the usage of any custom models. For an example of running inference on a custom model loaded with spaCy, refer to the Apache Beam GitHub repository. ``` ########## learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md: ########## @@ -0,0 +1,25 @@ +Prompt: +How can I use Apache Beam in my AI/ML pipeline? + +Response: +Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the supported languages. + +To run a pipeline you need to deploy it to one of the supported Beam runners. Review Comment: ```suggestion Apache Beam is an open-source, unified programming model for defining batch and streaming data-parallel processing pipelines. Each Apache Beam pipeline is represented as a Directed Acyclic Graph (DAG) of data processing operations constructed using the Apache Beam SDK in one of the supported languages. To execute a pipeline, you need to deploy it to one of the supported Beam runners. ``` ########## learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md: ########## @@ -0,0 +1,39 @@ +Prompt: +How can I run inference on a trained ML model using Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. 
The `RunInference` transform performs inference on a `PCollection` of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions. + +Key features of the RunInference API include: +* Support for both batch and streaming inference. +* Centralized model management for efficient memory and bandwidth usage. +* Compatibility with multiple model frameworks and model hubs. +* Automatic model refreshing to ensure the latest model version is used. +* GPU support for model inference. + +The RunInference API supports a variety of frameworks and model hubs, including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate custom model frameworks. + +To import a model into your Apache Beam pipeline, you'll need to configure the `ModelHandler` object, which wraps the underlying model and allows you to set necessary environment variables for inference. Review Comment: ```suggestion To import a model into your Apache Beam pipeline, you will need to configure the `ModelHandler` object, which wraps the underlying model and allows you to set necessary environment variables for inference. ``` ########## learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md: ########## @@ -0,0 +1,38 @@ +Prompt: +What metrics are available for monitoring the performance of ML model inference in Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. + +The RunInference API provides built-in support for monitoring the performance of ML model inference. Here is a list of commonly used metrics for inference benchmarking: + +* `model_byte_size`: size of the memory footprint of the model load and initialization. +* `load_model_latency_milli_secs`: time taken to load and initialize the model. +* `num_inferences`: total number of elements passed to `run_inference()`. +* `inference_batch_latency_micro_secs`: time taken to perform inference across all batches of examples. +* `inference_request_batch_byte_size`: size of the batch in bytes. +* `inference_request_batch_size`: number of examples in the batch. +* `failed_batches_counter`: number of failed batches. + +You can derive additional metrics from the ones listed above. Example: +Total time taken for inference = `num_inferences` x `inference_batch_latency_micro_secs_MEAN` + +Here is a simplified example of how to use the RunInference API to perform inference on a language model (LM): + +```python + with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | "Create inputs" >> beam.Create(<INPUTS>>) + | "Tokenize" >> beam.ParDo(Tokenize(<TOKENIZER_NAME>)) + | "Inference" >> RunInference(<MODEL_HANDLER>)) +``` +After running the pipeline, you can query the metrics using the following code: Review Comment: ```suggestion After running the pipeline, you can query the metrics using the following code: ``` ########## learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md: ########## @@ -0,0 +1,56 @@ +Prompt: +Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? 
+ +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. + +You can use the `RunInference` transform with large models as long they fit into memory. Review Comment: ```suggestion You can use the `RunInference` transform with large models as long as they fit into memory. ``` ########## learning/prompts/documentation-lookup/42_ai_custom_inference.md: ########## @@ -0,0 +1,47 @@ +Prompt: +How can I use custom remote inference calls in my Apache Beam pipeline? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. Review Comment: ```suggestion The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam Python SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. ``` ########## learning/prompts/documentation-lookup/43_ai_llm_inference.md: ########## @@ -0,0 +1,56 @@ +Prompt: +Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. + +You can use the `RunInference` transform with large models as long they fit into memory. + +The typical workflow for conducting inference on Large Language Models (LLMs) within an Apache Beam pipeline involves the following steps: +1. Read the input text data from a source such as a file or a Pub/Sub topic. +2. Encode the text into tokens understandable by the LLM model, typically using a tokenizer. +3. Use the `RunInference` transform to obtain predictions from the model. +4. Decode the predictions into human-readable text. + +Here is an example demonstrating how to leverage the RunInference API with LLMs in an Apache Beam pipeline: + +```python +import apache_beam as beam +from apache_beam.ml.inference.base import RunInference + +with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | "CreateInputs" >> beam.Create(task_sentences) + | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer)) + | "RunInference" >> RunInference(model_handler=model_handler) + | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer)) + ) +``` + +In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible for encoding and decoding the input and output data using a tokenizer. The `model_handler` is a `ModelHandler` object that wraps the underlying model and allows you to configure environment variables required to load the model. 
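For context on the paragraph above, a minimal sketch of what such `Preprocess` and `Postprocess` `DoFn` classes might look like; the Hugging Face-style tokenizer calls are an assumption for illustration and are not part of the reviewed file:

```python
import apache_beam as beam

class Preprocess(beam.DoFn):
    """Encodes input text into token tensors the model can consume."""
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def process(self, text):
        # Assumes a Hugging Face-style tokenizer that returns PyTorch tensors.
        yield self._tokenizer(text, return_tensors="pt").input_ids[0]

class Postprocess(beam.DoFn):
    """Decodes RunInference predictions back into human-readable text."""
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def process(self, prediction_result):
        # RunInference emits PredictionResult(example, inference) elements.
        yield self._tokenizer.decode(
            prediction_result.inference, skip_special_tokens=True)
```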
+ +Here is an example of how to configure a Pytorch model handler for use in your pipeline: Review Comment: ```suggestion Here is an example of how to configure a PyTorch model handler for use in your pipeline: ``` ########## learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md: ########## @@ -0,0 +1,25 @@ +Prompt: +How can I use Apache Beam in my AI/ML pipeline? + +Response: +Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the supported languages. + +To run a pipeline you need to deploy it to one of the supported Beam runners. + +You can use Apache Beam for various tasks within your AI/ML pipeline, including data validation, preprocessing, model validation, and model deployment and inference. + +Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow for reading and writing data from and to various data sources and sinks, as well as performing data validation. + +For data preprocessing, Apache Beam provides the `MLTransform` class. This feature allows you to ensure data consistency by applying the same preprocessing steps for both training and inference. + +Additionally, Apache Beam allows integration with pre-trained models from `PyTorch`, `Scikit-learn`, and `TensorFlow`. + +To execute machine learning inference tasks, Apache Beam provides the RunInference API. +`RunInference` is a `PTransform` that supports the usage of any custom models. For an example of running inference on a custom model loaded with `spaCy`, refer to examples in the Apache Beam GitHub repository. + +The `RunInference` transform efficiently handles models of any size, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on large language models (LLMs) in the RunInference section of the Apache Beam documentation. Review Comment: ```suggestion The `RunInference` transform efficiently handles models of any size, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on large language models (LLMs) in the 'RunInference' section of the Apache Beam documentation. ``` ########## learning/prompts/documentation-lookup/43_ai_llm_inference.md: ########## @@ -0,0 +1,56 @@ +Prompt: +Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. + +You can use the `RunInference` transform with large models as long they fit into memory. + +The typical workflow for conducting inference on Large Language Models (LLMs) within an Apache Beam pipeline involves the following steps: +1. Read the input text data from a source such as a file or a Pub/Sub topic. +2. Encode the text into tokens understandable by the LLM model, typically using a tokenizer. +3. Use the `RunInference` transform to obtain predictions from the model. +4. Decode the predictions into human-readable text. 
+ +Here is an example demonstrating how to leverage the RunInference API with LLMs in an Apache Beam pipeline: + +```python +import apache_beam as beam +from apache_beam.ml.inference.base import RunInference + +with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | "CreateInputs" >> beam.Create(task_sentences) + | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer)) + | "RunInference" >> RunInference(model_handler=model_handler) + | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer)) + ) +``` + +In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible for encoding and decoding the input and output data using a tokenizer. The `model_handler` is a `ModelHandler` object that wraps the underlying model and allows you to configure environment variables required to load the model. + +Here is an example of how to configure a Pytorch model handler for use in your pipeline: + +```python +from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor +model_handler = PytorchModelHandlerTensor( + state_dict_path=known_args.model_state_dict_path, + model_class=T5ForConditionalGeneration, + model_params={ + "config": AutoConfig.from_pretrained(known_args.model_name) + }, + device="cpu", + inference_fn=gen_fn) +``` + +Each specific model handler has its own configuration parameters. For example, the `PytorchModelHandlerTensor` requires the following parameters: +* `state_dict_path`: the path to the saved model state dictionary. +* `model_class`: the class of the Pytorch model that defines model structure. Review Comment: ```suggestion * `model_class`: the class of the PyTorch model that defines model structure. ``` ########## learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md: ########## @@ -0,0 +1,39 @@ +Prompt: +How can I run inference on a trained ML model using Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. The `RunInference` transform performs inference on a `PCollection` of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions. + +Key features of the RunInference API include: +* Support for both batch and streaming inference. +* Centralized model management for efficient memory and bandwidth usage. +* Compatibility with multiple model frameworks and model hubs. +* Automatic model refreshing to ensure the latest model version is used. +* GPU support for model inference. + +The RunInference API supports a variety of frameworks and model hubs, including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate custom model frameworks. + +To import a model into your Apache Beam pipeline, you'll need to configure the `ModelHandler` object, which wraps the underlying model and allows you to set necessary environment variables for inference. 
+ +Following is an example importing a `Pytorch` model handler to use in your pipeline: + +```python +from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor +from apache_beam.ml.inference.base import RunInference + + model_handler = PytorchModelHandlerTensor( + # Model handler setup + ) + +with pipeline as p: + predictions = p | 'Read' >> beam.ReadFromSource('a_source') + | 'RunInference' >> RunInference(model_handler) +``` + +For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, refer to the Apache Beam GitHub repository. + Review Comment: Combined all references to the GitHub repository into the final paragraph ########## learning/prompts/documentation-lookup/33_ai_ml_pipelines.md: ########## @@ -0,0 +1,24 @@ +Prompt: +How can I use Apache Beam in my AI/ML pipeline? + +Response: +Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). +To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). Review Comment: ```suggestion Apache Beam is an open-source, unified programming model for defining batch and streaming data-parallel processing pipelines. Each Apache Beam pipeline is represented as a Directed Acyclic Graph (DAG) of data processing operations constructed using the Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/). To execute a pipeline, you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/). ``` ########## learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md: ########## @@ -0,0 +1,42 @@ +Prompt: +How can I make sure my Apache Beam pipeline uses the latest version of the ML model? + +Response: +Production ML workflows often involve updating the ML model with new data. You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's side inputs feature. Side inputs are additional inputs that can be provided to a ‘ParDo’ transform, in addition to the main input `PCollection`. + +The `RunInference` transform accepts an optional input parameter `model_metadata_pcoll`, which is a side input `PCollection` emitting `ModelMetadata`. `ModelMetadata` is a `NamedTuple` object containing `model_id` and `model_name`, used to load the model for inference and identify it in the metrics generated by the `RunInference` transform. The URL or path to the model should be compatible with the respective `ModelHandler` requirements. + +If the main collection emits inputs before the `model_metadata_pcoll` side input is available, the main `PCollection` will be buffered until the `model_metadata_pcoll` side input is emitted. + +For more information on `ModelMetadata`, refer to the Apache Beam Python SDK documentation. + +A common approach to model updates in production is to use a `FileWatchPattern` as a side input: + +```python +import apache_beam as beam +from apache_beam.ml.inference.utils import WatchFilePattern +from apache_beam.ml.inference.base import RunInference + +tf_model_handler = ... 
# model handler for the model + +with beam.Pipeline() as pipeline: + + file_pattern = '<path_to_model_file>' + + side_input_pcoll = ( + pipeline + | "FilePatternUpdates" >> WatchFilePattern(file_pattern=file_pattern)) + + main_input_pcoll = ... # main input PCollection + + inference_pcoll = ( + main_input_pcoll + | "RunInference" >> RunInference( + model_handler=model_handler, + model_metadata_pcoll=side_input_pcoll)) +``` + +In the provided example, the `model_metadata_pcoll` parameter expects a `PCollection` of `ModelMetadata` compatible with the `AsSingleton` marker. Given that the pipeline employs the `WatchFilePattern` class as a side input, it automatically manages windowing and encapsulates the output into `ModelMetadata`. + +For more information, refer to the `WatchFilePattern` section in the Apache Beam documentation. Review Comment: ```suggestion For more information, refer to the section 'Use `WatchFilePattern` to auto-update ML models in RunInference' in the Apache Beam documentation. ``` ########## learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md: ########## @@ -0,0 +1,21 @@ +Prompt: +How can I orchestrate Apache Beam pipelines with TFX? + Review Comment: ```suggestion ``` ########## learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md: ########## @@ -0,0 +1,21 @@ +Prompt: +How can I orchestrate Apache Beam pipelines with TFX? + + +Response: +You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. + +However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [Kubeflow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/). + +TFX is an end-to-end platform designed for deploying production ML pipelines. TFX pipelines consist of a sequence of components that together form an ML pipeline. Each TFX pipeline DAG requires its own orchestrator. The natively supported orchestrators include [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Apache Airflow](https://airflow.apache.org/), and Apache Beam itself. + +To integrate an Apache Beam pipeline into a TFX pipeline, you need to follow these steps: +1. Create TFX components by specifying interfaces and containerizing the component implementation. +2. Create TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them. Review Comment: ```suggestion 2. Create the TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them. ``` ########## learning/prompts/documentation-lookup-nolinks/37_ai_tfx_orchestration.md: ########## @@ -0,0 +1,21 @@ +Prompt: +How can I orchestrate Apache Beam pipelines with TFX? + + +Response: +You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference. + +However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. 
Popular tools for building and managing ML workflows include Kubeflow, TFX, and Apache Airflow. + +TFX is an end-to-end platform designed for deploying production ML pipelines. TFX pipelines consist of a sequence of components that together form an ML pipeline. Each TFX pipeline DAG requires its own orchestrator. The natively supported orchestrators include Kubeflow Pipelines, Apache Airflow, and Apache Beam itself. + +To integrate an Apache Beam pipeline into a TFX pipeline, you need to follow these steps: +1. Create TFX components by specifying interfaces and containerizing the component implementation. +2. Create TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them. Review Comment: ```suggestion 2. Create the TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them. ``` ########## learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md: ########## @@ -0,0 +1,39 @@ +Prompt: +How can I run inference on a trained ML model using Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. The `RunInference` transform performs inference on a `PCollection` of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions. + +Key features of the RunInference API include: +* Support for both batch and streaming inference. +* Centralized model management for efficient memory and bandwidth usage. +* Compatibility with multiple model frameworks and model hubs. +* Automatic model refreshing to ensure the latest model version is used. +* GPU support for model inference. + +The RunInference API supports a variety of frameworks and model hubs, including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate custom model frameworks. + +To import a model into your Apache Beam pipeline, you'll need to configure the `ModelHandler` object, which wraps the underlying model and allows you to set necessary environment variables for inference. + +Following is an example importing a `Pytorch` model handler to use in your pipeline: + +```python +from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor +from apache_beam.ml.inference.base import RunInference + + model_handler = PytorchModelHandlerTensor( + # Model handler setup + ) + +with pipeline as p: + predictions = p | 'Read' >> beam.ReadFromSource('a_source') + | 'RunInference' >> RunInference(model_handler) +``` + +For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, refer to the Apache Beam GitHub repository. + +If you need to run inference on a model that isn't explicitly supported, you can create your own `ModelHandler` or `KeyedModelHandler` with custom logic to load and use your model. For an example of running inference on a custom model loaded with spaCy, refer to the examples in the Apache Beam GitHub repository. + +For recommended patterns and best practices when leveraging Apache Beam for inference tasks, see the RunInference Patterns section in the official documentation. 
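As an illustration of the custom `ModelHandler` mentioned above, here is a minimal sketch; the spaCy model name and the handler class itself are illustrative assumptions rather than content from the reviewed file:

```python
import spacy
import apache_beam as beam
from apache_beam.ml.inference.base import ModelHandler, PredictionResult, RunInference

class SpacyModelHandler(ModelHandler):
    """Loads a spaCy model once per worker and runs it on batches of text."""
    def __init__(self, model_name="en_core_web_sm"):
        self._model_name = model_name

    def load_model(self):
        return spacy.load(self._model_name)

    def run_inference(self, batch, model, inference_args=None):
        # Pair each input text with the spaCy Doc produced for it.
        return [PredictionResult(text, model(text)) for text in batch]

# Usage sketch: predictions = texts | RunInference(SpacyModelHandler())
```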
+ +For an example of using the RunInference API in the Java SDK, see the example multi-language pipelines in the Apache Beam GitHub repository. Additionally, for an illustration of a composite Python transform integrating the RunInference API with preprocessing and postprocessing from a Beam Java SDK pipeline, you can refer to the Apache Beam official documentation. Review Comment: ```suggestion For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, including examples of using the RunInference API in the Java SDK, see the Apache Beam GitHub repository. Additionally, for an illustration of a composite Python transform integrating the RunInference API with preprocessing and postprocessing from a Beam Java SDK pipeline, you can refer to the Apache Beam official documentation. ``` ########## learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md: ########## @@ -0,0 +1,51 @@ +Prompt: +How can I integrate BigQuery ML with Apache Beam? + +Response: +BigQuery ML allows you to develop and execute ML models using GoogleSQL queries. You can seamlessly integrate these models into your Apache Beam pipeline using TFX Basic Shared Libraries (tfx_bsl) and perform inference on them using the RunInference API. + +For an example of training a basic BigQuery ML model, refer to the BigQuery documentation. + +Once you have trained your model, you will need to export it. Here is an example BigQuery command to export a model to a Google Cloud Storage bucket: + +``` +bq extract -m <model name> gs://<cloud storage path> +``` + +To incorporate your BigQuery ML model into an Apache Beam pipeline using `tfx_bsl`, it must be saved in the TensorFlow SavedModel format. You will need to download the model to your local directory to perform local predictions: + +```python +import apache_beam +import tensorflow as tf +from google.protobuf import text_format +from tensorflow.python.framework import tensor_util +from tfx_bsl.beam import run_inference +from tfx_bsl.public.beam import RunInference +from tfx_bsl.public.proto import model_spec_pb2 + + +inputs = tf.train.Example(...) + +model_path = <path to the directory where the model is stored> + +def extract_prediction(response): + # Extract the prediction from the response depending on the signature of the model + +with beam.Pipeline() as p: + res = ( + p + | beam.Create([inputs]) + | RunInference( + model_spec_pb2.InferenceSpecType( + saved_model_spec=model_spec_pb2.SavedModelSpec( + model_path=model_path, + signature_name=['serving_default']))) + | beam.ParDo(extract_prediction) + ) +``` + +This example uses the `RunInference` transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the Prediction output format section in the BigQuery documentation. Review Comment: ```suggestion This example uses the `RunInference` transform from the `tfx_bsl` library, directing it to the local directory where the model is stored. The transform takes `tf.train.Example` type elements as inputs and produces `tensorflow_serving.apis.prediction_log_pb2.PredictionLog` type elements as outputs. Depending on the signature of your model, you can extract values from the output. 
For the prediction output format of exported models for each model type, refer to the 'Prediction output format' section in the BigQuery documentation. ``` ########## learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md: ########## @@ -0,0 +1,39 @@ +Prompt: +How can I run inference on a trained ML model using Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. The `RunInference` transform performs inference on a `PCollection` of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions. + +Key features of the RunInference API include: +* Support for both batch and streaming inference. +* Centralized model management for efficient memory and bandwidth usage. +* Compatibility with multiple model frameworks and model hubs. +* Automatic model refreshing to ensure the latest model version is used. +* GPU support for model inference. + +The RunInference API supports a variety of frameworks and model hubs, including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate custom model frameworks. + +To import a model into your Apache Beam pipeline, you'll need to configure the `ModelHandler` object, which wraps the underlying model and allows you to set necessary environment variables for inference. + +Following is an example importing a `Pytorch` model handler to use in your pipeline: Review Comment: ```suggestion Here is an example of importing a PyTorch model handler for use in your pipeline: ``` ########## learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md: ########## @@ -0,0 +1,47 @@ +Prompt: +How can I use custom remote inference calls in my Apache Beam pipeline? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the `RunInference API` provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. + +When developing custom inference calls, consider the following factors: +* **API quotas**: heavy loads may lead to reaching API quota limits. You can use `PipelineOptions` to specify the maximum number of parallel API calls. Use `direct_num_workers` for the Direct Runner or `max_num_workers` for the Google Cloud Dataflow Runner. Refer to the Beam Capability Matrix for information about other runners. +* **Error handling**: it is essential to handle errors in case of API call failures. Consider implementing exponential backoff for retries or utilizing dead-letter queues for failed API calls. +* **Monitoring**: incorporate monitoring and performance metrics to track the performance of your inference calls and the health of your pipeline. +* **Batching**: batching can be used to send multiple inputs in a single API call for improved efficiency. 
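To make the error-handling and batching considerations listed above concrete, here is a minimal sketch of a `DoFn` that calls a remote endpoint with exponential backoff; the endpoint URL, retry settings, and response format are placeholders, not part of the reviewed file:

```python
import time
import requests  # assumed HTTP client for the remote endpoint
import apache_beam as beam

class RemoteInferenceDoFn(beam.DoFn):
    """Sends a batch of inputs to a remote model endpoint, retrying with backoff."""
    def __init__(self, endpoint, max_retries=5):
        self._endpoint = endpoint
        self._max_retries = max_retries

    def process(self, batch):
        delay = 1.0
        for _ in range(self._max_retries):
            try:
                response = requests.post(
                    self._endpoint, json={"instances": batch}, timeout=30)
                response.raise_for_status()
                # Assumes the endpoint returns JSON of the form {"predictions": [...]}.
                yield from response.json()["predictions"]
                return
            except requests.RequestException:
                # Exponential backoff before retrying the failed API call.
                time.sleep(delay)
                delay *= 2
        # After exhausting retries, route the batch to a dead-letter output.
        yield beam.pvalue.TaggedOutput("failed", batch)
```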
+ +To execute external API calls with the `RunInference` transform, you'll need to create a `beam.DoFn` in the form of a custom model handler: Review Comment: ```suggestion To execute external API calls with the `RunInference` transform, you will need to create a `beam.DoFn` in the form of a custom model handler: ``` ########## learning/prompts/documentation-lookup-nolinks/42_ai_custom_inference.md: ########## @@ -0,0 +1,47 @@ +Prompt: +How can I use custom remote inference calls in my Apache Beam pipeline? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the `RunInference API` provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. Review Comment: ```suggestion The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam Python SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. ``` ########## learning/prompts/documentation-lookup-nolinks/39_ai_runinference_metrics.md: ########## @@ -0,0 +1,38 @@ +Prompt: +What metrics are available for monitoring the performance of ML model inference in Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. + +The RunInference API provides built-in support for monitoring the performance of ML model inference. Here is a list of commonly used metrics for inference benchmarking: + Review Comment: ```suggestion ``` ########## learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md: ########## @@ -0,0 +1,39 @@ +Prompt: +How can I run inference on a trained ML model using Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. The `RunInference` transform performs inference on a `PCollection` of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions. + +Key features of the RunInference API include: +* Support for both batch and streaming inference. +* Centralized model management for efficient memory and bandwidth usage. +* Compatibility with multiple model frameworks and model hubs. +* Automatic model refreshing to ensure the latest model version is used. +* GPU support for model inference. + +The RunInference API supports a variety of frameworks and model hubs, including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow Hub, Vertex AI, TensorRT, and ONNX. Additionally, you can easily integrate custom model frameworks. + +To import a model into your Apache Beam pipeline, you'll need to configure the `ModelHandler` object, which wraps the underlying model and allows you to set necessary environment variables for inference. 
+ +Following is an example importing a `Pytorch` model handler to use in your pipeline: + +```python +from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor +from apache_beam.ml.inference.base import RunInference + + model_handler = PytorchModelHandlerTensor( + # Model handler setup + ) + +with pipeline as p: + predictions = p | 'Read' >> beam.ReadFromSource('a_source') + | 'RunInference' >> RunInference(model_handler) +``` + +For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, refer to the Apache Beam GitHub repository. + Review Comment: ```suggestion ``` ########## learning/prompts/documentation-lookup-nolinks/40_ai_bigquery_ml.md: ########## @@ -0,0 +1,51 @@ +Prompt: +How can I integrate BigQuery ML with Apache Beam? + +Response: +BigQuery ML allows you to develop and execute ML models using GoogleSQL queries. You can seamlessly integrate these models into your Apache Beam pipeline using TFX Basic Shared Libraries (tfx_bsl) and perform inference on them using the RunInference API. + +For an example of training a basic BigQuery ML model, refer to the BigQuery documentation. + +Once you have trained your model, you will need to export it. Here is an example BigQuery command to export a model to a Google Cloud Storage bucket: + +``` +bq extract -m <model name> gs://<cloud storage path> +``` + +To incorporate your BigQuery ML model into an Apache Beam pipeline using `tfx_bsl`, it must be saved in the TensorFlow SavedModel format. You will need to download the model to your local directory to perform local predictions: + +```python +import apache_beam +import tensorflow as tf +from google.protobuf import text_format +from tensorflow.python.framework import tensor_util +from tfx_bsl.beam import run_inference +from tfx_bsl.public.beam import RunInference +from tfx_bsl.public.proto import model_spec_pb2 + + Review Comment: ```suggestion ``` ########## learning/prompts/documentation-lookup/42_ai_custom_inference.md: ########## @@ -0,0 +1,47 @@ +Prompt: +How can I use custom remote inference calls in my Apache Beam pipeline? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. + +When developing custom inference calls, consider the following factors: +* **API quotas**: heavy loads may lead to reaching API quota limits. You can use [`PipelineOptions`](https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options) to specify the maximum number of parallel API calls. Use `direct_num_workers` for the [Direct Runner](https://beam.apache.org/documentation/runners/direct/) or `max_num_workers` for the [Google Cloud Dataflow Runner](https://beam.apache.org/documentation/runners/dataflow/). Refer to the [Beam Capability Matrix](https://beam.apache.org/documentation/runners/capability-matrix/) for information about other runners. +* **Error handling**: it is essential to handle errors in case of API call failures. Consider implementing exponential backoff for retries or utilizing dead-letter queues for failed API calls. +* **Monitoring**: incorporate monitoring and performance metrics to track the performance of your inference calls and the health of your pipeline. 
+* **Batching**: batching can be used to send multiple inputs in a single API call for improved efficiency. + +To execute external API calls with the `RunInference` transform, you'll need to create a `beam.DoFn` in the form of a custom model handler: Review Comment: ```suggestion To execute external API calls with the `RunInference` transform, you will need to create a `beam.DoFn` in the form of a custom model handler: ``` ########## learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md: ########## @@ -0,0 +1,56 @@ +Prompt: +Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. Review Comment: ```suggestion The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam Python SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. ``` ########## learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md: ########## @@ -0,0 +1,25 @@ +Prompt: +How can I use Apache Beam in my AI/ML pipeline? + +Response: +Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the supported languages. + +To run a pipeline you need to deploy it to one of the supported Beam runners. + +You can use Apache Beam for various tasks within your AI/ML pipeline, including data validation, preprocessing, model validation, and model deployment and inference. + +Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow for reading and writing data from and to various data sources and sinks, as well as performing data validation. + +For data preprocessing, Apache Beam provides the `MLTransform` class. This feature allows you to ensure data consistency by applying the same preprocessing steps for both training and inference. + +Additionally, Apache Beam allows integration with pre-trained models from `PyTorch`, `Scikit-learn`, and `TensorFlow`. Review Comment: ```suggestion Additionally, Apache Beam allows integration with pre-trained models from PyTorch, Scikit-learn, and TensorFlow. ``` ########## learning/prompts/documentation-lookup/43_ai_llm_inference.md: ########## @@ -0,0 +1,56 @@ +Prompt: +Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. Review Comment: ```suggestion The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam Python SDK. 
This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. ``` ########## learning/prompts/documentation-lookup-nolinks/41_ai_model_refresh.md: ########## @@ -0,0 +1,42 @@ +Prompt: +How can I make sure my Apache Beam pipeline uses the latest version of the ML model? + +Response: +Production ML workflows often involve updating the ML model with new data. You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's side inputs feature. Side inputs are additional inputs that can be provided to a ‘ParDo’ transform, in addition to the main input `PCollection`. Review Comment: ```suggestion Production ML workflows often involve updating the ML model with new data. You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's side inputs feature. Side inputs are additional inputs that can be provided to a `ParDo` transform, in addition to the main input `PCollection`. ``` ########## learning/prompts/documentation-lookup/43_ai_llm_inference.md: ########## @@ -0,0 +1,56 @@ +Prompt: +Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the [RunInference API](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference) provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. + +You can use the `RunInference` transform with large models as long they fit into memory. Review Comment: ```suggestion You can use the `RunInference` transform with large models as long as they fit into memory. ``` ########## learning/prompts/documentation-lookup/38_ai_runinference.md: ########## @@ -0,0 +1,39 @@ +Prompt: +How can I run inference on a trained ML model using Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s [Multi-language Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines) framework. The `RunInference` transform performs inference on a [`PCollection`](https://beam.apache.org/documentation/programming-guide/#pcollections) of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions. + +Key features of the RunInference API include: +* Support for both batch and streaming inference. +* Centralized model management for efficient memory and bandwidth usage. +* Compatibility with multiple model frameworks and model hubs. +* Automatic model refreshing to ensure the latest model version is used. +* GPU support for model inference. 
+ +The RunInference API supports a variety of frameworks and model hubs, including [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/), [XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), [TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex AI](https://cloud.google.com/vertex-ai), [TensorRT](https://developer.nvidia.com/tensorrt), and [ONNX](https://onnx.ai/). Additionally, you can easily integrate custom model frameworks. + +To import a model into your Apache Beam pipeline, you'll need to configure the [`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) object, which wraps the underlying model and allows you to set necessary environment variables for inference. + +Following is an example importing a `Pytorch` model handler to use in your pipeline: Review Comment: ```suggestion Here is an example of importing a PyTorch model handler for use in your pipeline: ``` ########## learning/prompts/documentation-lookup/39_ai_runinference_metrics.md: ########## @@ -0,0 +1,38 @@ +Prompt: +What metrics are available for monitoring the performance of ML model inference in Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s [Multi-language Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines) framework. + +The RunInference API provides built-in support for monitoring the performance of ML model inference. Here is a list of commonly used metrics for inference benchmarking: + +* `model_byte_size`: size of the memory footprint of the model load and initialization. +* `load_model_latency_milli_secs`: time taken to load and initialize the model. +* `num_inferences`: total number of elements passed to `run_inference()`. +* `inference_batch_latency_micro_secs`: time taken to perform inference across all batches of examples. +* `inference_request_batch_byte_size`: size of the batch in bytes. +* `inference_request_batch_size`: number of examples in the batch. +* `failed_batches_counter`: number of failed batches. + +You can derive additional metrics from the ones listed above. Example: +Total time taken for inference = `num_inferences` x `inference_batch_latency_micro_secs_MEAN` + +Here is a simplified example of how to use the RunInference API to perform inference on a language model (LM): + +```python + with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | "Create inputs" >> beam.Create(<INPUTS>>) + | "Tokenize" >> beam.ParDo(Tokenize(<TOKENIZER_NAME>)) + | "Inference" >> RunInference(<MODEL_HANDLER>)) +``` +After running the pipeline, you can query the metrics using the following code: Review Comment: ```suggestion After running the pipeline, you can query the metrics using the following code: ``` ########## learning/prompts/documentation-lookup/41_ai_model_refresh.md: ########## @@ -0,0 +1,42 @@ +Prompt: +How can I make sure my Apache Beam pipeline uses the latest version of the ML model? + +Response: +Production ML workflows often involve updating the ML model with new data. 
You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's [side inputs](https://beam.apache.org/documentation/programming-guide/#side-inputs) feature. Side inputs are additional inputs that can be provided to a [‘ParDo’](https://beam.apache.org/documentation/programming-guide/#pardo) transform, in addition to the main input [`PCollection`](https://beam.apache.org/documentation/basics/#pcollection). Review Comment: ```suggestion Production ML workflows often involve updating the ML model with new data. You can ensure that your Apache Beam pipeline uses the latest version of the ML model by leveraging the RunInference API alongside Apache Beam's [side inputs](https://beam.apache.org/documentation/programming-guide/#side-inputs) feature. Side inputs are additional inputs that can be provided to a [`ParDo`](https://beam.apache.org/documentation/programming-guide/#pardo) transform, in addition to the main input [`PCollection`](https://beam.apache.org/documentation/basics/#pcollection). ``` ########## learning/prompts/documentation-lookup/40_ai_bigquery_ml.md: ########## @@ -0,0 +1,51 @@ +Prompt: +How can I integrate BigQuery ML with Apache Beam? + +Response: +[BigQuery ML](https://cloud.google.com/bigquery/docs/bqml-introduction) allows you to develop and execute ML models using [GoogleSQL](https://cloud.google.com/bigquery/docs/introduction-sql) queries. You can seamlessly integrate these models into your Apache Beam pipeline using [TFX Basic Shared Libraries (tfx_bsl)](https://github.com/tensorflow/tfx-bsl) and perform inference on them using the RunInference API. + +For an example of training a basic BigQuery ML model, refer to the [BigQuery documentation](https://cloud.google.com/bigquery/docs/bqml-introduction). + +Once you have trained your model, you will need to export it. Here is an example BigQuery command to export a model to a [Google Cloud Storage](https://cloud.google.com/storage/docs/creating-buckets) bucket: + +``` +bq extract -m <model name> gs://<cloud storage path> +``` + +To incorporate your BigQuery ML model into an Apache Beam pipeline using `tfx_bsl`, it must be saved in the [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) format. You will need to download the model to your local directory to perform local predictions: + +```python +import apache_beam +import tensorflow as tf +from google.protobuf import text_format +from tensorflow.python.framework import tensor_util +from tfx_bsl.beam import run_inference +from tfx_bsl.public.beam import RunInference +from tfx_bsl.public.proto import model_spec_pb2 + Review Comment: ```suggestion ``` ########## learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md: ########## @@ -0,0 +1,56 @@ +Prompt: +Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. + +You can use the `RunInference` transform with large models as long they fit into memory. + +The typical workflow for conducting inference on Large Language Models (LLMs) within an Apache Beam pipeline involves the following steps: +1. Read the input text data from a source such as a file or a Pub/Sub topic. +2. 
Encode the text into tokens understandable by the LLM model, typically using a tokenizer. +3. Use the `RunInference` transform to obtain predictions from the model. +4. Decode the predictions into human-readable text. + +Here is an example demonstrating how to leverage the RunInference API with LLMs in an Apache Beam pipeline: + +```python +import apache_beam as beam +from apache_beam.ml.inference.base import RunInference + +with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | "CreateInputs" >> beam.Create(task_sentences) + | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer)) + | "RunInference" >> RunInference(model_handler=model_handler) + | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer)) + ) +``` + +In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible for encoding and decoding the input and output data using a tokenizer. The `model_handler` is a `ModelHandler` object that wraps the underlying model and allows you to configure environment variables required to load the model. + +Here is an example of how to configure a Pytorch model handler for use in your pipeline: + +```python +from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor +model_handler = PytorchModelHandlerTensor( + state_dict_path=known_args.model_state_dict_path, + model_class=T5ForConditionalGeneration, + model_params={ + "config": AutoConfig.from_pretrained(known_args.model_name) + }, + device="cpu", + inference_fn=gen_fn) +``` + +Each specific model handler has its own configuration parameters. For example, the `PytorchModelHandlerTensor` requires the following parameters: +* `state_dict_path`: the path to the saved model state dictionary. +* `model_class`: the class of the Pytorch model that defines model structure. Review Comment: ```suggestion * `model_class`: the class of the PyTorch model that defines model structure. ``` ########## learning/prompts/documentation-lookup-nolinks/43_ai_llm_inference.md: ########## @@ -0,0 +1,56 @@ +Prompt: +Ho can I use Apache Beam to run inference on Large Language Models (LLMs)? + +Response: +The optimal approach for conducting inference within an Apache Beam pipeline is by leveraging the RunInference API provided by the Apache Beam SDK. This feature allows you to seamlessly incorporate ML models into your pipeline or execute remote inference calls. + +You can use the `RunInference` transform with large models as long they fit into memory. + +The typical workflow for conducting inference on Large Language Models (LLMs) within an Apache Beam pipeline involves the following steps: +1. Read the input text data from a source such as a file or a Pub/Sub topic. +2. Encode the text into tokens understandable by the LLM model, typically using a tokenizer. +3. Use the `RunInference` transform to obtain predictions from the model. +4. Decode the predictions into human-readable text. 
+ +Here is an example demonstrating how to leverage the RunInference API with LLMs in an Apache Beam pipeline: + +```python +import apache_beam as beam +from apache_beam.ml.inference.base import RunInference + +with beam.Pipeline(options=pipeline_options) as pipeline: + _ = ( + pipeline + | "CreateInputs" >> beam.Create(task_sentences) + | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer)) + | "RunInference" >> RunInference(model_handler=model_handler) + | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer)) + ) +``` + +In this example, `Preprocess` and `Postprocess` are `DoFn` classes responsible for encoding and decoding the input and output data using a tokenizer. The `model_handler` is a `ModelHandler` object that wraps the underlying model and allows you to configure environment variables required to load the model. + +Here is an example of how to configure a Pytorch model handler for use in your pipeline: Review Comment: ```suggestion Here is an example of how to configure a PyTorch model handler for use in your pipeline: ``` ########## learning/prompts/documentation-lookup-nolinks/33_ai_ml_pipelines.md: ########## @@ -0,0 +1,25 @@ +Prompt: +How can I use Apache Beam in my AI/ML pipeline? + +Response: +Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the supported languages. + +To run a pipeline you need to deploy it to one of the supported Beam runners. + +You can use Apache Beam for various tasks within your AI/ML pipeline, including data validation, preprocessing, model validation, and model deployment and inference. + +Apache Beam offers a rich set of I/O connectors and `PTransforms` that allow for reading and writing data from and to various data sources and sinks, as well as performing data validation. + +For data preprocessing, Apache Beam provides the `MLTransform` class. This feature allows you to ensure data consistency by applying the same preprocessing steps for both training and inference. + +Additionally, Apache Beam allows integration with pre-trained models from `PyTorch`, `Scikit-learn`, and `TensorFlow`. + +To execute machine learning inference tasks, Apache Beam provides the RunInference API. +`RunInference` is a `PTransform` that supports the usage of any custom models. For an example of running inference on a custom model loaded with `spaCy`, refer to examples in the Apache Beam GitHub repository. + +The `RunInference` transform efficiently handles models of any size, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on large language models (LLMs) in the RunInference section of the Apache Beam documentation. + +For more information on implementing AI/ML pipelines using Apache Beam, see the `Get started with AI/ML pipelines` section in the Apache Beam documentation. Review Comment: ```suggestion For more information on implementing AI/ML pipelines using Apache Beam, see the 'Get started with AI/ML pipelines' section in the Apache Beam documentation. ``` ########## learning/prompts/documentation-lookup/38_ai_runinference.md: ########## @@ -0,0 +1,39 @@ +Prompt: +How can I run inference on a trained ML model using Apache Beam? 
+ +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s [Multi-language Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines) framework. The `RunInference` transform performs inference on a [`PCollection`](https://beam.apache.org/documentation/programming-guide/#pcollections) of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions. + +Key features of the RunInference API include: +* Support for both batch and streaming inference. +* Centralized model management for efficient memory and bandwidth usage. +* Compatibility with multiple model frameworks and model hubs. +* Automatic model refreshing to ensure the latest model version is used. +* GPU support for model inference. + +The RunInference API supports a variety of frameworks and model hubs, including [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/), [XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), [TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex AI](https://cloud.google.com/vertex-ai), [TensorRT](https://developer.nvidia.com/tensorrt), and [ONNX](https://onnx.ai/). Additionally, you can easily integrate custom model frameworks. + +To import a model into your Apache Beam pipeline, you'll need to configure the [`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) object, which wraps the underlying model and allows you to set necessary environment variables for inference. Review Comment: ```suggestion To import a model into your Apache Beam pipeline, you will need to configure the [`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) object, which wraps the underlying model and allows you to set necessary environment variables for inference. ``` ########## learning/prompts/documentation-lookup-nolinks/38_ai_runinference.md: ########## @@ -0,0 +1,39 @@ +Prompt: +How can I run inference on a trained ML model using Apache Beam? + +Response: +Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s Multi-language Pipelines framework. The `RunInference` transform performs inference on a `PCollection` of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions. + +Key features of the RunInference API include: +* Support for both batch and streaming inference. +* Centralized model management for efficient memory and bandwidth usage. +* Compatibility with multiple model frameworks and model hubs. +* Automatic model refreshing to ensure the latest model version is used. +* GPU support for model inference. + +The RunInference API supports a variety of frameworks and model hubs, including TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, TensorFlow Hub, Vertex AI, TensorRT, and ONNX. 
Additionally, you can easily integrate custom model frameworks.
+
+To import a model into your Apache Beam pipeline, you'll need to configure the `ModelHandler` object, which wraps the underlying model and allows you to set necessary environment variables for inference.
+
+Following is an example importing a `Pytorch` model handler to use in your pipeline:
+
+```python
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.base import RunInference
+
+  model_handler = PytorchModelHandlerTensor(
+    # Model handler setup
+  )
+
+with pipeline as p:
+   predictions = p | 'Read' >> beam.ReadFromSource('a_source')
+                   | 'RunInference' >> RunInference(model_handler)
+```
+
+For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, refer to the Apache Beam GitHub repository.
+
+If you need to run inference on a model that isn't explicitly supported, you can create your own `ModelHandler` or `KeyedModelHandler` with custom logic to load and use your model. For an example of running inference on a custom model loaded with spaCy, refer to the examples in the Apache Beam GitHub repository.
+
+For recommended patterns and best practices when leveraging Apache Beam for inference tasks, see the RunInference Patterns section in the official documentation. 

Review Comment:
```suggestion
For recommended patterns and best practices when leveraging Apache Beam for inference tasks, see the 'RunInference Patterns' section in the official documentation.
```
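To make the custom-handler pattern discussed in the hunk above concrete, here is a minimal sketch of a user-defined `ModelHandler` used with `RunInference`. The `MyModel` class and its `predict` method are hypothetical stand-ins for any framework that is not supported out of the box; only `ModelHandler`, `PredictionResult`, and `RunInference` come from the Apache Beam Python SDK.

```python
import apache_beam as beam
from apache_beam.ml.inference.base import ModelHandler, PredictionResult, RunInference


class MyModel:
    """Hypothetical model: any object exposing a predict method would do."""

    def predict(self, text: str) -> str:
        return text.upper()


class MyModelHandler(ModelHandler[str, PredictionResult, MyModel]):
    def load_model(self) -> MyModel:
        # Called once per worker; expensive model loading belongs here.
        return MyModel()

    def run_inference(self, batch, model, inference_args=None):
        # Run the model on a batch of examples and pair each input with its prediction.
        return [PredictionResult(example, model.predict(example)) for example in batch]


with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateExamples" >> beam.Create(["first example", "second example"])
        | "RunInference" >> RunInference(MyModelHandler())
        | "Print" >> beam.Map(print)
    )
```

A `KeyedModelHandler` can wrap the same handler when each example needs to carry an associated key through inference.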
