ojasiiitd opened a new issue, #33946: URL: https://github.com/apache/beam/issues/33946
### What happened?

Installing Apache Beam:

```
! pip install apache-beam
```

Installing TensorRT:

```
! pip install --upgrade tensorrt==10.7
! pip install --upgrade tensorrt-lean==10.7
! pip install --upgrade tensorrt-dispatch==10.7
! pip install pycuda
```

```
! apt-get install -y \
    libnvonnxparsers8 \
    libnvparsers8 \
    libnvinfer-plugin8 \
    libnvinfer8 \
    python3-libnvinfer \
    python3-libnvinfer-dev
! apt-get install -y tensorrt
```

`dpkg` output:

```
ii libnvinfer-dev                 10.7.0.23-1+cuda12.6  amd64  TensorRT development libraries
ii libnvinfer-dispatch-dev        10.7.0.23-1+cuda12.6  amd64  TensorRT development dispatch runtime libraries
ii libnvinfer-dispatch10          10.7.0.23-1+cuda12.6  amd64  TensorRT dispatch runtime library
ii libnvinfer-headers-dev         10.7.0.23-1+cuda12.6  amd64  TensorRT development headers
ii libnvinfer-headers-plugin-dev  10.7.0.23-1+cuda12.6  amd64  TensorRT plugin headers
ii libnvinfer-lean-dev            10.7.0.23-1+cuda12.6  amd64  TensorRT lean runtime libraries
ii libnvinfer-lean10              10.7.0.23-1+cuda12.6  amd64  TensorRT lean runtime library
ii libnvinfer-plugin-dev          10.7.0.23-1+cuda12.6  amd64  TensorRT plugin libraries
ii libnvinfer-plugin10            10.7.0.23-1+cuda12.6  amd64  TensorRT plugin libraries
ii libnvinfer-plugin8             8.6.1.6-1+cuda12.0    amd64  TensorRT plugin libraries
ii libnvinfer-vc-plugin-dev       10.7.0.23-1+cuda12.6  amd64  TensorRT vc-plugin library
ii libnvinfer-vc-plugin10         10.7.0.23-1+cuda12.6  amd64  TensorRT vc-plugin library
ii libnvinfer10                   10.7.0.23-1+cuda12.6  amd64  TensorRT runtime libraries
ii libnvinfer8                    8.6.1.6-1+cuda12.0    amd64  TensorRT runtime libraries
ii python3-libnvinfer             10.7.0.23-1+cuda12.6  amd64  Python 3 bindings for TensorRT standard runtime
ii python3-libnvinfer-dev         10.7.0.23-1+cuda12.6  amd64  Python 3 development package for TensorRT standard runtime
ii python3-libnvinfer-dispatch    10.7.0.23-1+cuda12.6  amd64  Python 3 bindings for TensorRT dispatch runtime
ii python3-libnvinfer-lean        10.7.0.23-1+cuda12.6  amd64  Python 3 bindings for TensorRT lean runtime
```

I used this [Beam example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py) as a reference. Code to reproduce (note: `eval_dataset` and `tokenizer` are defined elsewhere in my notebook):

```python
import argparse
import logging
import os
import sys

import numpy as np
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy, TensorRTEngine
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
from apache_beam.ml.inference.pytorch_inference import make_tensor_model_fn
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
import tensorrt as trt


class Preprocess(beam.DoFn):
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def process(self, element):
        input_ids = element
        return input_ids


class Postprocess(beam.DoFn):
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def process(self, element):
        decoded_outputs = self._tokenizer.decode(element.inference, skip_special_tokens=True)
        print(f"Output Prediction: {decoded_outputs}")


def load_engine(engine_path: str) -> trt.ICudaEngine:
    """Loads a serialized TensorRT engine from file."""
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)  # Create a TensorRT logger
    runtime = trt.Runtime(TRT_LOGGER)  # Create a runtime object
    # Read the engine file
    with open(engine_path, "rb") as f:
        engine_data = f.read()
    # Deserialize the engine
    engine = runtime.deserialize_cuda_engine(engine_data)
    return engine


def parse_args(argv):
    """Parses args for the workflow."""
    parser = argparse.ArgumentParser()
    return parser.parse_known_args(args=argv)


################ MAIN ################
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)  # Change to VERBOSE
runtime = trt.Runtime(TRT_LOGGER)

known_args, pipeline_args = parse_args(sys.argv)
pipeline_options = PipelineOptions(pipeline_args)

task_inputs = []
for i, sample in enumerate(eval_dataset):
    task_inputs.append(sample['audio']['array'])
    if i == 10:
        break
task_inputs = np.array(task_inputs)

trt_tokenizer = tokenizer

trt_engine_path = "./tensorrt_final_V3.engine"
if not os.path.exists(trt_engine_path):
    print(f"Error: Engine file {trt_engine_path} not found!")

engine = load_engine(trt_engine_path)
model_handler = TensorRTEngineHandlerNumPy(
    min_batch_size=1,
    max_batch_size=1,
    engine_path=trt_engine_path)
# model_handler = TensorRTEngine(engine)

# [START Pipeline]
with beam.Pipeline(options=pipeline_options) as pipeline:
    _ = (
        pipeline
        | "CreateInputs" >> beam.Create(task_inputs)
        | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=trt_tokenizer))
        | "RunInference" >> RunInference(model_handler=model_handler)
        | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=trt_tokenizer)))
# [END Pipeline]
```

I saw a [similar issue](https://github.com/NVIDIA/TensorRT/issues/4216) in the NVIDIA TensorRT repository.

### Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

### Issue Components

- [x] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [x] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
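As context for the `min_batch_size`/`max_batch_size` arguments in the repro above: Beam's RunInference handlers group incoming elements into batches within those bounds before each engine execution. Below is a rough, pure-Python sketch of that grouping over the 11 collected audio arrays; `batch_inputs` is a hypothetical helper for illustration, not Beam's actual implementation.

```python
import numpy as np

def batch_inputs(inputs, max_batch_size):
    """Yield successive batches of at most max_batch_size elements,
    stacking each group into a single array, roughly as a RunInference
    handler would before invoking the engine."""
    for start in range(0, len(inputs), max_batch_size):
        yield np.stack(inputs[start:start + max_batch_size])

# With max_batch_size=1 (as in the repro), every element becomes its own batch.
samples = [np.zeros(4, dtype=np.float32) for _ in range(11)]
batches = list(batch_inputs(samples, max_batch_size=1))
print(len(batches), batches[0].shape)  # 11 (1, 4)
```

With `max_batch_size=1` the engine is invoked once per element, which keeps the repro simple but exercises the engine's execution path for every single input.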