ojasiiitd opened a new issue, #33946:
URL: https://github.com/apache/beam/issues/33946

   ### What happened?
   
   Installing Apache Beam:
   `! pip install apache-beam`
   
   Installing TensorRT:
   `! pip install --upgrade tensorrt==10.7`
   `! pip install --upgrade tensorrt-lean==10.7`
   `! pip install --upgrade tensorrt-dispatch==10.7`
   `! pip install pycuda`
   ```
   ! apt-get install -y \
       libnvonnxparsers8 \
       libnvparsers8 \
       libnvinfer-plugin8 \
       libnvinfer8 \
       python3-libnvinfer \
       python3-libnvinfer-dev
   ! apt-get install -y tensorrt
   ```
   
   `dpkg` output:
   ```
   ii  libnvinfer-dev                         10.7.0.23-1+cuda12.6              
      amd64        TensorRT development libraries
   ii  libnvinfer-dispatch-dev                10.7.0.23-1+cuda12.6              
      amd64        TensorRT development dispatch runtime libraries
   ii  libnvinfer-dispatch10                  10.7.0.23-1+cuda12.6              
      amd64        TensorRT dispatch runtime library
   ii  libnvinfer-headers-dev                 10.7.0.23-1+cuda12.6              
      amd64        TensorRT development headers
   ii  libnvinfer-headers-plugin-dev          10.7.0.23-1+cuda12.6              
      amd64        TensorRT plugin headers
   ii  libnvinfer-lean-dev                    10.7.0.23-1+cuda12.6              
      amd64        TensorRT lean runtime libraries
   ii  libnvinfer-lean10                      10.7.0.23-1+cuda12.6              
      amd64        TensorRT lean runtime library
   ii  libnvinfer-plugin-dev                  10.7.0.23-1+cuda12.6              
      amd64        TensorRT plugin libraries
   ii  libnvinfer-plugin10                    10.7.0.23-1+cuda12.6              
      amd64        TensorRT plugin libraries
   ii  libnvinfer-plugin8                     8.6.1.6-1+cuda12.0                
      amd64        TensorRT plugin libraries
   ii  libnvinfer-vc-plugin-dev               10.7.0.23-1+cuda12.6              
      amd64        TensorRT vc-plugin library
   ii  libnvinfer-vc-plugin10                 10.7.0.23-1+cuda12.6              
      amd64        TensorRT vc-plugin library
   ii  libnvinfer10                           10.7.0.23-1+cuda12.6              
      amd64        TensorRT runtime libraries
   ii  libnvinfer8                            8.6.1.6-1+cuda12.0                
      amd64        TensorRT runtime libraries
   ii  python3-libnvinfer                     10.7.0.23-1+cuda12.6              
      amd64        Python 3 bindings for TensorRT standard runtime
   ii  python3-libnvinfer-dev                 10.7.0.23-1+cuda12.6              
      amd64        Python 3 development package for TensorRT standard runtime
   ii  python3-libnvinfer-dispatch            10.7.0.23-1+cuda12.6              
      amd64        Python 3 bindings for TensorRT dispatch runtime
   ii  python3-libnvinfer-lean                10.7.0.23-1+cuda12.6              
      amd64        Python 3 bindings for TensorRT lean runtime
   ```
   
   I used this [Beam 
example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
 as a reference. Code to reproduce:
   
   ```
   import argparse
   import sys
   import logging
   
   import apache_beam as beam
   from apache_beam.ml.inference.base import RunInference
   from apache_beam.ml.inference.tensorrt_inference import 
TensorRTEngineHandlerNumPy, TensorRTEngine
   from apache_beam.ml.inference.pytorch_inference import 
PytorchModelHandlerTensor
   from apache_beam.ml.inference.pytorch_inference import make_tensor_model_fn
   from apache_beam.options.pipeline_options import PipelineOptions
   from apache_beam.options.pipeline_options import SetupOptions
   import tensorrt as trt
   
   class Preprocess(beam.DoFn):
     def __init__(self, tokenizer: tokenizer):
       self._tokenizer = tokenizer
   
     def process(self, element):
       input_ids = element
       return input_ids
   
   
   class Postprocess(beam.DoFn):
     def __init__(self, tokenizer):
       self._tokenizer = tokenizer
   
     def process(self, element):
       decoded_outputs = self._tokenizer.decode(element.inference, 
skip_special_tokens=True)
   
       print(f"Output Prediction: {decoded_outputs}")
   
   
   def load_engine(engine_path: str) -> trt.ICudaEngine:
       """Loads a serialized TensorRT engine from file."""
       TRT_LOGGER = trt.Logger(trt.Logger.WARNING)  # Create a TensorRT logger
       runtime = trt.Runtime(TRT_LOGGER)  # Create a runtime object
       
       # Read the engine file
       with open(engine_path, "rb") as f:
           engine_data = f.read()
       
       # Deserialize the engine
       engine = runtime.deserialize_cuda_engine(engine_data)
       
       return engine
   
   def parse_args(argv):
     """Parses args for the workflow."""
     parser = argparse.ArgumentParser()
     return parser.parse_known_args(args=argv)
   
   
   ################ MAIN ################
   TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)  # Change to VERBOSE
   runtime = trt.Runtime(TRT_LOGGER)
   
   known_args, pipeline_args = parse_args(sys.argv)
   pipeline_options = PipelineOptions(pipeline_args)
   
   task_inputs = []
   for i,sample in enumerate(eval_dataset):
     task_inputs.append(sample['audio']['array'])
     if i == 10:
       break
   task_inputs = np.array(task_inputs)
   
   trt_tokenizer = tokenizer
   
   trt_engine_path = "./tensorrt_final_V3.engine"
   if not os.path.exists(trt_engine_path):
       print(f"Error: Engine file {engine_path} not found!")
   engine = load_engine(trt_engine_path)
   
   model_handler = TensorRTEngineHandlerNumPy(
         min_batch_size=1,
         max_batch_size=1,
         engine_path=trt_engine_path)
   # model_handler = TensorRTEngine(engine)
   
   # [START Pipeline]
   with beam.Pipeline(options=pipeline_options) as pipeline:
     _ = (
         pipeline
         | "CreateInputs" >> beam.Create(task_inputs)
         | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=trt_tokenizer))
         | "RunInference" >> RunInference(model_handler=model_handler)
         | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=trt_tokenizer)))
   # [END Pipeline]
   ```
   
   I saw a [similar issue](https://github.com/NVIDIA/TensorRT/issues/4216) on 
the TensorRT Repository.
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [x] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [x] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to