bgeng777 opened a new issue, #36587:
URL: https://github.com/apache/beam/issues/36587

   ### What happened?
   
   apache_beam.ml.inference.huggingface_inference._convert_to_result
   
   
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/huggingface_inference.py#L572
   
   <img width="1600" height="420" alt="Image" src="https://github.com/user-attachments/assets/5766a4af-09df-4633-97b1-a9ec9474cb89" />
   
   The code here is `PredictionResult(x, y, model_id) for x, y in zip(batch, [predictions])`.
   Because `predictions` is wrapped in a one-element list, `zip` stops after the first pair, so the whole batch produces only a single result. This happens to work when the batch size is 1, but batch sizes can be much larger.
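
A minimal, self-contained sketch of the `zip` behavior (with hypothetical placeholder values, not Beam internals):

```python
batch = ["example 1", "example 2"]
predictions = ["prediction 1", "prediction 2"]

# Buggy form: `predictions` is wrapped in a one-element list, so `zip`
# exhausts the shorter iterable after a single pair.
buggy = [(x, y) for x, y in zip(batch, [predictions])]
# buggy == [("example 1", ["prediction 1", "prediction 2"])]

# Dropping the extra list pairs each example with its own prediction.
fixed = [(x, y) for x, y in zip(batch, predictions)]
# fixed == [("example 1", "prediction 1"), ("example 2", "prediction 2")]
```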
   This can be verified using the notebook
   https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_huggingface.ipynb
   with the following code:
   ```python
   from typing import Dict
   from typing import Iterable
   from typing import Tuple
   
   import tensorflow as tf
   import torch
   from transformers import AutoTokenizer
   from transformers import TFAutoModelForMaskedLM
   
   import apache_beam as beam
   from apache_beam.ml.inference.base import KeyedModelHandler
   from apache_beam.ml.inference.base import PredictionResult
   from apache_beam.ml.inference.base import RunInference
   from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler
   from apache_beam.ml.inference.huggingface_inference import HuggingFaceModelHandlerKeyedTensor
   from apache_beam.ml.inference.huggingface_inference import HuggingFaceModelHandlerTensor
   from apache_beam.ml.inference.huggingface_inference import PipelineTask
   
   model_handler = HuggingFacePipelineModelHandler(
       task=PipelineTask.Translation_XX_to_YY,
       model="google/flan-t5-small",
       load_pipeline_args={'framework': 'pt'},
       inference_args={'max_length': 200},
       min_batch_size=2
   )
   
   text = ["translate English to Spanish: How are you doing?",
           "translate English to English: This is the Apache Beam project."]
   
   class FormatOutput(beam.DoFn):
     """
     Extract the results from PredictionResult and print the results.
     """
     def process(self, element):
       example = element.example
       translated_text = element.inference[0]['translation_text']
       print(f'Example: {example}')
       print(f'Translated text: {translated_text}')
       print('-' * 80)
   
   
   with beam.Pipeline() as beam_pipeline:
     examples = (
         beam_pipeline
         | "CreateExamples" >> beam.Create(text)
     )
     inferences = (
         examples
         | "RunInference" >> RunInference(model_handler)
         | "Print" >> beam.ParDo(FormatOutput())
     )
   
   ```
   The output is:
   ```txt
    WARNING:apache_beam.transforms.core:('No iterator is returned by the process method in %s.', <class '__main__.FormatOutput'>)
    /usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
      warnings.warn(
    /usr/local/lib/python3.12/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
      warnings.warn(
    Element: PredictionResult(example='translate English to Spanish: How are you doing?', inference=[{'translation_text': 'Cómo está acerca?'}, {'translation_text': 'This is the Apache Beam project.'}], model_id=None)
    Example: translate English to Spanish: How are you doing?
    Translated text: Cómo está acerca?
    --------------------------------------------------------------------------------
   
   ```
   Only 1 example is output instead of 2.
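
A possible fix is simply to drop the extra list around `predictions`. Sketched below with a local `PredictionResult` stand-in so the snippet is self-contained (in Beam this would be `apache_beam.ml.inference.base.PredictionResult`, and the example inputs are hypothetical):

```python
from typing import Any, NamedTuple, Optional


class PredictionResult(NamedTuple):
  # Stand-in for apache_beam.ml.inference.base.PredictionResult.
  example: Any
  inference: Any
  model_id: Optional[str] = None


def _convert_to_result(batch, predictions, model_id=None):
  # Pair each input example with its own prediction. The buggy version
  # wrapped `predictions` in a one-element list, so zip stopped after
  # the first example.
  return [PredictionResult(x, y, model_id) for x, y in zip(batch, predictions)]


results = _convert_to_result(
    ["translate English to Spanish: How are you doing?",
     "translate English to English: This is the Apache Beam project."],
    [{'translation_text': 'Cómo estás?'},
     {'translation_text': 'This is the Apache Beam project.'}])
# len(results) == 2: one PredictionResult per input example.
```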
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [x] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner

