yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920371691
##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -158,24 +169,21 @@ For detailed instructions explaining how to build and run a pipeline that uses M
 If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.

-### Prediction results missing
-
-When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors.
+### Incorrect inferences in the PredictionResult object

-Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and that includes additional metadata. An example return type is `Dict[str, Tensor]`.

-When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.

-To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).

Review Comment:
```suggestion
To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper should override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
```
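For illustration, a minimal sketch of such a wrapper (the class name and the wrapped model are hypothetical, not taken from the Beam example):

```python
import torch
from typing import Dict, List


class DictToListWrapper(torch.nn.Module):  # hypothetical name
    """Wraps a model whose forward() returns Dict[str, Tensor] (one batched
    tensor per key) and regroups it into List[Dict[str, Tensor]] with one
    dict per batch element, the shape RunInference can zip with its inputs."""

    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self._model = model

    def forward(self, batch: torch.Tensor) -> List[Dict[str, torch.Tensor]]:
        output = self._model(batch)  # e.g. {'logits': (N, ...), 'hidden': (N, ...)}
        # Split each batched tensor along dim 0 and regroup per element.
        return [
            {key: value[i] for key, value in output.items()}
            for i in range(batch.shape[0])
        ]
```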
##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -158,24 +169,21 @@ For detailed instructions explaining how to build and run a pipeline that uses M
 If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.

-### Prediction results missing
-
-When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors.
+### Incorrect inferences in the PredictionResult object

-Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and that includes additional metadata. An example return type is `Dict[str, Tensor]`.

-When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.

Review Comment:
```suggestion
The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
```
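As a toy illustration of the zipping behavior described in this hunk (plain Python, not Beam's actual internals):

```python
import torch

# Two input examples and a dict-shaped prediction for the whole batch.
examples = [torch.tensor([1.0]), torch.tensor([2.0])]
predictions = {"logits": torch.randn(2, 3), "hidden": torch.randn(2, 8)}

# Iterating over a dict yields its keys, so zipping pairs each example
# with a key name; the prediction tensors are silently dropped.
for example, inference in zip(examples, predictions):
    print(example, inference)
# tensor([1.]) logits
# tensor([2.]) hidden
```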
##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -158,24 +169,21 @@ For detailed instructions explaining how to build and run a pipeline that uses M
 If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.

-### Prediction results missing
-
-When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors.
+### Incorrect inferences in the PredictionResult object

-Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and that includes additional metadata. An example return type is `Dict[str, Tensor]`.

Review Comment:
The intention here is to say that there could be a mapping `{'key1': predictions, 'key2': metadata2, 'key3': metadata3}`.
```suggestion
In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
```
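To make that mapping concrete, a `Dict[str, Tensor]` return value might look like the following (key names are hypothetical):

```python
import torch

# One dict for an entire batch of four elements: a single key holds the
# predictions that belong in the `inferences` field; the rest is metadata.
output = {
    "predictions": torch.randn(4, 10),               # what you actually want
    "attention_scores": torch.randn(4, 12),          # metadata
    "sequence_lengths": torch.tensor([7, 9, 4, 8]),  # metadata
}
```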
##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -158,24 +169,21 @@ For detailed instructions explaining how to build and run a pipeline that uses M
 If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.

-### Prediction results missing
-
-When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors.
+### Incorrect inferences in the PredictionResult object

-Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and that includes additional metadata. An example return type is `Dict[str, Tensor]`.

-When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.

-To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).

 ### Unable to batch tensor elements

 RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.

-To avoid this issue:
+To avoid this issue, use one of the following solutions:

-1. Either use elements that have the same size, or resize image inputs and word embeddings to make them
-the same size. Depending on the language model and encoding technique, this option might not be available.
+1. Use elements of the same size or resize the inputs. For computer vision applications, resize image inputs so that they have the same dimensions. For natural language processing (NLP) applications that have text of varying length, resize the text or word embeddings to make them the same length. When working with texts of varying length, resizing might not be possible.

Review Comment:
```suggestion
1. Use elements of the same size or resize the inputs. For computer vision applications, resize image inputs so that they have the same dimensions. For natural language processing (NLP) applications that have text of varying length, resize the text or word embeddings to make them the same length. When working with texts of varying length, resizing might not be possible. If this is the case, consider solution 2.
```
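As a sketch of the first solution for variable-length inputs, assuming zero right-padding is acceptable for the model (the helper name is illustrative):

```python
import torch
import torch.nn.functional as F


def pad_to_length(t: torch.Tensor, length: int) -> torch.Tensor:
    """Right-pad a 1-D tensor with zeros up to the target length."""
    return F.pad(t, (0, length - t.shape[0]))


embeddings = [torch.randn(5), torch.randn(9), torch.randn(7)]

# torch.stack(embeddings) would raise "stack expects each tensor to be
# equal size"; pad every element to the longest length first.
max_len = max(t.shape[0] for t in embeddings)
batch = torch.stack([pad_to_length(t, max_len) for t in embeddings])
print(batch.shape)  # torch.Size([3, 9])
```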
