yeandy commented on code in PR #22069: URL: https://github.com/apache/beam/pull/22069#discussion_r907850829
########## sdks/python/apache_beam/examples/inference/README.md: ##########
@@ -37,17 +37,18 @@ The RunInference API supports the PyTorch framework. To use PyTorch locally, fir
 pip install torch==1.11.0
 ```
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you might also need to install `torchvision`.
 ```
 pip install torchvision
 ```
-If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you may also need to install `transformers`.
+If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you might also need to install `transformers`.
 ```
 pip install transformers
 ```
-For installation of the `torch` dependency on a distributed runner, like Dataflow, refer to these [instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+For installation of the `torch` dependency on a distributed runner such as Dataflow, refer to the 

Review Comment:
Remove space due to `Trailing whitespace` error.
```suggestion
For installation of the `torch` dependency on a distributed runner such as Dataflow, refer to the
```

########## sdks/python/apache_beam/examples/inference/README.md: ##########
@@ -157,21 +159,22 @@ Each line has data separated by a semicolon ";". The first item is the file name
 ---
 ## Language modeling
-[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM architecture from Hugging Face.
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (that is, decoding a masked token in a sentence) using the `BertForMaskedLM` architecture from Hugging Face.
 The pipeline reads sentences, performs basic preprocessing to convert the last word into a `[MASK]` token, passes the masked sentence to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 ### Dataset and model for language modeling
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. Make sure you have installed `transformers` too.
+- **Required**: Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed.
 ```
 import torch
 from transformers import BertForMaskedLM
 model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
 torch.save(model.state_dict(), 'BertForMaskedLM.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: A path to a file called `SENTENCES` that contains sentences to feed into the model. It should look something like this:
+- **Required**: A path to a file namedd `MODEL_STATE_DICT` that contains the saved parameters of the `BertForMaskedLM` model. 

Review Comment:
Remove space due to `Trailing whitespace` error.
```suggestion
- **Required**: A path to a file namedd `MODEL_STATE_DICT` that contains the saved parameters of the `BertForMaskedLM` model.
```

########## sdks/python/apache_beam/examples/inference/README.md: ##########
@@ -108,27 +110,27 @@ This writes the output to the `predictions.csv` with contents like:
 ---
 ## Image segmentation
-[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the maskrcnn_resnet50_fpn architecture.
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the `maskrcnn_resnet50_fpn` architecture.
-The pipeline reads images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes predictions to a text file.
 ### Dataset and model for image segmentation
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+A popular dataset is from [Coco](https://cocodataset.org/#home). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image segmentation. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
-model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import maskrcnn_resnet50_fpn
 model = maskrcnn_resnet50_fpn(pretrained=True)
 torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+- **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `maskrcnn_resnet50_fpn` model. 

Review Comment:
Remove space due to `Trailing whitespace` error.
```suggestion
- **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `maskrcnn_resnet50_fpn` model.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
