[GitHub] [beam] TheNeuralBit commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

GitBox Mon, 27 Jun 2022 16:36:57 -0700


TheNeuralBit commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r907877286



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) 
contains an implementation for a RunInference pipeline that performs image 
classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) 
contains an implementation for a RunInference pipeline that performs image 
classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to 
the PyTorch implementation of RunInference, and then writes the predictions to 
a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images 
to the PyTorch implementation of RunInference, and then writes the predictions 
to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your 
`IMAGES_DIR` directory. One popular dataset is from 
[ImageNet](https://www.image-net.org/). Please follow their instructions to 
download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the 
absolute paths of each of the images in `IMAGES_DIR` on which you want to run 
image segmentation. Paths can be different types of URIs such as your local 
file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them 
in this directory. The directory is not required if image names in the input 
file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow 
their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the 
absolute paths of each of the images in `IMAGES_DIR` that you want to use to 
run image classification. Paths can be different types of URIs such as your 
local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the 
saved parameters of the maskrcnn_resnet50_fpn model. You will need to download 
the 
[mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
 model from Pytorch's repository of pretrained models. Note that this requires 
`torchvision` library.
+- **Required**: Download the 
[mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
 model from Pytorch's repository of pretrained models. This model requires the 
torchvision library. To download this model, run the following commands:

Review Comment:
   nit: this is a script, not commands
   ```suggestion
   - **Required**: Download the 
[mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
 model from Pytorch's repository of pretrained models. This model requires the 
torchvision library. To download this model, run the following script:



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) 
contains an implementation for a RunInference pipeline that performs image 
classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) 
contains an implementation for a RunInference pipeline that performs image 
classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to 
the PyTorch implementation of RunInference, and then writes the predictions to 
a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images 
to the PyTorch implementation of RunInference, and then writes the predictions 
to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your 
`IMAGES_DIR` directory. One popular dataset is from 
[ImageNet](https://www.image-net.org/). Please follow their instructions to 
download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the 
absolute paths of each of the images in `IMAGES_DIR` on which you want to run 
image segmentation. Paths can be different types of URIs such as your local 
file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them 
in this directory. The directory is not required if image names in the input 
file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow 
their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the 
absolute paths of each of the images in `IMAGES_DIR` that you want to use to 
run image classification. Paths can be different types of URIs such as your 
local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the 
saved parameters of the maskrcnn_resnet50_fpn model. You will need to download 
the 
[mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
 model from Pytorch's repository of pretrained models. Note that this requires 
`torchvision` library.
+- **Required**: Download the 
[mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
 model from Pytorch's repository of pretrained models. This model requires the 
torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
 torch.save(model.state_dict(), 'mobilenet_v2.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will 
write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images 
are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` 
have absolute paths.
+- **Required**: A path to a file named `MODEL_STATE_DICT` that contains the 
saved parameters of the `mobilenet_v2` model.

Review Comment:
   Could we instead consolidate the above bullet with this one? That way each 
bullet corresponds to one of the inputs in "Running 
`pytorch_image_classification.py`." Something like this:
   
   ```suggestion
   - **Required**: A path to a file named `MODEL_STATE_DICT` that contains the 
saved parameters of the `mobilenet_v2` model. You can download the mobilenet_v2 
model from ... use the following script:
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -157,21 +159,22 @@ Each line has data separated by a semicolon ";". The 
first item is the file name
 ---
 ## Language modeling
 
-[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an 
implementation for a RunInference pipeline that performs masked language 
modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM 
architecture from Hugging Face.
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an 
implementation for a RunInference pipeline that performs masked language 
modeling (that is, decoding a masked token in a sentence) using the 
`BertForMaskedLM` architecture from Hugging Face.
 
 The pipeline reads sentences, performs basic preprocessing to convert the last 
word into a `[MASK]` token, passes the masked sentence to the PyTorch 
implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
 
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the 
saved parameters of the BertForMaskedLM model. You will need to download the 
[BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM)
 model from Hugging Face's repository of pretrained models. Make sure you have 
installed `transformers` too.
+- **Required**: Download the 
[BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM)
 model from Hugging Face's repository of pretrained models. You must already 
have `transformers` installed. To download this model, run the following 
commands:
 ```
 import torch
 from transformers import BertForMaskedLM
 model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
 torch.save(model.state_dict(), 'BertForMaskedLM.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will 
write the predictions.
-- **Optional**: A path to a file called `SENTENCES` that contains sentences to 
feed into the model. It should look something like this:
+- **Required**: A path to a file namedd `MODEL_STATE_DICT` that contains the 
saved parameters of the `BertForMaskedLM` model.

Review Comment:
   Similarly here



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -37,17 +37,18 @@ The RunInference API supports the PyTorch framework. To use 
PyTorch locally, fir
 pip install torch==1.11.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` 
[subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights),
 you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` 
[subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights),
 you might also need to install `torchvision`.

Review Comment:
   @yeandy shouldn't these be "will" or "you must install"? Are there cases 
where users can use this without installing them?



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) 
contains an implementation for a RunInference pipeline that performs image 
classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) 
contains an implementation for a RunInference pipeline that performs image 
classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to 
the PyTorch implementation of RunInference, and then writes the predictions to 
a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images 
to the PyTorch implementation of RunInference, and then writes the predictions 
to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your 
`IMAGES_DIR` directory. One popular dataset is from 
[ImageNet](https://www.image-net.org/). Please follow their instructions to 
download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the 
absolute paths of each of the images in `IMAGES_DIR` on which you want to run 
image segmentation. Paths can be different types of URIs such as your local 
file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them 
in this directory. The directory is not required if image names in the input 
file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow 
their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the 
absolute paths of each of the images in `IMAGES_DIR` that you want to use to 
run image classification. Paths can be different types of URIs such as your 
local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the 
saved parameters of the maskrcnn_resnet50_fpn model. You will need to download 
the 
[mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
 model from Pytorch's repository of pretrained models. Note that this requires 
`torchvision` library.
+- **Required**: Download the 
[mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
 model from Pytorch's repository of pretrained models. This model requires the 
torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
 torch.save(model.state_dict(), 'mobilenet_v2.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will 
write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images 
are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` 
have absolute paths.

Review Comment:
   Was removing this intentional? It's still referenced below



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -157,21 +159,22 @@ Each line has data separated by a semicolon ";". The 
first item is the file name
 ---
 ## Language modeling
 
-[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an 
implementation for a RunInference pipeline that performs masked language 
modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM 
architecture from Hugging Face.
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an 
implementation for a RunInference pipeline that performs masked language 
modeling (that is, decoding a masked token in a sentence) using the 
`BertForMaskedLM` architecture from Hugging Face.
 
 The pipeline reads sentences, performs basic preprocessing to convert the last 
word into a `[MASK]` token, passes the masked sentence to the PyTorch 
implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
 
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the 
saved parameters of the BertForMaskedLM model. You will need to download the 
[BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM)
 model from Hugging Face's repository of pretrained models. Make sure you have 
installed `transformers` too.
+- **Required**: Download the 
[BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM)
 model from Hugging Face's repository of pretrained models. You must already 
have `transformers` installed. To download this model, run the following 
commands:
 ```
 import torch
 from transformers import BertForMaskedLM
 model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
 torch.save(model.state_dict(), 'BertForMaskedLM.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will 
write the predictions.
-- **Optional**: A path to a file called `SENTENCES` that contains sentences to 
feed into the model. It should look something like this:
+- **Required**: A path to a file namedd `MODEL_STATE_DICT` that contains the 
saved parameters of the `BertForMaskedLM` model.

Review Comment:
   Also there's a small typo
   ```suggestion
   - **Required**: A path to a file named `MODEL_STATE_DICT` that contains the 
saved parameters of the `BertForMaskedLM` model.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] TheNeuralBit commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Reply via email to