rszper commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r922427110
##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -218,16 +228,19 @@ is the word that the model predicts for the mask.
 The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.

 ### Dataset and model for language modeling
-- **Required**: A path to a file called `INPUT` that contains label and pixels to feed into the model. Each row should have elements that are comma-separated. The first element is the label. All subsuequent elements would be pixel values. It should look something like this:
+
+To use this transform, you need a dataset and model for language modeling.
+
+1. Create a file named `INPUT` that contains labels and pixels to feed into the model. Each row should have comma-separated elements. The first element is the label. All other elements are pixel values. The content of the file should be similar to the following example:
 ```
 1,0,0,0...
 0,0,0,0...
 1,0,0,0...
 4,0,0,0...
 ...
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Required**: A path to a file called `MODEL_PATH` that contains the pickled file of a scikit-learn model trained on MNIST data. Please refer to this scikit-learn [documentation](https://scikit-learn.org/stable/model_persistence.html) on how to serialize models.
+2. Create a file named `OUTPUT`. This file is used by the pipeline to write the predictions.

Review Comment:
   I updated the wording to: Note the path to the `OUTPUT` file created by the pipeline. This file is used by the pipeline to write the predictions.
