damccorm commented on code in PR #33183:
URL: https://github.com/apache/beam/pull/33183#discussion_r1852953984
##########
examples/notebooks/beam-ml/automatic_model_refresh.ipynb:
##########
@@ -244,135 +233,145 @@
"# To expedite the model update process, it's recommended to set
num_workers>1.\n",
"# https://github.com/apache/beam/issues/28776\n",
"options.view_as(WorkerOptions).num_workers = 5"
- ],
- "metadata": {
- "id": "wWjbnq6X-4uE"
- },
- "execution_count": null,
- "outputs": [{
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "\n"
- ]
- }]
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "Install the `tensorflow` and `tensorflow_hub` dependencies on
Dataflow. Use the `requirements_file` pipeline option to pass these
dependencies."
- ],
"metadata": {
"id": "HTJV8pO2Wcw4"
- }
+ },
+ "source": [
+ "Install the `tensorflow` and `tensorflow_hub` dependencies on
Dataflow. Use the `requirements_file` pipeline option to pass these
dependencies."
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "lEy4PkluWbdm"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
"source": [
"# In a requirements file, define the dependencies required for the
pipeline.\n",
"!printf
'tensorflow==2.15.0\\ntensorflow_hub==0.16.1\\nkeras==2.15.0\\nPillow==11.0.0'
> ./requirements.txt\n",
"# Install the pipeline dependencies on Dataflow.\n",
"options.view_as(SetupOptions).requirements_file =
'./requirements.txt'"
- ],
- "metadata": {
- "id": "lEy4PkluWbdm"
- },
- "execution_count": null,
- "outputs": [{
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "\n"
- ]
- }]
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "_AUNH_GJk_NE"
+ },
"source": [
"## Use the TensorFlow model handler\n",
" This example uses `TFModelHandlerTensor` as the model handler and
the `resnet_101` model trained on [ImageNet](https://www.image-net.org/).\n",
"\n",
"\n",
"For the Dataflow runner, you need to store the model in a remote
location that the Apache Beam pipeline can access. For this example, download
the `ResNet101` model, and upload it to the Google Cloud Storage bucket.\n"
- ],
- "metadata": {
- "id": "_AUNH_GJk_NE"
- }
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ibkWiwVNvyrn"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
"source": [
"model = tf.keras.applications.resnet.ResNet101()\n",
"model.save('resnet101_weights_tf_dim_ordering_tf_kernels.keras')\n",
"# After saving the model locally, upload the model to GCS bucket and
provide that gcs bucket `URI` as `model_uri` to the `TFModelHandler`\n",
"# Replace `BUCKET_NAME` value with actual bucket name.\n",
Review Comment:
Can we get rid of this comment now?
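Side note on the requirements cell in this hunk: the `!printf` one-liner can also be written in plain Python, which avoids shell-escaping the `\n` sequences. This is just an illustrative sketch; the file name and the version pins are the ones from the diff above, everything else is my rewrite, not notebook code.

```python
from pathlib import Path

# Pin the pipeline dependencies that Dataflow should install on each worker.
# Versions copied from the diff above.
pins = [
    "tensorflow==2.15.0",
    "tensorflow_hub==0.16.1",
    "keras==2.15.0",
    "Pillow==11.0.0",
]

# Equivalent of the notebook's `!printf '...' > ./requirements.txt` step.
Path("./requirements.txt").write_text("\n".join(pins))
```

The resulting file is then passed to the pipeline via `options.view_as(SetupOptions).requirements_file = './requirements.txt'`, exactly as the cell below it does.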
##########
examples/notebooks/beam-ml/automatic_model_refresh.ipynb:
##########
@@ -534,108 +541,118 @@
" | \"ApplyWindowing\" >>
beam.WindowInto(beam.window.FixedWindows(10))\n",
" | \"RunInference\" >>
RunInference(model_handler=model_handler,\n",
"
model_metadata_pcoll=side_input_pcoll))"
- ],
- "metadata": {
- "id": "_AjvvexJ_hUq"
- },
- "execution_count": null,
- "outputs": [{
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "\n"
- ]
- }]
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "lTA4wRWNDVis"
+ },
"source": [
"4. Post-process the `PredictionResult` object.\n",
"When the inference is complete, RunInference outputs a
`PredictionResult` object that contains the fields `example`, `inference`, and
`model_id`. The `model_id` field identifies the model used to run the
inference. The `PostProcessor` returns the predicted label and the model ID
used to run the inference on the predicted label."
- ],
- "metadata": {
- "id": "lTA4wRWNDVis"
- }
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "9TB76fo-_vZJ"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
"source": [
"post_processor = (\n",
" inferences\n",
" | \"PostProcessResults\" >> beam.ParDo(PostProcessor())\n",
" | \"LogResults\" >> beam.Map(logging.info))"
- ],
- "metadata": {
- "id": "9TB76fo-_vZJ"
- },
- "execution_count": null,
- "outputs": [{
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "\n"
- ]
- }]
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "wYp-mBHHjOjA"
+ },
"source": [
"### Watch for the model update\n",
"\n",
"After the pipeline starts processing data, when you see output
emitted from the RunInference `PTransform`, upload a `resnet152` model saved in
the `.keras` format to a Google Cloud Storage bucket location that matches the
`file_pattern` you defined earlier.\n"
- ],
- "metadata": {
- "id": "wYp-mBHHjOjA"
- }
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "FpUfNBSWH9Xy"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
"source": [
"model = tf.keras.applications.resnet.ResNet152()\n",
"model.save('resnet152_weights_tf_dim_ordering_tf_kernels.keras')\n",
"# Replace the `BUCKET_NAME` with the actual bucket name.\n",
Review Comment:
Same - can we get rid of this?
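One more note on the `PostProcessor` cell in this hunk: the markdown above it says RunInference emits a `PredictionResult` with `example`, `inference`, and `model_id` fields, and that the post-processor returns the predicted label plus the model ID. A dependency-free sketch of that logic, for reviewers without Beam installed (the stand-in `PredictionResult` class, the label list, and the scores are illustrative; only the field names come from the notebook text):

```python
from typing import Any, List, NamedTuple, Tuple

class PredictionResult(NamedTuple):
    # Stand-in for the PredictionResult the notebook describes: the input
    # example, the raw inference output, and the id of the model that ran it.
    example: Any
    inference: Any
    model_id: str

def post_process(result: PredictionResult, labels: List[str]) -> Tuple[str, str]:
    # Pick the highest-scoring class and report which model produced it,
    # mirroring what the notebook's PostProcessor DoFn is described as doing.
    scores = result.inference
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], result.model_id

# Hypothetical scores over three classes; BUCKET_NAME is the same placeholder
# the notebook uses.
labels = ["cat", "dog", "fox"]
result = PredictionResult(example=None,
                          inference=[0.1, 0.7, 0.2],
                          model_id="gs://BUCKET_NAME/resnet101.keras")
print(post_process(result, labels))  # ('dog', 'gs://BUCKET_NAME/resnet101.keras')
```

When the watched `file_pattern` picks up the `resnet152` upload, only `model_id` in this tuple changes, which is what makes the model swap visible in the logged results.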
--
This is an automated message from the Apache Git Service.