[
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=765797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765797
]
ASF GitHub Bot logged work on BEAM-14068:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/May/22 01:43
Start Date: 04/May/22 01:43
Worklog Time Spent: 10m
Work Description: AnandInguva commented on code in PR #17462:
URL: https://github.com/apache/beam/pull/17462#discussion_r864409165
##########
sdks/python/apache_beam/ml/inference/examples/pytorch_image_classification.py:
##########
@@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# pylint: skip-file
+
+import argparse
+import io
+import os
+from functools import partial
+
+import apache_beam as beam
+import torch
+import torchvision
+import torchvision.transforms as transforms
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.ml.inference.api import RunInference
+from apache_beam.ml.inference.pytorch import PytorchModelLoader
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from PIL import Image
+
+_IMG_SIZE = (224, 224)
+normalize = transforms.Normalize(
+    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+
+transform = transforms.Compose([
+    transforms.Resize(_IMG_SIZE),
+    transforms.ToTensor(),
+    normalize,
+])
+
+
+def read_image(path_to_file: str, path_to_dir: str):
+  path_to_file = os.path.join(path_to_dir, path_to_file)
+  with FileSystems().open(path_to_file, 'r') as file:
+    data = Image.open(io.BytesIO(file.read())).convert('RGB')
+    return path_to_file, data
+
+
+def preprocess_data(data):
+  return transform(data)
+
+
+class PostProcessor(beam.DoFn):
+  """Post process PredictionResult to output filename and
+  prediction using torch."""
+  def process(self, element):
+    filename, prediction_result = element
+    prediction = torch.argmax(prediction_result.inference, dim=0)
+    yield filename + ',' + str(int(prediction))
+
+
+def setup_pipeline(options: PipelineOptions, args=None):
Review Comment:
Yes, it should be run_pipeline(). I started with setup_pipeline() and will
rename it once I put the PR up for review.
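
For readers following the diff without torch or Beam installed, the
PostProcessor step reduces to an argmax over the class scores followed by
comma-separated formatting. A minimal plain-Python sketch, assuming the
inference is a flat sequence of per-class scores; the helper name
format_prediction is hypothetical and not part of the PR:

```python
from typing import Sequence, Tuple


def format_prediction(element: Tuple[str, Sequence[float]]) -> str:
    """Hypothetical stand-in for PostProcessor.process (no torch, no Beam).

    Assumes `element` is (filename, scores), where `scores` is a flat,
    non-empty sequence of per-class scores.
    """
    filename, scores = element
    # Index of the largest score; plays the role of torch.argmax(..., dim=0)
    # on a 1-D tensor. Python's max() returns the first maximum on ties.
    prediction = max(range(len(scores)), key=scores.__getitem__)
    return filename + ',' + str(prediction)


if __name__ == '__main__':
    # Illustrative input only; the file name and scores are made up.
    print(format_prediction(('img_0001.jpg', [0.1, 0.7, 0.2])))
```

In the pipeline itself the same logic lives in a beam.DoFn so that it can be
applied per element after RunInference.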
Issue Time Tracking
-------------------
Worklog Id: (was: 765797)
Time Spent: 2h (was: 1h 50m)
> RunInference Benchmarking tests
> -------------------------------
>
> Key: BEAM-14068
> URL: https://issues.apache.org/jira/browse/BEAM-14068
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Anand Inguva
> Assignee: Anand Inguva
> Priority: P2
> Time Spent: 2h
> Remaining Estimate: 0h
>
> RunInference benchmarks will evaluate the performance of pipelines that
> represent common use cases of Beam + Dataflow with PyTorch, scikit-learn, and
> possibly TFX. These benchmarks would be integration tests that exercise
> several software components using Beam, PyTorch, scikit-learn, and TensorFlow
> Extended.
> We would use publicly available datasets (e.g., Kaggle).
> Size: small / 10 GB / 1 TB etc.
> The default execution runner would be Dataflow unless specified otherwise.
> These tests would be run infrequently (once every release cycle).