charlespnh commented on issue #35788:
URL: https://github.com/apache/beam/issues/35788#issuecomment-3161379154
Here is a minimal reproducible example of the issue:
YAML pipeline `test.yaml`:
```
pipeline:
  transforms:
    - type: MyTransform
      name: MyTransform
      input: {}
      config:
        model_artifact_path: "gs://dataflow-samples/shakespeare/kinglear.txt"

providers:
  - type: pythonPackage
    config:
      packages:
        - ./dist/transform_provider-0.1.0.tar.gz
    transforms:
      MyTransform: "transform_provider.MyTransform"
```
Implementation of MyTransform is in `transform_provider.py`:
```
import apache_beam as beam
from apache_beam.io.filesystems import FileSystems


class MyTransform(beam.PTransform):
    def __init__(self, model_artifact_path):
        self.model_artifact_path = model_artifact_path
        self.file = FileSystems.open(self.model_artifact_path, 'r')

    def expand(self, pcoll):
        # no-op
        return (
            pcoll
        )
```
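For context, here is a rough, stdlib-only sketch of how a fully qualified name such as `transform_provider.MyTransform` plus the `config` mapping could be turned into a constructed transform. This is not Beam's actual provider code: `resolve`, `FakeTransform`, and `fake_transform_provider` are hypothetical stand-ins so the snippet runs without the real package. It only illustrates that anything done in `__init__` (such as the `FileSystems.open()` call above) runs at construction time, before `expand()` is ever called.

```python
import importlib
import sys
import types


def resolve(qualified_name):
    """Resolve 'module.attr' to the object it names (hypothetical helper)."""
    module_name, _, attr = qualified_name.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


# Records the order in which the stand-in transform's methods are called.
events = []


class FakeTransform:
    """Stand-in for MyTransform; it records calls instead of opening a file."""

    def __init__(self, model_artifact_path):
        # Side effects placed here happen as soon as the object is constructed.
        events.append(("init", model_artifact_path))

    def expand(self, pcoll):
        events.append(("expand", pcoll))
        return pcoll


# Register a stand-in module so the sketch does not need the real package.
fake = types.ModuleType("fake_transform_provider")
fake.FakeTransform = FakeTransform
sys.modules["fake_transform_provider"] = fake

# Mirrors the `config` block of the YAML pipeline above.
config = {
    "model_artifact_path": "gs://dataflow-samples/shakespeare/kinglear.txt",
}

# Constructing the transform runs __init__ immediately; expand() has not run.
transform = resolve("fake_transform_provider.FakeTransform")(**config)
print(events)
```

After construction, `events` holds only the `init` entry, which is the stage at which the reproducer's `FileSystems.open()` call would be made.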
The Python distribution package is built with `poetry`, using the
`pyproject.toml` below:
```
[tool.poetry]
name = "transform_provider"
version = "0.1.0"
description = "..."
authors = ["Your Name <[email protected]>"]
license = "Apache License 2.0"
readme = "README.md"
packages = [
    { include = "transform_provider.py" },
]

[tool.poetry.dependencies]
python = "^3.11"
apache-beam = {extras = ["gcp", "yaml"], version = "^2.66.0"}

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```
Beam's anomaly detection module internally uses `FileSystems.open()` to load
the model from GCS, and this gRPC error seems to come from
`FileSystems.open()`...