Good Morning/Afternoon/Evening folks,

Support for Beam plugins is currently experimental, and we would like to
make it a first-class member of the Beam library for Python Runner v2, so
that plugins can be loaded into the runtime before the SdkHarness starts.
I created https://github.com/apache/beam/pull/16920 for this purpose, and
would like to gather thoughts on the approach and get it standardized. The
current implementation of Beam plugins lets users extend BeamPlugin, and
each such subclass is automatically added to the --beam_plugin
PipelineOption, e.g. FileSystem
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystem.py#L475>.
This produces the following pipeline option:

--beam_plugin=[
  'apache_beam.io.aws.s3filesystem.S3FileSystem',
  'apache_beam.io.filesystem.FileSystem',
  'apache_beam.io.hadoopfilesystem.HadoopFileSystem',
  'apache_beam.io.localfilesystem.LocalFileSystem',
  'apache_beam.io.gcp.gcsfilesystem.GCSFileSystem',
  'apache_beam.io.azure.blobstoragefilesystem.BlobStorageFileSystem'
]
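
To make the subclass-based approach concrete, here is a minimal,
self-contained sketch of how a base class can enumerate its subclasses as
fully qualified names. It does not import Beam: Plugin and the file system
classes below are hypothetical stand-ins, and the method names are chosen
for illustration only, not taken from the PR.

```python
# Hypothetical stand-in for a plugin base class; not the Beam implementation.
class Plugin:
    @classmethod
    def all_subclasses(cls):
        """Return direct and indirect subclasses, depth first."""
        found = []
        for sub in cls.__subclasses__():
            found.append(sub)
            found.extend(sub.all_subclasses())
        return found

    @classmethod
    def all_plugin_paths(cls):
        """Return 'module.ClassName' for every known subclass."""
        return [
            f"{sub.__module__}.{sub.__qualname__}"
            for sub in cls.all_subclasses()
        ]


# Illustrative plugin hierarchy (assumed names).
class FileSystem(Plugin):
    pass


class LocalFileSystem(FileSystem):
    pass


class S3FileSystem(FileSystem):
    pass


plugin_names = [sub.__name__ for sub in Plugin.all_subclasses()]
plugin_paths = Plugin.all_plugin_paths()
```

Only classes whose defining modules have already been imported show up in
such a listing, so the result depends on what the submitting process has
loaded.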

Another way is to provide a module via the --beam_plugin PipelineOption,
e.g.:

--beam_plugin='twitter.beam.rule_the_world'
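
As a rough sketch of how a worker could accept both forms (a module path,
or a class path as in the list above), one option is to try importing the
value as a module and, if that fails, import its parent module instead.
The function name import_plugins is hypothetical and the logic is an
assumption for illustration, not necessarily what the PR does; standard
library names are used so the example runs without Beam installed.

```python
import importlib


def import_plugins(plugins):
    """Import each plugin entry and return the module names actually imported.

    An entry may be a module path ('pkg.module') or a class path
    ('pkg.module.ClassName'). This is an illustrative sketch only.
    """
    imported = []
    for plugin in plugins:
        try:
            # First, treat the entry as a module path.
            importlib.import_module(plugin)
            imported.append(plugin)
        except ImportError:
            # Otherwise, assume the last component is a class name and
            # import the module that contains it.
            module_name, sep, _ = plugin.rpartition(".")
            if not sep:
                raise  # No parent module to fall back to.
            importlib.import_module(module_name)
            imported.append(module_name)
    return imported


# Standard library names stand in for real plugin entries.
loaded = import_plugins(["json", "json.decoder.JSONDecoder"])
```

Importing the containing module is enough to run its top-level code, which
is typically where a plugin registers itself.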

The current implementation in the PR supports both of these approaches, but
I would like to settle on a standardized way forward and have it documented.
I would love to hear your thoughts on this.

Thanks & Regards,
Rahul Iyer
