charlespnh commented on code in PR #35715: URL: https://github.com/apache/beam/pull/35715#discussion_r2241302621
##########
sdks/python/apache_beam/yaml/yaml_ml.py:
##########
@@ -29,14 +32,36 @@ from apache_beam.yaml import options
 from apache_beam.yaml.yaml_utils import SafeLineLoader
 
+
+def list_submodules(package):
+  """
+  Lists all submodules within a given package.
+  """
+  submodules = []
+  for _, module_name, _ in pkgutil.walk_packages(
+      package.__path__, package.__name__ + '.'):
+    if 'test' in module_name:
+      continue
+    submodules.append(module_name)
+  return submodules
+
+
 try:
   from apache_beam.ml.transforms import tft
   from apache_beam.ml.transforms.base import MLTransform
   # TODO(robertwb): Is this all of them?
-  _transform_constructors = tft.__dict__
+  _transform_constructors = {}
 except ImportError:
   tft = None  # type: ignore
 
+# Load all available ML Transform modules
+for module_name in list_submodules(beam.ml.transforms):
+  try:
+    module = import_module(module_name)
+    _transform_constructors |= module.__dict__
+  except ImportError as e:
+    logging.warning('Could not load ML transform module %s: %s', module_name, e)

Review Comment:
   I'm personally a +1 for option 2, i.e. not having to install everything if I'm only using a subset of these transforms, and there's a well-defined error message when the pipeline uses a transform that doesn't have the dependencies installed properly.

   CC @chamikaramj and @liferoad. Not sure how our user base is using MLTransform.
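
   To make option 2 a bit more concrete, here is a rough sketch of what I have in mind. It is only an illustration, not a proposal for the exact code: `collect_constructors` and `resolve_constructor` are hypothetical helper names that do not exist in Beam, and the sketch takes a plain list of module names rather than walking `apache_beam.ml.transforms`. The idea is to record import failures instead of only logging them, so that the lookup of an unknown transform can say which modules were skipped and why.

```python
import importlib


def collect_constructors(module_names):
  """Imports each module; returns (constructors, failures).

  Hypothetical helper: `constructors` maps public names to objects from the
  modules that imported cleanly, `failures` maps module names to the
  ImportError raised while importing them.
  """
  constructors = {}
  failures = {}
  for module_name in module_names:
    try:
      module = importlib.import_module(module_name)
    except ImportError as e:
      # Remember the failure so it can be reported when a transform is used.
      failures[module_name] = e
      continue
    for name, value in vars(module).items():
      if not name.startswith('_'):
        constructors[name] = value
  return constructors, failures


def resolve_constructor(name, constructors, failures):
  """Returns the constructor for `name` or raises a descriptive ValueError."""
  if name in constructors:
    return constructors[name]
  if failures:
    details = '; '.join(
        f'{module}: {error}' for module, error in sorted(failures.items()))
    raise ValueError(
        f'Unknown transform {name!r}. It may be defined in a module whose '
        f'dependencies are not installed ({details}).')
  raise ValueError(f'Unknown transform {name!r}.')
```

   With something like this, a user who only installs the dependencies for a subset of the transforms sees no error at import time, and only sees the missing-dependency details when the pipeline actually references a transform that could not be loaded.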