Shiny Hoppy people!

I've been wrestling with an API/architecture issue that needs resolving. The topic at hand is the Apache Beam integration, in the form of our `engines/beam` plugin.
Currently, the handling of the various Beam-specific transforms is hard-coded (see <https://github.com/apache/incubator-hop/tree/master/plugins/engines/beam/src/main/java/org/apache/hop/beam/pipeline/handler>) and I don't like it. For example, a `Memory Group By` transform results in a GroupByKey being created and applied to a Beam PCollection. Ideally, we would move the code for that `Memory Group By` Beam logic to `plugins/transforms/memgroupby`. However, that would require Apache Beam dependencies to be sprinkled over a lot of plugins, which I don't like either. Right now we solve the dependency problem in the `dependencies.xml` file, where we have entries like `../../transforms/memgroupby` to drag the required jar file(s) in.

Here is what I would like to do:

- Create a set of extra modules in `plugins/engines/beam` for memgroupby, kafka and others.
- Create a new plugin type which gets registered by the Beam engine plugin: a Beam Transform Handler plugin type. Every module would then depend on the Beam parent and implement a Beam transform handler.
- The parent `dependencies.xml` file goes away and is replaced by a set of one-liners in the sub-modules.

This way, someone who is not interested in, say, Kafka can still remove it, albeit from two plugin folders.

Let me know if you have any objections or better ideas. I've racked my brain for a long time trying to find a better way, so any help is welcome.

Cheers,
Matt
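For illustration only, here is a minimal Java sketch of what the proposed Beam Transform Handler plugin type might boil down to: each sub-module contributes a handler keyed by a transform plugin ID, and the Beam engine looks handlers up instead of hard-coding them. Everything in it is an assumption rather than existing Hop or Beam API: `BeamTransformHandler`, `BeamTransformHandlerRegistry` and the `MemoryGroupBy` ID are hypothetical names, and plain `String`s stand in for Beam PCollections so the sketch runs without Beam on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class BeamHandlerRegistrySketch {

  /** Hypothetical contract that each Beam sub-module (memgroupby, kafka, ...) would implement. */
  interface BeamTransformHandler<C> {
    /** Hypothetical ID of the Hop transform plugin this handler translates. */
    String getTransformPluginId();

    /** Applies the Beam-specific logic to the incoming collection and returns the result. */
    C handleTransform(String transformName, C input);
  }

  /** Hypothetical registry the Beam engine plugin would fill when it registers the new plugin type. */
  static final class BeamTransformHandlerRegistry<C> {
    private final Map<String, BeamTransformHandler<C>> handlers = new HashMap<>();

    void register(BeamTransformHandler<C> handler) {
      handlers.put(handler.getTransformPluginId(), handler);
    }

    /** Returns an empty Optional when no module provides a handler for this transform. */
    Optional<BeamTransformHandler<C>> find(String transformPluginId) {
      return Optional.ofNullable(handlers.get(transformPluginId));
    }
  }

  public static void main(String[] args) {
    // Strings stand in for Beam PCollections in this sketch.
    BeamTransformHandlerRegistry<String> registry = new BeamTransformHandlerRegistry<>();

    // What a memgroupby sub-module might contribute (hypothetical).
    registry.register(new BeamTransformHandler<String>() {
      public String getTransformPluginId() {
        return "MemoryGroupBy";
      }

      public String handleTransform(String transformName, String input) {
        return input + " -> GroupByKey(" + transformName + ")";
      }
    });

    String result = registry.find("MemoryGroupBy")
        .map(h -> h.handleTransform("group by customer", "rows"))
        .orElse("no handler");
    System.out.println(result); // prints "rows -> GroupByKey(group by customer)"

    // No kafka module registered, so the lookup comes back empty.
    System.out.println(registry.find("Kafka").isPresent()); // prints "false"
  }
}
```

If the Kafka module's folder is removed, its handler is never registered and `find(...)` simply returns an empty `Optional`, which is the kind of opt-out the proposal is after.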
