Hello everyone,

It would be great to improve the workflow engine in Baremaps (package: org.apache.baremaps.workflow). In the current version, a workflow is a directed acyclic graph (DAG) of steps. Each step can have one or more tasks, executed sequentially or in parallel. The inputs and outputs of the tasks are set manually. Some of the outputs (e.g., a table created in a database) are not described. Furthermore, some resources (e.g., DataSources) are shared across the workflow through a context object, but a task must know what another task did in order to benefit from these shared resources. This approach is loosely based on GitHub Actions.
A nice improvement would be to remove the notion of step, to systematically describe the inputs and outputs of the tasks, and to introduce a format in the configuration file to describe the shared resources accessed via the context object. This would probably make the configuration file of the workflow more difficult to read, but at least everything would be declared in it. The DAG could then be inferred from the inputs and outputs of the tasks. This new approach would probably be closer to what AWS Data Pipeline does:

https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-copydata-redshift-define-pipeline-cli.html

I'd love to gather both technical and non-technical feedback on this proposal. If you have any experience, whether good, mixed, or bad, with the current approach, please do not hesitate to share it. Additionally, if you have experience with other workflow technologies, it would be valuable to hear about those as well.

Thanks a lot for your help,

Bertil
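
P.S. To make the idea of inferring the DAG from declared inputs and outputs a bit more concrete, here is a rough sketch. It is not the current Baremaps API: the WorkflowSketch class, the Task record, and the resource names are all hypothetical and only illustrate the principle. A task depends on whichever task declares one of its inputs as an output, and a topological sort (Kahn's algorithm) then yields a valid execution order. Inputs that no task produces are assumed to be external resources.

```java
import java.util.*;

public class WorkflowSketch {

  // Hypothetical task description: a name plus the resources it reads and writes.
  public record Task(String name, Set<String> inputs, Set<String> outputs) {}

  // Returns the task names in an order that respects the inferred dependencies.
  public static List<String> executionOrder(List<Task> tasks) {
    // Map each declared output to the single task that produces it.
    Map<String, String> producerOf = new HashMap<>();
    for (Task task : tasks) {
      for (String output : task.outputs()) {
        if (producerOf.put(output, task.name()) != null) {
          throw new IllegalArgumentException("Output declared twice: " + output);
        }
      }
    }

    // Infer the edges: producer -> consumer. Count unmet dependencies per task.
    Map<String, Set<String>> successors = new LinkedHashMap<>();
    Map<String, Integer> pending = new LinkedHashMap<>();
    for (Task task : tasks) {
      successors.put(task.name(), new LinkedHashSet<>());
      pending.put(task.name(), 0);
    }
    for (Task task : tasks) {
      for (String input : task.inputs()) {
        String producer = producerOf.get(input);
        // Inputs without a producer are treated as external resources (assumption).
        if (producer != null && successors.get(producer).add(task.name())) {
          pending.merge(task.name(), 1, Integer::sum);
        }
      }
    }

    // Kahn's algorithm: repeatedly schedule tasks with no unmet dependencies.
    Deque<String> ready = new ArrayDeque<>();
    pending.forEach((name, count) -> {
      if (count == 0) {
        ready.add(name);
      }
    });
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String name = ready.poll();
      order.add(name);
      for (String next : successors.get(name)) {
        if (pending.merge(next, -1, Integer::sum) == 0) {
          ready.add(next);
        }
      }
    }
    if (order.size() != tasks.size()) {
      throw new IllegalStateException("The declared inputs and outputs form a cycle");
    }
    return order;
  }

  public static void main(String[] args) {
    // Tasks are deliberately listed out of order; the order is inferred.
    List<Task> tasks = List.of(
        new Task("export", Set.of("db:osm_nodes"), Set.of("file:tiles.mbtiles")),
        new Task("import", Set.of("file:data.osm.pbf"), Set.of("db:osm_nodes")),
        new Task("download", Set.of("url:source"), Set.of("file:data.osm.pbf")));
    System.out.println(executionOrder(tasks));
  }
}
```

With this kind of model, tasks that become ready at the same time could run in parallel, and a cycle or a duplicated output could be reported before anything is executed.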