Hello everyone,

It would be great to improve the workflow engine in Baremaps (package: 
org.apache.baremaps.workflow). In the current version, a workflow is a directed 
acyclic graph (DAG) of steps. Each step can have one or more tasks executed 
sequentially or in parallel. The inputs and outputs of the tasks are set 
manually. Some of the outputs (e.g., a table created in a database) are not 
described. Furthermore, some resources (e.g., DataSources) are shared across 
the workflow with a context object, but one task must be aware of what another 
task did to benefit from shared resources. This approach is loosely based on 
GitHub Actions.

A nice improvement would be to remove the notion of step, to systematically 
describe the inputs and outputs of the tasks, and to introduce a format in the 
configuration file to describe the shared resources accessed via the context 
object. This would probably make the configuration file of the workflow more 
difficult to read, but at least everything would be declared in it. The DAG 
could then be inferred from the inputs and outputs of the tasks. This new approach 
would probably be closer to what AWS Data Pipeline does.

https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-copydata-redshift-define-pipeline-cli.html
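
To make the idea of inferring the DAG from declared inputs and outputs more 
concrete, here is a minimal sketch in Java (16+ for records). It is not based 
on the existing Baremaps code: the Task record, the executionOrder method, and 
the resource names are all hypothetical and only serve as an illustration. It 
assumes that task names are unique, that each resource is produced by at most 
one task, and that an input without a producer is an external resource.

```java
import java.util.*;

public class WorkflowDagSketch {

  // Hypothetical task description: a name plus the resources it reads and writes.
  record Task(String name, Set<String> inputs, Set<String> outputs) {}

  // Returns the task names in an order where producers run before consumers.
  static List<String> executionOrder(List<Task> tasks) {
    // Map each declared output to the single task that produces it.
    Map<String, String> producerOf = new HashMap<>();
    for (Task task : tasks) {
      for (String output : task.outputs()) {
        if (producerOf.put(output, task.name()) != null) {
          throw new IllegalArgumentException("Two tasks produce " + output);
        }
      }
    }

    // Infer the edges (producer -> consumer) and count unmet dependencies.
    Map<String, Set<String>> consumers = new LinkedHashMap<>();
    Map<String, Integer> pending = new LinkedHashMap<>();
    for (Task task : tasks) {
      consumers.put(task.name(), new LinkedHashSet<>());
      pending.put(task.name(), 0);
    }
    for (Task task : tasks) {
      for (String input : task.inputs()) {
        String producer = producerOf.get(input);
        // Inputs without a producer are treated as external (assumption).
        if (producer != null && !producer.equals(task.name())
            && consumers.get(producer).add(task.name())) {
          pending.merge(task.name(), 1, Integer::sum);
        }
      }
    }

    // Kahn's algorithm: repeatedly schedule tasks with no unmet dependency.
    Deque<String> ready = new ArrayDeque<>();
    pending.forEach((name, count) -> {
      if (count == 0) {
        ready.add(name);
      }
    });
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String name = ready.poll();
      order.add(name);
      for (String consumer : consumers.get(name)) {
        if (pending.merge(consumer, -1, Integer::sum) == 0) {
          ready.add(consumer);
        }
      }
    }
    if (order.size() != tasks.size()) {
      throw new IllegalStateException("The declared inputs and outputs form a cycle");
    }
    return order;
  }

  public static void main(String[] args) {
    // Tasks are declared in arbitrary order; the resource names are made up.
    List<Task> tasks = List.of(
        new Task("export", Set.of("table:osm_nodes"), Set.of("tiles.mbtiles")),
        new Task("download", Set.of(), Set.of("data.osm.pbf")),
        new Task("import", Set.of("data.osm.pbf"), Set.of("table:osm_nodes")));
    System.out.println(executionOrder(tasks)); // [download, import, export]
  }
}
```

With such a model, outputs like a table created in a database become regular 
declared resources, and tasks with no dependency between them could be 
scheduled in parallel.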

I’d love to gather both technical and non-technical feedback regarding this 
question. If you have any experience, whether good, mixed, or bad, with the 
current approach, please do not hesitate to share it. Additionally, if you 
have experience with other workflow technologies, it would be valuable to hear 
about that as well.

Thanks a lot for your help,

Bertil
