eric-czech commented on issue #1358: URL: https://github.com/apache/hamilton/issues/1358#issuecomment-3175800355
> can you specify a little more on what the use-case is?

Certainly. It would involve small to medium scale workflows on large datasets for:

- Running data integration and processing workflows via Pandas, Spark, Dask, Xarray, Ray, etc.
- Running training jobs on SLURM clusters and/or Neocloud providers
- Running inference and post-processing pipelines (with or without accelerators)

Hamilton seems like an obvious fit for handling the large number of small steps related to pre-processing, e.g.

Longer term, I'm also interested in seeing to what extent it may be helpful for orchestrating work across mixed hardware and multiple cloud providers.

To be clear, I don't expect it to do much other than solve for building DAGs and offering some reasonable semantics over retries and caching of expensive results. Provisioning, configuration, pickling functions, validating schemas, etc. are all things I would expect other tools to do -- I'm only really looking to Hamilton to define workflows via Python rather than a DSL or API complicated enough to be called a DSL.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
