GitHub user mitstake added a comment to the discussion: Parallelization Enhancement Ideas
> To help me think through your proposal, I want to have a clear mental model of "when" things would occur to help design the API.

You can take a look at how it works in my darl library. That's a fully production-ready implementation and shouldn't be too tricky to decipher: https://github.com/mitstake/darl/blob/main/src/darl/graph_build/sequential.py

The basic idea is that the standard asset-style functions and the driver/parameter-aware functions (I'll call them darl functions) can all be resolved in the same single pass through the graph build. I don't have any insight into how Hamilton builds its graph, so it might not be as straightforward without some rearchitecting there. In fact, the darl functions are just a generalization of the asset functions. Darl also allows asset functions and just wraps them to convert them to the darl style under the hood: https://github.com/mitstake/darl/blob/112ff936045e23f7921129aa97348e893d66b85a/src/darl/special_providers.py#L57

But you don't even need to do that, honestly; in your graph build you can just treat them separately. A pseudo-code example of the graph build looks like this (sorry, on mobile so this might not be great):

```
class FakeDriver:
    def __init__(self):
        self.called_dependencies = []

    def __getattr__(self, item):
        def caller(*args):
            self.called_dependencies.append((item, args))
            return Proxy()
        return caller


graph = Graph()

def build_graph(func_name, args):
    node = (func_name, args)
    fake_driver = FakeDriver()
    func = real_driver.get_func(func_name)
    if is_darl_func(func):
        try:
            func(fake_driver, *args)
        except CollectExitException:  # raised by ngn.collect()
            pass
        deps = fake_driver.called_dependencies
    else:
        deps = parse_names_from_signature(func)
    for dep_func_name, dep_args in deps:
        dep_node = (dep_func_name, dep_args)
        graph.add_edge(dep_node, node)
        build_graph(dep_func_name, dep_args)  # recursive graph build
```

Hopefully that mostly gets the idea across.
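To make the core recording trick concrete, here is a minimal, self-contained sketch of just that piece: a fake driver whose `__getattr__` records every dependency call instead of executing it. `Proxy`, `FakeDriver`, and the example function are illustrative stand-ins, not darl's actual API.

```python
class Proxy:
    """Placeholder returned for every recorded call; its value is never
    inspected during the graph-build pass."""


class FakeDriver:
    def __init__(self):
        self.called_dependencies = []

    def __getattr__(self, item):
        # Only fires for attributes that don't exist, i.e. dependency names.
        def caller(*args):
            self.called_dependencies.append((item, args))
            return Proxy()
        return caller


# A darl-style function declares its dependencies by calling them on the driver.
def my_result(ngn, x):
    a = ngn.dep_a(x)
    b = ngn.dep_b()
    return a, b


fake = FakeDriver()
my_result(fake, 42)
print(fake.called_dependencies)  # [('dep_a', (42,)), ('dep_b', ())]
```

Running the function once against the fake driver yields the full `(name, args)` dependency list without evaluating anything, which is exactly what the recursive `build_graph` above consumes.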
Regarding what I mentioned about one graph per driver: a darl function which is parameterized can technically result in infinitely many different graphs if the args aren't constrained (remember you can also have a non-parameterized darl-style function; see the snippet below). So one potential way to still limit it to one graph is to require that a root execution on the driver be a regular asset-style function or a non-parameterized darl function. That way, the set of parameters passed to a parameterized darl function is known at graph-build time instead of driver-invocation time.

```
def parameterized_darl_func(ngn, x):
    ...

def non_parameterized_darl_func(ngn):
    ...

ngn.parameterized_darl_func(99)
ngn.non_parameterized_darl_func()
```

> classes + functions -> nodes

Regarding your point about defining nodes with classes, darl allows this too. Technically any callable works, so you can do something like this:

```
# asset-style callable
class MyResult1:
    def __init__(self, val):
        self.val = val

    def __call__(self, MyDep):
        return MyDep + self.val


# darl-style callable
class MyResult2:
    def __init__(self, val):
        self.val = val

    def __call__(self, ngn):
        md = ngn.MyDep()
        ngn.collect()
        return md + self.val
```

> how do we catch errors?

Check out the way it works in darl here: https://github.com/mitstake/darl/tree/main?tab=readme-ov-file#error-handling

> What would match your ideal development process mental model?

Honestly, the way things work in darl exactly matches my mental model. It's a model I've worked with for several years with great success. I would look through the code and the readme and see how you can borrow ideas from it for Hamilton. I think some of those ideas would make Hamilton a lot better and more powerful, especially the ones I linked from the other discussions (tracing, updating/scoped updating, more targeted error handling). All of those have sections about them in the readme.
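Here is a small runnable sketch of why the non-parameterized-root constraint works: if the root takes no parameters, one recording pass from the root enumerates every `(func_name, args)` node, so the graph is finite and fixed at build time. The names (`Recorder`, `score`, `report`) are illustrative, not darl's API.

```python
class Recorder:
    """Stand-in driver that records dependency calls instead of running them."""
    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def caller(*args):
            self.calls.append((name, args))
            return None
        return caller


def score(ngn, region):
    # Parameterized: in isolation this could produce unboundedly many
    # (score, args) nodes, one per possible `region`.
    ...


def report(ngn):
    # Non-parameterized root: it pins down exactly which parameterizations
    # of `score` exist in this graph.
    ngn.score("us")
    ngn.score("eu")


rec = Recorder()
report(rec)
# Every (func_name, args) pair reachable from the root is now known:
print(rec.calls)  # [('score', ('us',)), ('score', ('eu',))]
```

Starting the recursive build only from such roots keeps the driver at exactly one graph, since no parameter values are left open until invocation time.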
Let me know if there's anything you want to discuss or dive into on this. I'd be happy to discuss further.

GitHub link: https://github.com/apache/hamilton/discussions/1412#discussioncomment-15677976
