GitHub user mitstake added a comment to the discussion: Parallelization Enhancement Ideas


> To help me think through your proposal, I want to have a clear mental model 
> of "when" things would occur to help design the API.

You can take a look at how it works in my darl library. It's a fully 
production-ready implementation and shouldn't be too tricky to decipher: 
https://github.com/mitstake/darl/blob/main/src/darl/graph_build/sequential.py

The basic idea is that the standard asset-style functions and the 
driver/parameter-aware functions (I'll call them darl functions) can all be 
resolved in the same single pass of the graph build. I don't have any insight 
into how Hamilton builds its graph, so it might not be as straightforward 
without some rearchitecting there. In fact, darl functions are just a 
generalization of asset functions. Darl also accepts asset functions and simply 
wraps them to convert them to the darl style under the hood: 
https://github.com/mitstake/darl/blob/112ff936045e23f7921129aa97348e893d66b85a/src/darl/special_providers.py#L57
Honestly, though, you don't even need to do that; in your graph build you can 
just treat them separately. 
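To make the wrapping idea concrete, here is a minimal sketch of how an asset-style function could be adapted into a darl-style one. It assumes, as in the snippets in this comment, that a darl function takes the engine `ngn` as its first argument, requests each dependency with `ngn.<name>()`, and calls `ngn.collect()` before using the values. `as_darl_style` and `ExecEngine` are hypothetical names for illustration, not darl's actual API; the linked `special_providers.py` has the real implementation.

```python
import inspect
from typing import Any, Callable


def as_darl_style(asset_func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical adapter: turn an asset-style function into a darl-style one."""
    # Asset style: each parameter name is the name of an upstream node.
    dep_names = list(inspect.signature(asset_func).parameters)

    def darl_func(ngn):
        # Request every dependency through the engine, in signature order.
        values = [getattr(ngn, name)() for name in dep_names]
        # At graph-build time this is where dependency discovery stops;
        # at execution time the requested values are real results.
        ngn.collect()
        return asset_func(*values)

    # Copy only the name: functools.wraps would also set __wrapped__, and
    # inspect.signature would then report the asset signature, not (ngn).
    darl_func.__name__ = asset_func.__name__
    return darl_func


class ExecEngine:
    """Hypothetical execution-time engine that returns precomputed results."""

    def __init__(self, results: dict[str, Any]):
        self._results = results

    def collect(self) -> None:
        pass  # nothing to stop at execution time

    def __getattr__(self, name: str) -> Callable[[], Any]:
        return lambda: self._results[name]


def total(price, quantity):
    return price * quantity


darl_total = as_darl_style(total)
print(darl_total(ExecEngine({"price": 2.5, "quantity": 4})))  # 10.0
```

Because the wrapper's only parameter is `ngn`, a graph builder that understands darl functions needs no separate code path for asset functions once they are wrapped.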

A pseudocode example of the graph build looks like this (sorry, I'm on mobile, 
so this might not be great):

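```
# Graph, Proxy, real_driver, is_darl_func and parse_names_from_signature
# are placeholders.
class CollectExitException(Exception):
    pass                                   # raised by ngn.collect()

class FakeDriver:
    def __init__(self):
        self.called_dependencies = []

    def collect(self):
        # everything requested before collect() is the full dependency
        # list, so stop running the function body here
        raise CollectExitException()

    def __getattr__(self, item):
        # any other attribute (e.g. ngn.MyDep) is a dependency request
        def caller(*args):
            self.called_dependencies.append((item, args))
            return Proxy()                 # placeholder, nothing computed yet
        return caller

graph = Graph()
visited = set()

def build_graph(func_name, args):
    node = (func_name, args)
    if node in visited:                    # shared deps are expanded once
        return
    visited.add(node)
    func = real_driver.get_func(func_name)
    if is_darl_func(func):
        fake_driver = FakeDriver()
        try:
            func(fake_driver, *args)
        except CollectExitException:      # raised by ngn.collect()
            pass
        deps = fake_driver.called_dependencies
    else:
        # asset style: each parameter name is a dependency with no args
        deps = [(name, ()) for name in parse_names_from_signature(func)]
    for dep_func_name, dep_args in deps:
        dep_node = (dep_func_name, dep_args)
        graph.add_edge(dep_node, node)
        build_graph(dep_func_name, dep_args)    # recursive graph build
```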
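To show *when* each step happens, here is a self-contained toy version of both phases under the same assumptions as the pseudocode. During graph build, a recording engine captures the `ngn.<name>(...)` calls and `collect()` stops the function. During execution, each node is re-run in dependency order with an engine that hands back already-computed upstream results. The registry dict, the convention that a darl function's first parameter is named `ngn`, and the helper names (`build`, `execute`, `RecordingEngine`, `ExecutingEngine`) are all assumptions for illustration, not darl's or Hamilton's API.

```python
import inspect
from graphlib import TopologicalSorter


class CollectExit(Exception):
    """Raised by collect() during graph build to stop a darl function early."""


class RecordingEngine:
    """Build-time engine: records dependency requests instead of computing them."""
    def __init__(self):
        self.deps = []

    def collect(self):
        raise CollectExit()

    def __getattr__(self, name):
        def request(*args):
            self.deps.append((name, args))
            return None  # placeholder: no value exists yet at build time
        return request


class ExecutingEngine:
    """Run-time engine: returns results of nodes that have already executed."""
    def __init__(self, results):
        self.results = results

    def collect(self):
        pass  # dependencies were computed before this node runs

    def __getattr__(self, name):
        return lambda *args: self.results[(name, args)]


def is_darl_func(func):
    # Assumed convention for this sketch: darl functions take `ngn` first.
    params = list(inspect.signature(func).parameters)
    return bool(params) and params[0] == "ngn"


def deps_of(funcs, name, args):
    func = funcs[name]
    if not is_darl_func(func):
        # Asset style: each parameter name is a dependency with no args.
        return [(p, ()) for p in inspect.signature(func).parameters]
    engine = RecordingEngine()
    try:
        func(engine, *args)
    except CollectExit:
        pass
    return engine.deps


def build(funcs, root):
    """Phase 1, graph build: map each (name, args) node to its dependencies."""
    graph, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node not in graph:  # each shared node is expanded only once
            graph[node] = set(deps_of(funcs, *node))
            stack.extend(graph[node])
    return graph


def execute(funcs, graph):
    """Phase 2, execution: run nodes in an order where dependencies come first."""
    results = {}
    for name, args in TopologicalSorter(graph).static_order():
        func = funcs[name]
        if is_darl_func(func):
            results[(name, args)] = func(ExecutingEngine(results), *args)
        else:
            inputs = [results[(p, ())] for p in inspect.signature(func).parameters]
            results[(name, args)] = func(*inputs)
    return results


# Toy pipeline: one asset-style node and two darl-style nodes.
def base():
    return 10

def scaled(ngn, factor):  # parameterized darl function
    b = ngn.base()
    ngn.collect()
    return b * factor

def report(ngn):  # non-parameterized darl function used as the root
    small, big = ngn.scaled(2), ngn.scaled(3)
    ngn.collect()
    return small + big

funcs = {"base": base, "scaled": scaled, "report": report}
graph = build(funcs, ("report", ()))
results = execute(funcs, graph)
print(results[("report", ())])  # 50
```

In this sketch each darl function runs twice: once to discover its dependencies and once to compute its value. That is what lets both function styles share one build pass. The trade-off is that code before `collect()` has to be cheap and free of side effects, and args have to be hashable because they are part of the node key.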

Hopefully that mostly gets the idea across. Regarding what I mentioned about one 
graph per driver: a parameterized darl function can technically result in 
infinitely many different graphs if its args aren't constrained, since each 
distinct set of args is a distinct node (remember you can also have a 
non-parameterized darl-style function; see the snippet below). So one way to 
still limit it to one graph is to require that a root execution on the driver be 
a regular asset-style function or a non-parameterized darl function. That way, 
the set of parameters passed to any parameterized darl function is known at 
graph build time instead of at driver invocation time. 

```
def parameterized_darl_func(ngn, x):
    …

def non_parameterized_darl_func(ngn):
    …

ngn.parameterized_darl_func(99)
ngn.non_parameterized_darl_func()
```
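One way the root restriction could be enforced is to inspect the root callable's signature before building anything. An asset-style function, or a darl function whose only parameter is `ngn`, is accepted. A darl function with extra parameters is rejected because its args would only be known at invocation time. This minimal sketch assumes the same `ngn`-first convention, and `is_valid_root` is a hypothetical helper, not part of Hamilton or darl.

```python
import inspect
from typing import Callable


def is_valid_root(func: Callable) -> bool:
    """Hypothetical guard: True if the graph under `func` is fixed at build time."""
    params = list(inspect.signature(func).parameters)
    is_darl = bool(params) and params[0] == "ngn"  # assumed darl convention
    # Asset-style functions and darl functions taking only `ngn` are allowed;
    # a darl function with extra params would need args at invocation time.
    return not is_darl or len(params) == 1


def parameterized_darl_func(ngn, x):
    return x


def non_parameterized_darl_func(ngn):
    result = ngn.parameterized_darl_func(99)  # args are fixed in code
    ngn.collect()
    return result


def asset_func(some_input):
    return some_input


for f in (parameterized_darl_func, non_parameterized_darl_func, asset_func):
    print(f.__name__, is_valid_root(f))
```

A driver could call a check like this on the requested root and raise before the graph build starts. Parameterized darl functions would still be reachable from inside the graph, because their args are then written in the calling function's code.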

> classes + functions -> nodes

Regarding your point about defining nodes with classes, darl allows this too. 
Technically any callable works, so you can do something like this:

```
# asset style callable
class MyResult1:
    def __init__(self, val):
        self.val = val

    def __call__(self, MyDep):
        return MyDep + self.val

# darl style callable
class MyResult2:
    def __init__(self, val):
        self.val = val

    def __call__(self, ngn):
        md = ngn.MyDep()
        ngn.collect()
        return md + self.val
```
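Because any callable works, the same signature-based dependency parsing used for asset-style functions also applies to class instances. `inspect.signature` on an instance describes its `__call__` with `self` already bound. The sketch below repeats the two classes from the snippet so it can run on its own. It assumes a dependency named `MyDep` and uses a hypothetical stand-in engine to run `MyResult2`, so it illustrates the idea rather than darl's internals.

```python
import inspect


class MyResult1:  # asset-style callable
    def __init__(self, val):
        self.val = val

    def __call__(self, MyDep):
        return MyDep + self.val


class MyResult2:  # darl-style callable
    def __init__(self, val):
        self.val = val

    def __call__(self, ngn):
        md = ngn.MyDep()
        ngn.collect()
        return md + self.val


class StubEngine:
    """Hypothetical execution engine that returns a fixed value for MyDep."""

    def MyDep(self):
        return 40

    def collect(self):
        pass


r1, r2 = MyResult1(2), MyResult2(2)
# `self` is excluded, so the signature lists only the real inputs.
print(list(inspect.signature(r1).parameters))  # ['MyDep']
print(list(inspect.signature(r2).parameters))  # ['ngn']
print(r1(40), r2(StubEngine()))                # 42 42
```

Constructor arguments such as `val` act as configuration baked into the node, which is one way classes can stamp out several similar nodes from a single definition.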

> how do we catch errors?

Check out how it works in darl here: 
https://github.com/mitstake/darl/tree/main?tab=readme-ov-file#error-handling

> What would match your ideal development process mental model?

Honestly, the way things work in darl matches my mental model exactly. It's a 
model I've worked with for several years with great success. I would look 
through the darl code and README and see which ideas you can borrow for 
Hamilton. I think some of them would make Hamilton a lot better and more 
powerful, especially the ones I linked from the other discussions (tracing, 
updating/scoped updating, more targeted error handling). Each of those has its 
own section in the README. 

Let me know if there's anything you want to discuss or dive into on this. I'd be 
happy to go into more detail. 

GitHub link: 
https://github.com/apache/hamilton/discussions/1412#discussioncomment-15677976

----
This is an automatically sent email for [email protected].