Re: [D] Parallelization Enhancement Ideas [hamilton]

via GitHub Mon, 02 Feb 2026 18:30:11 -0800


GitHub user mitstake added a comment to the discussion: Parallelization 
Enhancement Ideas


OK, so knowing a little more about the philosophy of the UX, this might 
**not** be a great fit for Hamilton. I'll get to why in a second. But to answer 
the question "is your pattern what most people would do?": I think the answer 
is yes. The darl style lets you write code the way you would write naive 
imperative Python with modular nested functions and still unlock the benefits 
of compiling to a DAG, which I think is a compelling proposition on the 
surface. 
E.g. code like this can be turned into a DAG, with some restrictions:
```
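def B():
    return 1  # placeholder value so the snippet runs on its own

def A():
    b = B()  # plain nested call, the kind of dependency that gets compiled into the DAG
    return b + 1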
```
The reason I think this might not be a good fit for the Hamilton philosophy is 
what else it unleashes on the environment. (Note that I don't think all of 
these things are bad in general, and I like some of them personally, but I do 
think they're at odds with the Hamilton way of doing things, and including them 
could just lead to confusion/frustration among users.) It might lower the 
barrier to entry by looking like standard naive Python code, but it also allows 
several things that could be considered **too** flexible. Some of those are:

1. Configuring routing logic inside functions rather than externally, e.g. 
doing this instead of configuring with @config decorators:
```
# ngn stands for engine and is the darl equivalent of driver
def PizzaSpecial(ngn):
    if is_vegetarian:
        pizza = ngn.VeggiePizza()
    else:
        pizza = ngn.MeatLoversPizza()
    ngn.collect()
    return pizza
```
2. Parameterizing on unreasonable data - probably the biggest frustration I've 
experienced with parameterized nodes is people passing large objects in as 
arguments. This has caused all sorts of issues with the memory footprint of 
the graph blowing up, because function args are stored on the graph object, and 
if you're passing 1 GB dataframes through all your nodes that balloons very 
quickly. Function arguments also need to be hashable, which can be annoying. 
And if users modify argument objects in place, that leads to issues, so you 
need to either deep-copy the arguments or depend on user education.
3. Chaining results from function calls - if you allow parameterized functions, 
one of the first things people are going to want to do is pass the result of 
one function call to another, e.g.:
```
def A(ngn):
    b = ngn.B()
    c = ngn.C(b)  # <--- result of B passed into C
    return b + c
```
This is possible and is implemented in darl; however, it significantly 
complicates the graph-building logic (and often the workflow business logic 
too). And to your point that the UX should make certain things harder to 
accomplish, I'd say this is a good example. Instead of encouraging users to do 
the above, you'd prefer them to do something like this:
```
def A(B, C):
    return B + C

def C(B):
    return B + 1
```
4. Your parameterization needs to propagate through every function in the chain 
- one nice thing about Hamilton's Parallelizable is that the functions in a 
parallelized branch can live outside the context of the variable they're being 
parallelized on. With the darl-style for loop, everything in the parallelized 
branch needs to be parameterized to take that variable as well.

There are probably some other things I'm forgetting too.

**Update functionality**

Along the same lines, regarding my suggestion for the `update` function 
(https://github.com/apache/hamilton/discussions/1397, 
https://github.com/mitstake/darl/tree/main?tab=readme-ov-file#updates-and-shocks), 
I personally think this is a good feature for enabling flexible scenario 
configuration. But the more I pick up on the Hamilton philosophy, the more I 
think it is potentially at odds there too. This feature would move 
configuration of your driver out of explicitly defined and visible @config 
decorated workflows and open up the ability to configure your driver "anywhere" 
and "anyhow", which means you have a lot less visibility/lineage into what 
configuration constitutes your driver. Again, I think it has upsides, but 
potentially heavier downsides from the Hamilton POV?
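
A rough sketch of that visibility concern, assuming a hypothetical `ToyEngine` with an `update` method. None of these names are darl's or Hamilton's real API; the point is only that the override happens at the call site, so nothing in the function definitions tells you it exists.

```python
from typing import Callable, Dict


class ToyEngine:
    """Hypothetical engine mapping node names to provider functions."""

    def __init__(self, providers: Dict[str, Callable[["ToyEngine"], int]]) -> None:
        self.providers = dict(providers)

    def update(self, name: str, provider: Callable[["ToyEngine"], int]) -> "ToyEngine":
        # Return a new engine with one provider swapped; the original is untouched.
        swapped = dict(self.providers)
        swapped[name] = provider
        return ToyEngine(swapped)

    def run(self, name: str) -> int:
        return self.providers[name](self)


def unit_price(ngn: ToyEngine) -> int:
    return 4


def quantity(ngn: ToyEngine) -> int:
    return 10


def revenue(ngn: ToyEngine) -> int:
    return ngn.run("unit_price") * ngn.run("quantity")


base = ToyEngine({"unit_price": unit_price, "quantity": quantity, "revenue": revenue})

# The "shock" is applied wherever the caller likes; revenue() never sees it declared.
shocked = base.update("unit_price", lambda ngn: 6)

base_revenue = base.run("revenue")
shocked_revenue = shocked.run("revenue")
```

That is the flexibility I like for scenarios, and also the reason the configuration that constitutes `shocked` is harder to trace back than a decorator sitting next to the function.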

**Tracing**

One thing I think would definitely be useful for Hamilton, and would not 
conflict with any UX philosophy, is the darl trace: 
https://github.com/mitstake/darl/tree/main?tab=readme-ov-file#debugging-tracing-and-replaying 
It makes debugging and navigating through a workflow execution very easy and 
pleasant.
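
For a feel of what an execution trace gives you, here is a toy recorder that keeps the call tree of a run. It is only an illustration of the idea under my own assumptions (`Tracer` and its record layout are hypothetical), not how the darl trace is implemented.

```python
import functools
from typing import Any, Callable, List, Tuple


class Tracer:
    """Hypothetical recorder of (depth, function name, result) in call order."""

    def __init__(self) -> None:
        self.depth = 0
        self.records: List[Tuple[int, str, Any]] = []

    def __call__(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            index = len(self.records)
            # Reserve a slot first so records stay in call order, not return order.
            self.records.append((self.depth, fn.__name__, None))
            self.depth += 1
            try:
                result = fn(*args, **kwargs)
            finally:
                self.depth -= 1
            self.records[index] = (self.depth, fn.__name__, result)
            return result

        return wrapper


trace = Tracer()


@trace
def B() -> int:
    return 1


@trace
def C() -> int:
    return B() + 1


@trace
def A() -> int:
    return B() + C()


answer = A()

# Render the recorded run as an indented call tree.
tree = "\n".join(f"{'  ' * depth}{name} -> {result}" for depth, name, result in trace.records)
```

Walking `trace.records` after a run lets you step from a result down into the calls that produced it, which is the kind of navigation I find useful when debugging.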

Anyway, I may be overthinking all of this, and both the darl-style functions 
and the update functionality may make sense to you within Hamilton. Interested 
to hear your thoughts on this, and happy to continue the conversation.

GitHub link: 
https://github.com/apache/hamilton/discussions/1412#discussioncomment-15678689
