GitHub user mitstake added a comment to the discussion: Parallelization
Enhancement Ideas
OK, so knowing a little more about the philosophy of the UX, this might
**not** be a great fit for Hamilton. I'll get to why in a second. But to answer
the question "is your pattern what most people would do?": I think the answer
is yes. The darl style lets you write code the way you would if you were
writing naive imperative Python with modular nested functions, and still unlock
the power and benefits of compiling to a DAG, which I think is a compelling
proposition on the surface.
For example, code like this can be turned into a DAG, with some restrictions:
```python
def A():
    b = B()  # calling another node function becomes the edge B -> A
    return b + 1
```
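To make "dagified" a bit more concrete, here is a toy sketch, assuming a hypothetical `@node` decorator (not darl's or Hamilton's actual API) that records an edge whenever one node function calls another. It only discovers edges while running; a real engine would also need caching and would usually build the graph before executing it.

```python
import functools
from collections import defaultdict

# Hypothetical toy tracer: caller name -> names of the node functions it calls.
EDGES = defaultdict(set)
_call_stack = []


def node(fn):
    """Mark fn as a graph node; nested calls between nodes are recorded as edges."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if _call_stack:
            # Called from inside another node: that caller depends on fn.
            EDGES[_call_stack[-1]].add(fn.__name__)
        _call_stack.append(fn.__name__)
        try:
            return fn(*args, **kwargs)
        finally:
            _call_stack.pop()

    return wrapper


@node
def B():
    return 41


@node
def A():
    b = B()
    return b + 1


result = A()
print(result, dict(EDGES))  # 42 {'A': {'B'}}
```

The user-facing code stays plain nested calls; the dependency structure is recovered by the wrapper rather than declared in function signatures.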
The reason I think this might not be a good fit for the Hamilton philosophy is
what else it unleashes on the environment. (I don't think all of these things
are bad in general, and I like some of them personally, but I do think they are
at odds with the Hamilton way of doing things, and including them could just
lead to confusion and frustration among users.) It might lower the barrier to
entry by looking like standard naive Python code, but it also brings several
things that could be considered **too** flexible. Some of those things are:
1. Configuring routing logic inside functions rather than externally, e.g.
doing this instead of configuring it with `@config` decorators:
```python
def PizzaSpecial(ngn):  # ngn stands for engine, the darl equivalent of a driver
    # is_vegetarian is assumed to be defined elsewhere (e.g. a config value)
    if is_vegetarian:
        pizza = ngn.VeggiePizza()
    else:
        pizza = ngn.MeatLoversPizza()
    ngn.collect()
    return pizza
```
2. Parameterizing on unreasonable data: probably the biggest frustration I've
experienced with parameterizing nodes is people passing large objects in as
arguments. This has caused all sorts of issues with the memory footprint of the
graph blowing up, because function args are stored on the graph object, and if
you're passing 1 GB dataframes through all your nodes, that balloons very
quickly. Function arguments also need to be hashable, which can be annoying.
And if users modify argument objects in place, that leads to issues, so you
need to either deep-copy the arguments or rely on user education.
3. Chaining results from function calls: if you allow parameterized functions,
one of the first things people are going to want to do is pass the result of
one function call to another, e.g.:
```python
def A(ngn):
    b = ngn.B()
    c = ngn.C(b)  # <--- result of B passed into C
    return b + c
```
This is possible, and darl implements it, but it significantly complicates the
graph-building logic (and often the workflow business logic too). And to your
point that the UX should make certain things harder to accomplish, I'd say this
is a good example. Instead of encouraging users to do the above, you'd want them
to do something like this:
```python
# Hamilton wires dependencies by parameter name: B feeds C, and both feed A.
def A(B, C):
    return B + C


def C(B):
    return B + 1
```
4. Your parameterization needs to propagate through every function in the
chain. One nice thing about Hamilton's `Parallelizable` is that the functions in
the parallelized branch can live outside the context of the variable they're
being parallelized on. With a darl-style for loop, everything in the
parallelized branch needs to be parameterized to take that variable as well.
There are probably some other things I'm forgetting too.
**Update functionality**
In the same vein, regarding my suggestion of the `update` function
(https://github.com/apache/hamilton/discussions/1397,
https://github.com/mitstake/darl/tree/main?tab=readme-ov-file#updates-and-shocks),
I personally think it is a good feature for enabling flexible scenario
configuration. But the more I pick up on the Hamilton philosophy, the more I
think it is potentially at odds with it too. This feature would move the
configuration of your driver out of explicitly defined, visible
`@config`-decorated workflows and open up the ability to configure your driver
"anywhere" and "anyhow", which means you have much less visibility into, and
lineage of, the configuration that constitutes your driver. Again, I think it
has upsides, but potentially heavier downsides from the Hamilton point of view?
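To make the lineage concern concrete, here is a toy sketch, assuming a hypothetical `Engine` with an `update` method (the names and behavior are illustrative only, not darl's actual interface). Nodes are resolved by parameter name, Hamilton-style, and `update` swaps one node's implementation at runtime to produce a new scenario.

```python
import inspect


class Engine:
    """Hypothetical engine: nodes are plain functions wired by parameter name."""

    def __init__(self, nodes):
        self._nodes = dict(nodes)

    def update(self, name, fn):
        # Return a new engine with one node swapped out (a "shock"); the
        # original engine is left untouched so both scenarios can coexist.
        shocked = Engine(self._nodes)
        shocked._nodes[name] = fn
        return shocked

    def run(self, name):
        fn = self._nodes[name]
        # Resolve each parameter by running the node with the same name.
        deps = inspect.signature(fn).parameters
        return fn(*(self.run(dep) for dep in deps))


def price():
    return 10


def quantity():
    return 3


def revenue(price, quantity):
    return price * quantity


base = Engine({"price": price, "quantity": quantity, "revenue": revenue})
# This override can happen anywhere in the codebase, not in a visible decorator.
shocked = base.update("price", lambda: 12)
print(base.run("revenue"), shocked.run("revenue"))  # 30 36
```

Nothing in the module that defines `price` records that the shocked scenario ran with 12; that information lives only in whichever code called `update`, which is the visibility and lineage cost described above.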
**Tracing**
One thing I think would definitely be useful for Hamilton, and would not
conflict with any UX philosophy, is the darl trace
(https://github.com/mitstake/darl/tree/main?tab=readme-ov-file#debugging-tracing-and-replaying).
It makes debugging and navigating through a workflow execution very easy and
pleasant.
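A minimal sketch of the general idea behind an execution trace, assuming a hypothetical `TracingRunner` (this is not darl's actual trace API): each node execution is recorded with its resolved inputs and output, so a single step can be inspected, or replayed in isolation without re-running anything upstream.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class TraceRecord:
    node: str
    inputs: dict
    output: object


@dataclass
class TracingRunner:
    """Hypothetical runner that records every node execution, in order."""

    nodes: dict
    trace: list = field(default_factory=list)

    def run(self, name):
        fn = self.nodes[name]
        # Run dependencies first (resolved by parameter name), then this node.
        inputs = {dep: self.run(dep) for dep in inspect.signature(fn).parameters}
        output = fn(**inputs)
        self.trace.append(TraceRecord(name, inputs, output))
        return output

    def replay(self, name):
        # Re-execute one node from its recorded inputs, e.g. under a debugger,
        # without re-running anything upstream of it.
        record = next(r for r in self.trace if r.node == name)
        return self.nodes[name](**record.inputs)


def price():
    return 10


def quantity():
    return 3


def revenue(price, quantity):
    return price * quantity


runner = TracingRunner({"price": price, "quantity": quantity, "revenue": revenue})
result = runner.run("revenue")
print([r.node for r in runner.trace])  # ['price', 'quantity', 'revenue']
print(runner.replay("revenue"))        # 30
```

This toy runner has no caching, so a node shared by several downstream nodes would be executed and recorded once per use; a real implementation would record each node once.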
Anyway, I may be overthinking all of this, and both the darl-style functions
and the update functionality may make sense to you within Hamilton. I'm
interested to hear your thoughts on this and happy to continue the conversation.
GitHub link:
https://github.com/apache/hamilton/discussions/1412#discussioncomment-15678689