GitHub user mitstake edited a comment on the discussion: Parallelization
Enhancement Ideas
With regard to equivalence, it's probably easiest to show it in terms of the
simple ABC example.
```
def A():
    return 3
def B(A):
    return A / 3
def C(A, B):
    return A ** 2 * B
```
This works in both Hamilton and darl. However, with darl you can also
equivalently define the above as:
```
def A():
    return 3
def B(ngn):
    a = ngn.A()
    ngn.collect()
    return a / 3
def C(ngn):
    a = ngn.A()
    b = ngn.B()
    ngn.collect()
    return a ** 2 * b
```
(You can also mix the styles throughout your different functions.)
This way, instead of parsing the signature to identify the dependencies, you
call the function itself during the graph build step. However, in the graph
build step you only run up to `ngn.collect()` and then you exit. During that
step, `ngn` just records the names called on it, which are then traversed to
collect their dependencies recursively until your graph is built.
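To make the build step concrete, here is a minimal toy sketch of the idea. It is
not darl's actual implementation: `BuildEngine`, `RunEngine`, `build_graph`,
`run`, and the `registry` dict are all hypothetical names made up for this
illustration, and it assumes a function opts into the second style by naming its
first parameter `ngn`.

```python
import inspect


class _StopBuild(Exception):
    """Raised by collect() to leave a function early during the build pass."""


class BuildEngine:
    """Toy stand-in for `ngn` at build time: records requested node names."""

    def __init__(self):
        self.requested = []

    def __getattr__(self, name):
        # Only reached for names that are not real attributes, e.g. ngn.A
        def record(*args):
            self.requested.append((name, args))
            return None  # placeholder; real values only exist at run time
        return record

    def collect(self):
        raise _StopBuild


class RunEngine:
    """Toy stand-in for `ngn` at run time: hands back already computed values."""

    def __init__(self, cache):
        self._cache = cache

    def __getattr__(self, name):
        return lambda *args: self._cache[(name, args)]

    def collect(self):
        pass  # all dependencies are already available


def uses_ngn(func):
    params = list(inspect.signature(func).parameters)
    return bool(params) and params[0] == "ngn"


def dependencies(registry, name, args=()):
    func = registry[name]
    if uses_ngn(func):
        engine = BuildEngine()
        try:
            func(engine, *args)  # runs only up to ngn.collect()
        except _StopBuild:
            pass
        return engine.requested
    # Signature style: every parameter name is a dependency
    return [(p, ()) for p in inspect.signature(func).parameters]


def build_graph(registry, name, args=(), graph=None):
    graph = {} if graph is None else graph
    key = (name, args)
    if key not in graph:
        graph[key] = dependencies(registry, name, args)
        for dep_name, dep_args in graph[key]:
            build_graph(registry, dep_name, dep_args, graph)
    return graph


def run(registry, name, args=()):
    graph, cache = build_graph(registry, name, args), {}

    def evaluate(key):
        if key not in cache:
            for dep in graph[key]:
                evaluate(dep)
            func = registry[key[0]]
            if uses_ngn(func):
                cache[key] = func(RunEngine(cache), *key[1])
            else:
                params = inspect.signature(func).parameters
                cache[key] = func(*(cache[(p, ())] for p in params))
        return cache[key]

    return evaluate((name, args))


def A():
    return 3

def B(ngn):
    a = ngn.A()
    ngn.collect()
    return a / 3

def C(ngn):
    a = ngn.A()
    b = ngn.B()
    ngn.collect()
    return a ** 2 * b

registry = {"A": A, "B": B, "C": C}
print(build_graph(registry, "C"))
print(run(registry, "C"))  # 9.0
```

Note that the placeholder in this sketch is just `None`, so it only handles
functions that don't operate on the requested values before `ngn.collect()`. A
real engine would presumably need something richer there.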
The benefit of doing it this way is that you can parameterize your functions,
which among other things lets you use basic constructs like for loops to build
nodes in your graph. E.g.:
```
def USGDP(ngn):
    gdp = 0
    for state in ALL_STATES:
        gdp += ngn.StateGDP(state)
    ngn.collect()
    return gdp
def StateGDP(ngn, state):
    return len(state)
```
This will create a graph like:
<img width="638" height="202" alt="image"
src="https://github.com/user-attachments/assets/be70bcb7-6934-4b0b-a127-3760e9280387"
/>
With this there's no need for queues, Parallelizable/Collect type hints, or
anything like that. It's not really parallelization; it's just another way to
define nodes in the graph. So "nested parallelization" is nothing special: you
can just define whatever loops wherever you want. Actual parallel execution is
left to whatever executor you want to plug in, like dask or ray.
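As a rough illustration of that separation, here is a sketch that takes an
already built graph shaped like the USGDP example and runs it with the standard
library's `ThreadPoolExecutor`. This is not how darl, dask, or ray schedule work.
The `graph` dict, `compute`, `execute`, and the three-state `ALL_STATES` are
hypothetical stand-ins for the purpose of the example.

```python
from concurrent.futures import ThreadPoolExecutor

ALL_STATES = ("CA", "NY", "TX")  # hypothetical stand-in for the real list

# Node key -> dependency keys, as a build step might produce for USGDP
graph = {("USGDP", ()): [("StateGDP", (s,)) for s in ALL_STATES]}
for s in ALL_STATES:
    graph[("StateGDP", (s,))] = []


def compute(key, dep_values):
    """Evaluate one node given the values of its dependencies."""
    name, args = key
    if name == "StateGDP":
        return len(args[0])
    if name == "USGDP":
        return sum(dep_values)
    raise KeyError(name)


def execute(graph, compute, max_workers=4):
    """Run the graph in waves: each wave holds every node whose deps are done."""
    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready = [k for k, deps in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cycle or missing dependency in graph")
            futures = {
                k: pool.submit(compute, k, [results[d] for d in pending[k]])
                for k in ready
            }
            for k, future in futures.items():
                results[k] = future.result()
                del pending[k]
    return results


results = execute(graph, compute)
print(results[("USGDP", ())])  # 6
```

The loop in `USGDP` only shaped the graph. The executor never sees the loop, it
just sees independent `StateGDP` nodes that can run in the same wave.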
GitHub link:
https://github.com/apache/hamilton/discussions/1412#discussioncomment-15662461