Re: [I] Add a dry run feature [hamilton]

via GitHub Tue, 02 Sep 2025 21:22:03 -0700


zilto commented on issue #1370:
URL: https://github.com/apache/hamilton/issues/1370#issuecomment-3246451294


   Actually, it is *likely to be impossible* to view "what nodes will be 
executed before launching execution when using caching". 
   
   In short, the caching feature needs to "walk the graph". You start at input 
nodes and follow the execution path until the final nodes. At each step, 
caching checks the `code_version` of the current node and the `data_version` of 
its dependencies.
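   
   To make that check concrete, here is a minimal sketch of how a cache lookup key could be derived from a node's `code_version` and the `data_version` of its dependencies. The function name `cache_key` and the hashing scheme are assumptions made for illustration, not Hamilton's actual implementation.

```python
import hashlib


def cache_key(node_name: str, code_version: str, dep_data_versions: dict[str, str]) -> str:
    """Hypothetical cache key; NOT Hamilton's real implementation."""
    # Sort dependencies so the key does not depend on dict ordering.
    deps = ",".join(
        f"{name}={version}" for name, version in sorted(dep_data_versions.items())
    )
    payload = f"{node_name}|{code_version}|{deps}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same code and same upstream data give the same key (a potential cache hit).
key_a = cache_key("features", "v1", {"raw": "abc"})
key_b = cache_key("features", "v1", {"raw": "abc"})

# A different upstream data_version gives a different key (a cache miss).
key_c = cache_key("features", "v1", {"raw": "xyz"})
```

   The point of the sketch is that the key cannot be computed until every dependency's `data_version` is known.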
   
   Before executing the DAG, you don't know the `data_version` of all nodes (if 
you did, you wouldn't need to execute the DAG at all!). At that point, you can 
only make 3 types of claims about a node:
   - "Will be retrieved from cache"
   - "Might be retrieved from cache"
   - "Will need to be executed / will not be retrieved from cache"
   
   Visually:
   
   <img width="1435" height="990" alt="Image" 
src="https://github.com/user-attachments/assets/d0cdfa44-bb6c-49f2-845e-a79f803bc312"
 />
   
   The thing is, whenever you hit a "might be retrieved from cache" node, 
everything downstream of it will necessarily be a "maybe" too. Since what 
changes between runs are the input nodes (i.e., top-level nodes), you will find 
that your visualizations are almost always filled with maybes. That said, the viz 
could be useful in certain specific topologies.
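   
   A rough sketch of this pre-execution labeling, under simplifying assumptions: the DAG is a plain dict of node to dependencies, input `data_version`s are known up front, and the cache is a dict keyed by `(node, code_version, dependency data_versions)`. All of these structures and the `classify` function are hypothetical stand-ins, not Hamilton objects or APIs.

```python
from graphlib import TopologicalSorter

WILL_RETRIEVE = "will be retrieved from cache"
MIGHT_RETRIEVE = "might be retrieved from cache"
WILL_EXECUTE = "will need to be executed"


def classify(dag, code_versions, input_versions, cache):
    """Label each non-input node before execution (illustrative only).

    dag: node -> list of dependencies
    code_versions: node -> code_version
    input_versions: input node -> data_version (known before running)
    cache: (node, code_version, tuple of dep data_versions) -> data_version
    """
    known = dict(input_versions)  # data_versions knowable without executing
    labels = {}
    for node in TopologicalSorter(dag).static_order():
        if node in input_versions:
            continue
        deps = dag[node]
        if any(dep not in known for dep in deps):
            # An upstream data_version is unknown until that node actually runs.
            labels[node] = MIGHT_RETRIEVE
            continue
        key = (node, code_versions[node], tuple(known[dep] for dep in deps))
        if key in cache:
            labels[node] = WILL_RETRIEVE
            known[node] = cache[key]  # the cached data_version is now known
        else:
            labels[node] = WILL_EXECUTE  # output data_version stays unknown
    return labels


dag = {
    "clean": ["raw"],
    "features": ["clean"],
    "model": ["features"],
    "lookup": ["config"],
}
code_versions = {name: "v1" for name in dag}
input_versions = {"raw": "raw-new", "config": "cfg-1"}  # `raw` changed this run
cache = {
    ("clean", "v1", ("raw-old",)): "clean-1",
    ("lookup", "v1", ("cfg-1",)): "lookup-1",
}
labels = classify(dag, code_versions, input_versions, cache)
```

   In this toy example, only `lookup` gets a definite cache hit. `clean` must run because `raw` changed, and both `features` and `model` end up as "might be retrieved" because they sit downstream of a node whose output is not yet known.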
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
