zilto commented on issue #1370: URL: https://github.com/apache/hamilton/issues/1370#issuecomment-3246451294
Actually, it is *likely to be impossible* to view "what nodes will be executed before launching execution when using caching".

In short, the caching feature needs to "walk the graph". You start at the input nodes and follow the execution path until the final nodes. At each step, caching checks the `code_version` of the current node and the `data_version` of its dependencies. Before executing the DAG, you don't know the `data_version` of all nodes (if you did, you wouldn't need to execute the DAG at all!).

Before executing the DAG, you're able to make 3 types of claims:
- "Will be retrieved from cache"
- "Might be retrieved from cache"
- "Will need to be executed / will not be retrieved from cache"

Visually:

<img width="1435" height="990" alt="Image" src="https://github.com/user-attachments/assets/d0cdfa44-bb6c-49f2-845e-a79f803bc312" />

The thing is, whenever you hit a "might be retrieved from cache", everything downstream of it will necessarily be a "maybe" too. Since what changes between runs are the input nodes (i.e., top-level nodes), you will find that you almost always have visualizations filled with maybes. That said, the viz could be useful in certain specific topologies.
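To make the propagation concrete, here is a minimal sketch of how the three claims could be assigned before a run. It is not Hamilton's caching API: `classify`, `Claim`, and the node names are hypothetical, and the graph is a plain dict. It assumes you know up front which inputs are unchanged and which nodes have a new `code_version`. It also assumes that a node with changed code is always classified as "will need to be executed", even when it sits downstream of a "maybe".

```python
from enum import Enum
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Claim(Enum):
    HIT = "will be retrieved from cache"
    MAYBE = "might be retrieved from cache"
    MISS = "will need to be executed"


def classify(dependencies, unchanged_inputs, changed_code):
    """Assign a pre-execution claim to every node (hypothetical helper).

    dependencies: dict mapping node -> list of upstream nodes
    unchanged_inputs: inputs known to carry the same data_version as last run
    changed_code: nodes whose code_version differs from the last run
    """
    claims = {}
    # static_order() yields each node after all of its dependencies.
    for node in TopologicalSorter(dependencies).static_order():
        deps = dependencies.get(node, ())
        if not deps:
            # Input node: its data_version is only known if we were told so.
            claims[node] = Claim.HIT if node in unchanged_inputs else Claim.MAYBE
        elif node in changed_code:
            # Assumption of this sketch: new code always forces execution.
            claims[node] = Claim.MISS
        elif all(claims[d] is Claim.HIT for d in deps):
            claims[node] = Claim.HIT
        else:
            # An upstream data_version is unknown until it runs, so this
            # node (and everything below it) can only be a "maybe".
            claims[node] = Claim.MAYBE
    return claims


# Hypothetical DAG: `raw_path` is supplied at run time, `lookup_version` is pinned.
dependencies = {
    "cleaned": ["raw_path"],
    "config_table": ["lookup_version"],
    "features": ["cleaned", "config_table"],
    "report": ["config_table"],
    "model": ["features"],
}
claims = classify(dependencies, unchanged_inputs={"lookup_version"}, changed_code=set())
for name, claim in sorted(claims.items()):
    print(f"{name}: {claim.value}")
```

In this toy graph, only the branch fed exclusively by the pinned input (`config_table`, `report`) gets a definite answer. Everything reachable from `raw_path` ends up as a "maybe", which is the effect described above.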
