ameyc commented on PR #12186:
URL: https://github.com/apache/datafusion/pull/12186#issuecomment-2344426743

   Taking a second look at implementing this the using a global hashmap with 
pointers to Arc<ExecuptionPlan> and solution is somewhat more hacky and error 
prone and as developers trying to build on top of DataFusion makes our 
experience feel pretty brittle.
   
   While we are interested in using the node_id for stateful checkpointing, it 
does make sense for this to be on an ExecutionPlan since it is a node on 
PhysicalPlan graph.
   
   > In some use cases, one wants an ID for every node the plan tree (e.g. for 
display/UI purposes). In others, what is actually necessary is an ID per stream 
(any kind of stateful work).
   
   Streams can easily, derive their id "{node_id}_{stream_id}"
   
   > Having a default None value for IDs is potentially problematic (apparently 
we ran into bugs caused by this after initially trying such an approach).
   
   Presumably for any user interested in using the "node_id", they'd make their 
operators implement "with_node_id".
   
   > One may want different types for the ID in different use cases (usize may 
not be appropriate for all).
   
   For additional types of node_ids, I suppose one could maintain them in a 
HashMap<usize, T> where usize is the node_id and not a ptr to Arc<dyn 
ExecutionPlan>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to