ameyc commented on PR #12186: URL: https://github.com/apache/datafusion/pull/12186#issuecomment-2344426743
Taking a second look at implementing this the using a global hashmap with pointers to Arc<ExecuptionPlan> and solution is somewhat more hacky and error prone and as developers trying to build on top of DataFusion makes our experience feel pretty brittle. While we are interested in using the node_id for stateful checkpointing, it does make sense for this to be on an ExecutionPlan since it is a node on PhysicalPlan graph. > In some use cases, one wants an ID for every node the plan tree (e.g. for display/UI purposes). In others, what is actually necessary is an ID per stream (any kind of stateful work). Streams can easily, derive their id "{node_id}_{stream_id}" > Having a default None value for IDs is potentially problematic (apparently we ran into bugs caused by this after initially trying such an approach). Presumably for any user interested in using the "node_id", they'd make their operators implement "with_node_id". > One may want different types for the ID in different use cases (usize may not be appropriate for all). For additional types of node_ids, I suppose one could maintain them in a HashMap<usize, T> where usize is the node_id and not a ptr to Arc<dyn ExecutionPlan> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org