Re: [PR] Add Python bindings for accessing ExecutionMetrics [datafusion-python]

via GitHub Thu, 19 Mar 2026 13:47:46 -0700


ShreyeshArangath commented on PR #1381:
URL: 
https://github.com/apache/datafusion-python/pull/1381#issuecomment-4093199310


   Apologies for the delayed response on this one 😅 
   
   > One area I am concerned about is that when we do a display() we do bypass 
all of this mechanism. That is good and bad. The good is that the metrics are 
definitely going to be different between the smaller collection that happens 
when we display because it ends early. The bad is that as a user it's probably 
confusing to see the data but then be told that we don't have the metrics 
for the data in front of them. What do you think? 
   
   That's a totally fair concern, and I think it's worth addressing. It's 
definitely going to trip people up. I'm still thinking this through, but one 
high-level idea is to have display() also cache its plan so users can at least 
inspect metrics from the display execution (I haven't looked into what 
supporting this would require). The metrics would reflect the truncated run, 
but timestamps and compute times would still be meaningful. 
WDYT? 
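   
   To make the idea concrete, here is a rough pure-Python sketch of the 
caching behaviour. It is not the datafusion-python API: `ToyFrame`, 
`ToyMetrics`, and `last_metrics` are hypothetical names, and the real change 
would presumably cache the executed plan rather than a row count. 

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMetrics:
    """Hypothetical stand-in for the metrics of one execution."""
    output_rows: int
    truncated: bool  # True when the run stopped early, as display() does


@dataclass
class ToyFrame:
    """Toy frame that remembers metrics from its most recent execution."""
    rows: list
    last_metrics: Optional[ToyMetrics] = None

    def collect(self) -> list:
        # Full execution: the cached metrics cover every row.
        self.last_metrics = ToyMetrics(len(self.rows), truncated=False)
        return list(self.rows)

    def display(self, limit: int = 10) -> list:
        # Early-terminating execution: still cache metrics, flagged as truncated.
        shown = self.rows[:limit]
        self.last_metrics = ToyMetrics(
            len(shown), truncated=len(shown) < len(self.rows)
        )
        return shown


frame = ToyFrame(rows=list(range(25)))
frame.display(limit=10)
print(frame.last_metrics)
```

   An explicit `truncated` flag (again, just an assumption in this sketch) is 
one way to tell users that the metrics describe the partial run behind the 
displayed rows rather than the full query. 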
   
    
   For documentation, I've already started a bit of work in this PR. Please 
do lmk what you think (you likely have a lot more context on what a user 
might expect).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


