ShreyeshArangath commented on PR #1381:
URL:
https://github.com/apache/datafusion-python/pull/1381#issuecomment-4093199310
Apologies for the delayed response on this one 😅
> One area I am concerned about is that when we do a display() we do bypass
> all of this mechanism. That is good and bad. The good is that the metrics are
> definitely going to be different between the smaller collection that happens
> when we display because it ends early. The bad is that as a user it's probably
> confusing to see the data but then be told that we don't have the metrics
> for the data in front of them. What do you think?
That's a totally fair concern, and I think it's worth addressing; it's
definitely going to trip people up. I'm still thinking this through, but one
high-level idea is to have display() also cache its plan so users can at least
inspect metrics from the display execution (I haven't looked into what
supporting this would require). The metrics would reflect the truncated run,
but timestamps and compute times would still be meaningful.
WDYT?
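To make the idea a bit more concrete, here is a rough, self-contained sketch.
It is a toy model only and does not use the datafusion-python API: SketchFrame,
PlanMetrics, last_metrics() and the truncated flag are all hypothetical names
made up for illustration, and the real change would depend on how the plan is
cached today.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanMetrics:
    """Hypothetical container for the metrics of one execution."""
    output_rows: int
    elapsed_seconds: float
    truncated: bool  # True when the run stopped early, e.g. for display()


@dataclass
class SketchFrame:
    """Toy stand-in for a DataFrame; NOT the real datafusion-python class."""
    rows: list
    _last_metrics: Optional[PlanMetrics] = field(
        default=None, init=False, repr=False
    )

    def _execute(self, limit: Optional[int] = None) -> list:
        # Single execution path shared by collect() and display(), so both
        # leave metrics behind for later inspection.
        start = time.perf_counter()
        out = self.rows if limit is None else self.rows[:limit]
        self._last_metrics = PlanMetrics(
            output_rows=len(out),
            elapsed_seconds=time.perf_counter() - start,
            truncated=limit is not None and len(self.rows) > limit,
        )
        return out

    def collect(self) -> list:
        return self._execute()

    def display(self, max_rows: int = 10) -> str:
        # Ends early, but still records metrics for the truncated run.
        return "\n".join(str(r) for r in self._execute(limit=max_rows))

    def last_metrics(self) -> Optional[PlanMetrics]:
        return self._last_metrics
```

In this sketch, calling display() and then last_metrics() returns metrics
flagged as truncated, so a user can tell they describe the partial run rather
than a full collect().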
For documentation, I've already started a little bit of work in this PR;
please do lmk what you think (you likely have a lot more context on what a user
might expect).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]