asolimando commented on PR #23051:
URL: https://github.com/apache/datafusion/pull/23051#issuecomment-4780050319

   Thanks @xudong963 for your feedback.
   
   > 1. Old callers calling migrated nodes can get unknown stats. Many built-in 
operators now implement only `statistics_from_inputs`. But the deprecated 
`partition_statistics` default does not delegate to statistics_from_inputs; it 
just returns `Statistics::new_unknown`. So any remaining direct caller of 
partition_statistics can silently lose useful statistics.
   
   This is correct but it's the same one-way-delegation tradeoff we accepted in 
#21815, it's already documented in the upgrade guide.
   
   > 2. New callers calling old custom nodes can do wasted child traversal.
   >    `StatisticsContext::compute` always computes child stats before calling 
`statistics_from_inputs`. For old third-party plans that only override 
`partition_statistics`, the default `statistics_from_inputs`
   >    ignores those child stats and delegates to `partition_statistics`. So 
the new API may perform extra work, or even surface child-stat errors, that the 
old implementation never required.
   
   This is indeed a regression w.r.t. #21815, whose default never walked 
children. I considered several options but I finally settled on flipping the 
default `child_stats_requests` to `Skip` to restore the behavior of #21815.
   
   This way, a node that only overrides `partition_statistics` now declares no 
children, so there's no walk, and no risk of surfaced child errors. Nodes that 
read `input_stats` declare the children they use.
   
   Side note: the semver check is technically correct but this is a change just 
between #21815 and this PR, so we don't need to go through a deprecation 
process as nothing was released in the meantime, correct?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to