berkaysynnada commented on PR #7544:
URL: https://github.com/apache/arrow-datafusion/pull/7544#issuecomment-1719164499

   > > I'm sorry, but I didn't understand exactly what you mean by derive/propagate.
   > 
   > I didn't explain it in enough detail, and I apologize for that.
   > 
   > `Statistics Derive/Propagate` is the process of computing statistics for the whole plan. In general, we start with the initial statistics of the table and compute the statistics of each parent PlanNode recursively from the bottom up, so that every PlanNode ends up with its statistics filled in.
   > 
   > This PR handles one condition: in the `FilterExec::statistics` method, if the input statistics are `None`, the analysis is not performed. So this PR fills TypeInfo (such as `ScalarValue::Null`) into `Statistics`.
   > 
   > My point is that we could instead correct the "Statistics Derive" step so that the input statistics are never `None` but already carry TypeInfo, rather than injecting TypeInfo directly into `FilterExec`.
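
A minimal Rust sketch of the bottom-up derivation discussed above, following the suggestion that input statistics should never be `None` but always carry type information. All names here (`Plan`, `Stats`, `ColumnStats`, `Scalar`, `DataType`) are hypothetical simplifications, not DataFusion's `ExecutionPlan` or `Statistics` API; a typed null such as `Scalar::Int64(None)` plays the role of the TypeInfo placeholder.

```rust
/// Hypothetical column types (a tiny stand-in for Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

/// Hypothetical scalar: a variant holding `None` is a *typed* null,
/// i.e. the value is unknown but the column type is preserved.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(Option<i64>),
    Utf8(Option<String>),
}

impl Scalar {
    fn typed_null(dt: DataType) -> Scalar {
        match dt {
            DataType::Int64 => Scalar::Int64(None),
            DataType::Utf8 => Scalar::Utf8(None),
        }
    }
}

/// Per-column statistics; min/max are typed nulls when unknown.
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    min: Scalar,
    max: Scalar,
}

/// Per-node statistics; `num_rows == None` means the row count is unknown.
#[derive(Debug, Clone, PartialEq)]
struct Stats {
    num_rows: Option<usize>,
    columns: Vec<ColumnStats>,
}

impl Stats {
    /// "Nothing is known" statistics that still carry one typed entry per column.
    fn unknown(schema: &[DataType]) -> Stats {
        Stats {
            num_rows: None,
            columns: schema
                .iter()
                .map(|&dt| ColumnStats {
                    min: Scalar::typed_null(dt),
                    max: Scalar::typed_null(dt),
                })
                .collect(),
        }
    }
}

/// Hypothetical two-node plan tree.
enum Plan {
    /// Leaf: a table scan that may or may not have statistics.
    Scan { schema: Vec<DataType>, known: Option<Stats> },
    /// Keeps at most `fetch` rows of its input.
    Limit { input: Box<Plan>, fetch: usize },
}

impl Plan {
    /// Bottom-up derivation: each node first derives its child's statistics,
    /// then adjusts them. Because the leaf never returns "no statistics",
    /// every parent sees typed column entries instead of a bare `None`.
    fn statistics(&self) -> Stats {
        match self {
            Plan::Scan { schema, known } => {
                known.clone().unwrap_or_else(|| Stats::unknown(schema))
            }
            Plan::Limit { input, fetch } => {
                let child = input.statistics();
                Stats {
                    // A limit caps a known row count; an unknown count stays unknown here.
                    num_rows: child.num_rows.map(|n| n.min(*fetch)),
                    columns: child.columns,
                }
            }
        }
    }
}
```

In this sketch the type information is produced once at the leaf and simply flows upward, so a parent such as a filter never has to inject it itself.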
   
   Thanks for the explanation :) There is an [issue](https://github.com/apache/arrow-datafusion/issues/7553); I don't know if you have had a chance to review it, but in summary, the `statistics` method of `FilterExec` needs a refactor. I will move the changes in this PR there, so I am closing this PR.
   
   Actually, what you said ("if the input statistics are None, the analysis is not performed") is not what I intended. The analysis is still performed, with the columns given infinite bounds. To do this, filling `Statistics` with TypeInfo is unavoidable.
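
The point that the analysis still runs, just with infinite bounds, can be illustrated with a small interval sketch in Rust. `Interval` and `apply_gt` are hypothetical names, not DataFusion's interval-arithmetic API; the sketch assumes an integer column, a single `col > c` predicate, and treats a missing bound as infinite.

```rust
/// Hypothetical closed integer interval for one column.
/// `None` on either side means that side is unbounded (infinite).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    lo: Option<i64>,
    hi: Option<i64>,
}

impl Interval {
    /// What a column with unknown min/max statistics starts as: (-inf, +inf).
    fn unbounded() -> Interval {
        Interval { lo: None, hi: None }
    }

    /// Narrow the interval with the predicate `col > c`.
    /// Returns `None` when no value in the interval can satisfy it.
    fn apply_gt(self, c: i64) -> Option<Interval> {
        // Smallest integer satisfying `col > c`; nothing satisfies it if c == i64::MAX.
        let min_ok = c.checked_add(1)?;
        let lo = match self.lo {
            Some(l) => l.max(min_ok),
            None => min_ok,
        };
        // An empty result means the filter can never pass a row.
        if let Some(h) = self.hi {
            if lo > h {
                return None;
            }
        }
        Some(Interval { lo: Some(lo), hi: self.hi })
    }
}
```

Starting from unknown statistics, `Interval::unbounded().apply_gt(10)` still yields the output bound `[11, +inf)` instead of skipping the analysis; in this sketch, a range-based selectivity estimate would still need finite bounds on both sides.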
   
   To put your suggestion into practice, I planned to add a `Schema`-like field to the `Statistics` struct (to hold TypeInfo) and let each statistics method update this schema for the PlanNodes above it. However, since each statistics method already has access to its own schema and its children's schemas, this would mean carrying duplicate information, so I gave up on that idea.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.