Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

via GitHub Thu, 20 Feb 2025 13:17:44 -0800


edmondop commented on PR #14699:
URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2672704365


   > > I issued some comments (will still go through a second round of review)
   > > One side question: this design differs from that of Postgres 
https://www.postgresql.org/docs/current/view-pg-stats.html
   > > which uses histograms. Are histograms more suitable for OLTP 
workloads than for OLAP? I really don't know much, but I was curious about this 
choice.
   > 
   > What I see is those PG stats are table and column statistics at the user 
level. What we're building here is a foundational statistics infrastructure 
that serves as a basis for other statistical concepts. It is designed to 
satisfy various computational requirements and parameters (and they are 
extensible). It is built to be robust and not error-prone. If you prefer displaying 
a stat at some point as a histogram, you can easily convert these new 
distributions into histograms using a few converter functions.
   
   I guess what I am saying (though I am not really sure about it) is that maybe 
Postgres (and Oracle 
https://docs.oracle.com/en/database/oracle/oracle-database/19/tgsql/histograms.html)
 use histograms because most data doesn't follow a "known probability 
distribution". This is just something I was working on recently.
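
   To make the "converter functions" idea from the quoted reply concrete, here 
is a minimal sketch of turning a parametric distribution into an equi-width 
histogram by differencing its CDF at the bucket edges. The names 
`normal_cdf` and `gaussian_to_histogram` are hypothetical and are not part of 
the DataFusion API or of this PR; a Gaussian is assumed only for illustration.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def gaussian_to_histogram(mean, std, lo, hi, n_buckets):
    """Hypothetical converter: equi-width buckets over [lo, hi].

    Returns a list of (bucket_lo, bucket_hi, probability_mass) tuples.
    Mass outside [lo, hi] is dropped, so the masses sum to slightly
    less than 1.
    """
    width = (hi - lo) / n_buckets
    buckets = []
    for i in range(n_buckets):
        b_lo = lo + i * width
        b_hi = lo + (i + 1) * width
        # Probability mass of a bucket is the CDF difference at its edges.
        mass = normal_cdf(b_hi, mean, std) - normal_cdf(b_lo, mean, std)
        buckets.append((b_lo, b_hi, mass))
    return buckets


hist = gaussian_to_histogram(mean=0.0, std=1.0, lo=-4.0, hi=4.0, n_buckets=8)
for b_lo, b_hi, mass in hist:
    print(f"[{b_lo:+.1f}, {b_hi:+.1f}): {mass:.4f}")
```

   The reverse direction is the harder one: a histogram built from real data 
has no guarantee of matching any closed-form distribution, which is the 
concern raised above.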


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

