edmondop commented on PR #14699: URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2672704365
> > I issued some comments (will still go through a second round of review)
> >
> > One side question, this design is different from the one of Postgres https://www.postgresql.org/docs/current/view-pg-stats.html that uses histograms. Is using histograms more suitable for OLTP workloads than OLAP? I really don't know much, but I was curious about this choice
>
> What I see is those PG stats are table and column statistics at the user level. What we're building here is a foundational statistics infrastructure that serves as a basis for other statistical concepts. It is designed to satisfy various computational requirements and parameters (and they are extensible). It is built to be robust and error-prone. If you prefer displaying a stat at some point as histogram, you can easily convert these new distributions into histograms using a few converter functions.

I guess what I am saying (though I am not really sure about it) is that maybe Postgres (and Oracle, https://docs.oracle.com/en/database/oracle/oracle-database/19/tgsql/histograms.html) use histograms because most data doesn't follow a "known probability distribution". It's just "stuff that I was working on recently".
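
For reference, the distribution-to-histogram conversion mentioned in the quoted reply could look roughly like the sketch below. This is only an illustration under my own assumptions: `Bucket`, `cdf_to_histogram`, and `exponential_cdf` are hypothetical names, not the API introduced in this PR, and the exponential distribution is just a convenient example input because its CDF needs nothing beyond the standard library.

```rust
/// One histogram bucket: the range [lower, upper) and the fraction of
/// values expected to fall inside it. (Hypothetical type, for illustration.)
#[derive(Debug, Clone, PartialEq)]
pub struct Bucket {
    pub lower: f64,
    pub upper: f64,
    pub mass: f64,
}

/// Hypothetical converter: turns a distribution, described by its CDF, into
/// an equi-width histogram over [min, max] with `n_buckets` buckets.
pub fn cdf_to_histogram<F>(cdf: F, min: f64, max: f64, n_buckets: usize) -> Vec<Bucket>
where
    F: Fn(f64) -> f64,
{
    assert!(
        n_buckets > 0 && max > min,
        "need a non-empty range and at least one bucket"
    );
    let width = (max - min) / n_buckets as f64;
    (0..n_buckets)
        .map(|i| {
            let lower = min + width * i as f64;
            // Pin the last boundary to `max` so rounding cannot shift it.
            let upper = if i + 1 == n_buckets { max } else { lower + width };
            // The probability mass of a bucket is the CDF difference at its edges.
            Bucket { lower, upper, mass: cdf(upper) - cdf(lower) }
        })
        .collect()
}

/// CDF of an exponential distribution with the given rate (example input only).
pub fn exponential_cdf(rate: f64) -> impl Fn(f64) -> f64 {
    move |x| if x <= 0.0 { 0.0 } else { 1.0 - (-rate * x).exp() }
}
```

Calling `cdf_to_histogram(exponential_cdf(1.0), 0.0, 4.0, 4)` yields four unit-width buckets whose masses shrink from left to right. Going the other way (fitting a known distribution to arbitrary data) is the hard direction, which is roughly the point I was wondering about above.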