[GitHub] [arrow-adbc] lidavidm commented on issue #685: format: add statistics for tables, columns, queries, etc.

via GitHub Thu, 15 Jun 2023 08:22:32 -0700


lidavidm commented on issue #685:
URL: https://github.com/apache/arrow-adbc/issues/685#issuecomment-1593277547


   ## JDBC
   
   
[`getIndexInfo`](https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/DatabaseMetaData.html#getIndexInfo(java.lang.String,java.lang.String,java.lang.String,boolean,boolean))
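   - Oriented around database indices
   - Provides row count (for the table) and number of distinct values, or ndv (for each index)
   - Also provides page count and ordering (ascending/descending per index column)
   - Differentiates between exact/approximate via a parameter of the call; this can [affect performance](https://groups.google.com/g/h2-database/c/Vago-0qkWL4) of the call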
   
   => We may want a statistic for "abstract size"? (But the values wouldn't be comparable between drivers.)
   => Is ordering useful?
   => It may also be useful to indicate whether a value is exact or approximate
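
To make the shape of this data concrete, here is a minimal sketch of how `getIndexInfo`-style rows might be flattened into generic statistics. The column names (`TYPE`, `INDEX_NAME`, `CARDINALITY`, `PAGES`) and the `tableIndexStatistic` constant come from the JDBC documentation; the `Statistic` record, the statistic names, and `statistics_from_index_info` are hypothetical and only illustrate the idea, not a proposed ADBC API.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Mapping, Optional

# Value of java.sql.DatabaseMetaData.tableIndexStatistic: marks a row that
# describes the table itself rather than one of its indices.
TABLE_INDEX_STATISTIC = 0


@dataclass(frozen=True)
class Statistic:
    """Hypothetical flattened statistic; not part of ADBC or JDBC."""
    table: str
    index: Optional[str]  # None when the statistic describes the table
    name: str             # illustrative names: row_count, distinct_count, page_count
    value: int
    approximate: bool


def statistics_from_index_info(
    rows: Iterable[Mapping[str, Any]], approximate: bool
) -> Iterator[Statistic]:
    """Translate getIndexInfo-style rows (as dicts) into Statistic records.

    `approximate` mirrors the flag the caller passed to getIndexInfo.
    Simplification: multi-column indices produce one row per column in JDBC,
    and this sketch does not deduplicate them.
    """
    for row in rows:
        is_table_row = row["TYPE"] == TABLE_INDEX_STATISTIC
        index = None if is_table_row else row["INDEX_NAME"]
        if row.get("CARDINALITY") is not None:
            # CARDINALITY is the row count for table rows, ndv for index rows.
            name = "row_count" if is_table_row else "distinct_count"
            yield Statistic(row["TABLE_NAME"], index, name, row["CARDINALITY"], approximate)
        if row.get("PAGES") is not None:
            yield Statistic(row["TABLE_NAME"], index, "page_count", row["PAGES"], approximate)
```

Carrying `approximate` on every record is one possible way to surface the exact/approximate distinction; whether it belongs per statistic or per request is left open here.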
   
   ## ODBC
   
   
[`SQLStatistics`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlstatistics-function?view=sql-server-ver16)
   
   - Effectively the same as JDBC
   
   ## PostgreSQL
   
   https://www.postgresql.org/docs/current/planner-stats.html and https://www.postgresql.org/docs/current/view-pg-stats.html
   
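   - Row count (may be out of date), page count
   - null fraction (`null_frac`)
   - a very odd ndv estimate: `n_distinct` is an absolute count when positive, but the negated fraction of the row count when negative
   - histograms, most common elements, etc.
   - average column width (`avg_width`: the average size in bytes of a column's entries; i.e. an estimate of string length, etc.)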
   
   => How should we define ndv?
   => Do we want to be able to map through "most common elements" etc. or should we leave that alone? (Probably leave it alone)
   => We may want to define a column width statistic
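
One possible answer to the ndv question is to always report an absolute count. As a minimal sketch, assuming the sign convention documented for `pg_stats.n_distinct` and a row count taken from elsewhere (e.g. `pg_class.reltuples`), the conversion could look like this. The function name is hypothetical, and treating `0` as "unknown" follows the `pg_statistic.stadistinct` documentation rather than anything decided in this issue.

```python
from typing import Optional


def estimated_distinct_count(n_distinct: float, row_count: float) -> Optional[float]:
    """Convert a PostgreSQL-style n_distinct into an absolute ndv estimate.

    Positive values are already an estimated count of distinct values.
    Negative values are the negated ratio of distinct values to rows, so the
    estimate scales with the (possibly stale) row count.
    Zero is treated as "unknown" (assumption based on pg_statistic.stadistinct).
    """
    if n_distinct == 0:
        return None
    if n_distinct > 0:
        return n_distinct
    return -n_distinct * row_count
```

For example, `n_distinct = -0.5` with 1000 rows gives an estimate of 500 distinct values, and `n_distinct = -1` means every row is expected to be distinct.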


