Thanks zouxxyy for starting this discussion. The design looks good to me overall.
I left some comments:

> Statistics in snapshot (global stats + col stats) with some pruning
> strategies
> stats calculated in real-time from splits (only including
> numRows and totalSize)

Do we also need to re-calculate the partition column stats?

> FileStoreCommit.writeStats

FileStoreCommit.commitStatistics

> Long snapshotId in Stats

Why nullable? Should it be long?

Best,
Jingsong

On Fri, Jan 12, 2024 at 3:01 PM zouxxyy wrote:
>
> Hi Paimon devs,
>
> I'd like to start a discussion about PIP-14 [1].
>
> Table statistics describe the data distribution characteristics of a table.
> Common statistics include the number of rows, the table size, column statistics,
> and more.
> They are very important for a DBMS, especially for query planning and
> query performance optimization.
> This PIP further extends Paimon's existing statistics to support more
> statistical information.
>
> I look forward to your questions and suggestions.
>
> Best,
> zouxxyy
>
> [1] https://cwiki.apache.org/confluence/x/HYokEQ
>
