Thanks zouxxyy for starting this discussion.

The design looks good to me overall.

Left some comments:

> Statistics in snapshot (global stats + col stats) with some pruning
> strategies > stats calculated in real-time from splits (only including 
> numRows and totalSize)

But don't we also need to re-calculate the partition col stats?

> FileStoreCommit.writeStats

How about FileStoreCommit.commitStatistics?

> Long snapshotId in Stats

Why nullable? Should it be long?

Best,
Jingsong

On Fri, Jan 12, 2024 at 3:01 PM zouxxyy <[email protected]> wrote:
>
> Hi, Paimon Devs, I’d like to start a discussion about PIP-14[1].
>
> Table statistics describe the data distribution characteristics of a table.
> Common statistics include the number of rows, table size, column statistics 
> and more.
> They are very important for DBMS, especially when executing query plans and 
> optimizing query performance.
> This PIP further expands on the existing statistics of Paimon to support more
> statistical information.
>
> Look forward to your questions and suggestions.
>
> Best, zouxxyy
>
> [1] https://cwiki.apache.org/confluence/x/HYokEQ
>
