Hi Jingsong,

Thanks for your comments. Here are my responses:


> But also need to re-calculate partition col stats?

No conclusion yet. I will refer to the design of Parquet when integrating with 
Spark, and the decision will be added to the PIP at that time.


> FileStoreCommit.commitStatistics

Will fix.


> Why nullable? Should it be long?

Will fix.


Best,
zouxxyy


On 2024/01/12 07:36:26 Jingsong Li wrote:
> Thanks zouxxyy for starting this discussion.
> 
> The design looks good to me overall.
> 
> Left some comments:
> 
> > Statistics in snapshot (global stats + col stats) with some pruning 
> > strategies > stats calculated in real-time from splits (only including 
> > numRows and totalSize)
> 
> But also need to re-calculate partition col stats?
> 
> > FileStoreCommit.writeStats
> 
> FileStoreCommit.commitStatistics
> 
> > Long snapshotId in Stats
> 
> Why nullable? Should it be long?
> 
> Best,
> Jingsong
> 
> On Fri, Jan 12, 2024 at 3:01 PM zouxxyy <[email protected]> wrote:
> >
> > Hi, Paimon Devs, I’d like to start a discussion about PIP-14[1].
> >
> > Table statistics describe the data distribution characteristics of a table.
> > Common statistics include the number of rows, table size, column statistics 
> > and more.
> > They are very important for DBMS, especially when executing query plans and 
> > optimizing query performance.
> > This PIP further expands on the existing statistics of Paimon to support 
> > more statistical information.
> >
> > Looking forward to your questions and suggestions.
> >
> > Best, zouxxyy
> >
> > [1] https://cwiki.apache.org/confluence/x/HYokEQ
> >
> 
