Hi Zoltán,

Thanks for the response. For example, let’s say I have a table stored as Parquet with columns A, B, and C of types BIGINT, VARCHAR, and TIMESTAMP. Assuming the table has lots of data, after I run analyze on it (computing column stats for all columns), the data_size stats value for some of the columns is still null (i.e., when showing the stats of the table, columns B and C have the expected value for the overall data size, while column A is null).
Do you know what that depends on and whether I can infer the overall data size of the columns that don’t have a value computed after stats are collected?

Thanks,
James

On Sat, Jul 27, 2019 at 4:53 AM Zoltán Haindrich <k...@rxd.hu> wrote:

> Hey James!
>
> Because column stats are about all values of the column; it's not entirely
> clear to me what "column stats row" refers to.
> Could you give some example? It would help a lot
>
> cheers,
> Zoltan
>
> On July 26, 2019 1:18:17 AM GMT+02:00, James Taylor <
> jamestay...@apache.org> wrote:
> >Hello,
> >Why do some column stats rows have a null data_size value after I
> >analyze
> >the table computing column statistics? Is it possible to infer the
> >approximate data size from the other stats column values?
> >Thanks,
> >James
>
> --
> Zoltán Haindrich