Hi Zoltán,
Thanks for the response. For example, say I have a table stored as
Parquet with columns A, B, and C of types BIGINT, VARCHAR, and TIMESTAMP.
Assuming the table has lots of data, after I run ANALYZE on it (computing
column stats for all columns), the data_size value is still null for some
of the columns. That is, when I look at the table's stats, columns B and C
have the expected value for the overall data size, while column A's value
is null.

Do you know what determines whether this value gets computed, and whether
I can infer the overall data size of the columns that don't have a value
after stats are collected?

Thanks,
James

On Sat, Jul 27, 2019 at 4:53 AM Zoltán Haindrich <k...@rxd.hu> wrote:

> Hey James!
>
>
> Because column stats cover all values of the column, it's not entirely
> clear to me what a "column stats row" refers to.
> Could you give an example? It would help a lot.
>
> cheers,
> Zoltan
>
> On July 26, 2019 1:18:17 AM GMT+02:00, James Taylor <
> jamestay...@apache.org> wrote:
> >Hello,
> >Why do some column stats rows have a null data_size value after I
> >analyze the table, computing column statistics? Is it possible to infer
> >the approximate data size from the other stats column values?
> >Thanks,
> >James
>
> --
> Zoltán Haindrich
