Hi Hive community,

I'm working on HIVE-29398 <https://issues.apache.org/jira/browse/HIVE-29398> to make the Hive metastore more compatible with other projects that use it (e.g., Impala).

In 2019, HIVE-22311 <https://issues.apache.org/jira/browse/HIVE-22311> (Propagate min/max column values from statistics to the optimizer for timestamp type) introduced a struct TimestampColumnStatsData in the thrift definition. This change to the thrift code does not seem to have been necessary, as the timestamp statistics can also be passed via the existing LongColumnStatsData. Impala actually expects the statistics that way. I have worked on a property to switch back to the old behavior.
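
To illustrate the point about LongColumnStatsData, here is a minimal Java sketch of the mapping. It assumes the timestamp bounds are plain seconds-since-epoch values, so they fit into the long low/high fields unchanged. The classes TimestampStats and LongStats and the two conversion methods are hypothetical stand-ins made up for this example; they are not the thrift-generated classes and not the code from my patch.

```java
public class TimestampStatsSketch {

    // Hypothetical stand-in for timestamp column statistics.
    // Assumption: bounds are seconds since the epoch; null means "not set".
    public record TimestampStats(Long lowSecondsSinceEpoch,
                                 Long highSecondsSinceEpoch,
                                 long numNulls,
                                 long numDVs) {}

    // Hypothetical stand-in for long column statistics.
    public record LongStats(Long lowValue,
                            Long highValue,
                            long numNulls,
                            long numDVs) {}

    // Carry the timestamp bounds in the long fields without changing the values.
    public static LongStats toLongStats(TimestampStats ts) {
        return new LongStats(ts.lowSecondsSinceEpoch(),
                             ts.highSecondsSinceEpoch(),
                             ts.numNulls(),
                             ts.numDVs());
    }

    // Reverse direction: read the long fields back as timestamp bounds.
    public static TimestampStats toTimestampStats(LongStats ls) {
        return new TimestampStats(ls.lowValue(),
                                  ls.highValue(),
                                  ls.numNulls(),
                                  ls.numDVs());
    }

    public static void main(String[] args) {
        TimestampStats ts = new TimestampStats(0L, 1_700_000_000L, 3L, 42L);
        LongStats ls = toLongStats(ts);
        System.out.println(ls);
        // The round trip loses nothing under the assumption stated above.
        System.out.println(toTimestampStats(ls).equals(ts));
    }
}
```

Under that assumption the conversion is lossless in both directions, which is why a separate struct might not be needed just to transport the min/max values.
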
In the review of the PR <https://github.com/apache/hive/pull/6276>, Krisztian Kasa suggested asking the community whether it would be possible to undo the change made in HIVE-22311 <https://issues.apache.org/jira/browse/HIVE-22311>. A while ago I prepared a patch that undoes the changes to the thrift code while still keeping the benefits of propagating the stats to the optimizer, so it is technically possible.

I'm quite new to Hive, so I don't know much about the consequences of removing a field from the thrift code. Is it actually advisable to remove the timestampStats field <https://github.com/apache/hive/blob/c80c7215f032cd49d79e15275520bf55d768a901/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L592> from hive_metastore.thrift? Are there other projects that have started to use the timestamp stats field? If we decide to drop the field, those projects would need to use the long field instead.

Best regards,
Thomas Rebele
