Hi Hive community,

I'm working on HIVE-29398 <https://issues.apache.org/jira/browse/HIVE-29398> to make 
the Hive metastore more compatible with other projects that use it (e.g., Impala). In 2019, 
HIVE-22311 <https://issues.apache.org/jira/browse/HIVE-22311> (Propagate min/max 
column values from statistics to the optimizer for timestamp type) introduced a struct 
TimestampColumnStatsData in the thrift definition. It seems that this change to the thrift 
code was not necessary, as the timestamp statistics can also be passed via the existing 
LongColumnStatsData. Impala actually expects the statistics that way. I have worked 
on a property to switch back to the old behavior.

In the review of the PR <https://github.com/apache/hive/pull/6276>, Krisztian Kasa 
suggested asking the community whether it would be possible to undo the change of HIVE-22311 
<https://issues.apache.org/jira/browse/HIVE-22311>. A while ago I prepared a patch to undo 
the changes to the thrift code, while still keeping the benefits of propagating the stats to the 
optimizer, so it is technically possible. I'm quite new to Hive, so I don't know much about the consequences 
of removing a field from the thrift code. Is it actually advisable to remove the timestampStats 
field 
<https://github.com/apache/hive/blob/c80c7215f032cd49d79e15275520bf55d768a901/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L592>
 from hive_metastore.thrift? Are there other projects that started to use the timestamp stats 
field? If we decide to drop the field, those projects would need to use the long field 
instead.

Best regards,
Thomas Rebele
