I haven't had a chance to review but saw that Elastic has the same issue with aggregations.
https://github.com/elastic/elasticsearch/issues/35745 Kevin Risden On Wed, Nov 28, 2018, 20:46 Andrei Sereda <[email protected] wrote: > Greetings , > We have discovered an issue with ES aggregations when grouping on > non-textual fields (date, long). Currently the following sql fails because > for missing value > < > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13 > > > we inject __MISSING__ sentinel which is not date / number parseable (it > can’t be null either) : > > select max(amount), date from orders group by date -- special ES type > > The solution is to make sentinel type specific : > > 1. Integer.MIN_VALUE for integers > 2. 9999-12-31 for dates etc. > > For low cardinality types like boolean, byte, short I’m not sure what to do > since there is high probability of collision between missing field and > actual value (eg. what value to choose for missing boolean?) : > > select max(amount), isActive from orders group by isActive -- boolean type > > Let me know if you solved this problem differently before. Composite > aggregations > < > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html > > > (available since 6.1) should help in future. > > PR: https://github.com/apache/calcite/pull/946 > JIRA: https://issues.apache.org/jira/browse/CALCITE-2689 > > Many Thanks in Advance, > Andrei. >
