Hello everyone, recently I have been working on materialized views using 
Calcite, and in our use case, there are a lot of queries involving 
CountDistinct.
And generally, to support rewriting for Count Distinct,  we will always use 
bitmap. However, recently, I have developed a new capability in Calcite that 
allows rewriting of Count Distinct queries to read from the materialized view 
table without the need for bitmap, as long as the Count Distinct is querying 
the group by columns of the materialized view.

For example, let's assume we have the following materialized view: 

```sql
CREATE MATERIALIZED VIEW test_mv AS
SELECT
    c1, c2, c3, sum(c4)
FROM
   t
GROUP BY
    c1, c2, c3
```

After the materialized view created, the following query arrives:

```sql

select count(distinct c2) from t group by c2, c3

```

With the capability I've developed, the above query can be rewritten as:

```sql

select count(distinct c2) from test_mv group by c2, c3

```

The rewrite mentioned above, compared to calculating COUNT DISTINCT directly on 
the original table, will significantly reduce the query time because the 
materialized view contains a reduced amount of data.

Is anyone interested in this? I can initiate a Pull Request. :)







Reply via email to