gianm opened a new issue #10044:
URL: https://github.com/apache/druid/issues/10044


   Originally reported in #10011.
   
   The issue is that a query like this:
   
   ```sql
   SELECT
     dim1,
     dim3,  
     COUNT(*)
   FROM druid.foo
   GROUP BY dim1, dim3
   HAVING dim3 = 'b'
   ```
   
   Will get rewritten into this:
   
   ```sql
   SELECT
     dim1,
     dim3,  
     COUNT(*)
   FROM druid.foo
   WHERE dim3 = 'b'
   GROUP BY dim1, dim3
   ```
   
   This is a valid rewrite if dim3 is singly-valued, and is a good optimization 
because WHERE is more efficient than HAVING. But it's not valid if dim3 is 
multi-value. We'll need to only apply this rewrite in cases where dim3 is 
singly-valued.
   
   This will probably involve threading multi-valuedness type information 
through the SQL type system, which we aren't currently doing AFAIK.
   
   Here are two tests that illustrate the intended behavior:
   
   ```java
     @Test // This test passes today
     public void testHavingOnColumnFromGroupBy() throws Exception
     {
       // Cannot vectorize due to grouping on a multi-value column.
       cannotVectorize();
   
       // dim1 is singly-valued, dim3 is multi-valued
       testQuery(
           "SELECT\n"
           + "  dim1,\n"
           + "  dim3,\n"
           + "  COUNT(*)\n"
           + "FROM druid.foo\n"
           + "GROUP BY dim1, dim3\n"
           + "HAVING dim1 = '10.1'",
           ImmutableList.of(
               GroupByQuery.builder()
                           .setDataSource(CalciteTests.DATASOURCE1)
                           .setInterval(querySegmentSpec(Filtration.eternity()))
                           .setGranularity(Granularities.ALL)
                           .setDimFilter(selector("dim1", "10.1", null))
                           .setDimensions(
                               dimensions(
                                   new DefaultDimensionSpec("dim1", "d0"),
                                   new DefaultDimensionSpec("dim3", "d1")
                               )
                           )
                           .setAggregatorSpecs(aggregators(new 
CountAggregatorFactory("a0")))
                           .setContext(QUERY_CONTEXT_DEFAULT)
                           .build()
           ),
           ImmutableList.of(
               new Object[]{"10.1", "b", 1L},
               new Object[]{"10.1", "c", 1L}
           )
       );
     }
   
     @Test // This test fails today
     public void testHavingOnMultiValueColumnFromGroupBy() throws Exception
     {
       // Cannot vectorize due to grouping on a multi-value column.
       cannotVectorize();
   
       // dim1 is singly-valued, dim3 is multi-valued
       testQuery(
           "SELECT\n"
           + "  dim1,\n"
           + "  dim3,\n"
           + "  COUNT(*)\n"
           + "FROM druid.foo\n"
           + "GROUP BY dim1, dim3\n"
           + "HAVING dim3 = 'b'",
           ImmutableList.of(
               GroupByQuery.builder()
                           .setDataSource(CalciteTests.DATASOURCE1)
                           .setInterval(querySegmentSpec(Filtration.eternity()))
                           .setGranularity(Granularities.ALL)
                           .setDimensions(
                               dimensions(
                                   new DefaultDimensionSpec("dim1", "d0"),
                                   new DefaultDimensionSpec("dim3", "d1")
                               )
                           )
                           .setHavingSpec(having(selector("d1", "b", null)))
                           .setAggregatorSpecs(aggregators(new 
CountAggregatorFactory("a0")))
                           .setContext(QUERY_CONTEXT_DEFAULT)
                           .build()
           ),
           ImmutableList.of(
               new Object[]{NULL_STRING, "b", 1L},
               new Object[]{"10.1", "b", 1L}
           )
       );
     }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to