[GitHub] [druid] gianm commented on issue #9414: SQL: severe performance degradation if multi-valued dimension gets combined with a LOOKUP and NVL function

GitBox Wed, 26 Feb 2020 14:41:29 -0800

gianm commented on issue #9414: SQL: severe performance degradation if 
multi-valued dimension gets combined with a LOOKUP and NVL function
URL: https://github.com/apache/druid/issues/9414#issuecomment-591681808
 
 
   There is an important optimization that gets applied for string expressions 
that read just one column, in which expression evaluation is deferred until 
after the segment scan. I'm not sure why, but in your case, it's being applied 
for `LOOKUP(...)` and not for `NULLIF(LOOKUP(...), ...)`.
   
   It probably has to do with the fact that the column is referenced twice and 
the fact that the input is multi-value (this SQL expression will become 
`case_searched((lookup("adTypeIdPubRequest",'adtype_by_adtypeid') == 
'fallback'),null,lookup("adTypeIdPubRequest",'adtype_by_adtypeid'))` as a 
native expression). The logic for performing the optimization might be getting 
confused.
   
   Try this workaround:
   
   ```
   SELECT
      NULLIF(LOOKUP(adTypeIdPubRequest, 'adtype_by_adtypeid'), 'fallback') AS 
adFormat,
      SUM("count") AS cnt
   FROM "supply-activities" 
   WHERE  (__time >= timestamp'2020-01-20 00:00:00' AND __time < timestamp 
'2020-01-20 01:00:00')
   GROUP BY LOOKUP(adTypeIdPubRequest, 'adtype_by_adtypeid')
   ORDER BY 2
   LIMIT 10
   ```
   
   It's not exactly an identical query, but it will run faster, and might be 
close enough to what you want.
   
   Btw, I think that this was always an issue for expressions on multi-value 
columns as long as they've worked at all (they used to just always return 
null). So I think it's not a regression.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] gianm commented on issue #9414: SQL: severe performance degradation if multi-valued dimension gets combined with a LOOKUP and NVL function

Reply via email to