sascha-coenen opened a new issue #9417: SQL: improve division behaviour
URL: https://github.com/apache/druid/issues/9417
 
 
   ### Description
   
   Most databases implement the following behaviour to make it easy to 
formulate divisions and receive an expected outcome.
   
   Let me go over two or three differences between how Druid works relative to 
other SQL dialects and how this is leading to extremely long expressions even 
for simple cases of needing to compute a/b:
   
   `SELECT 1 / 2`
   in Vertica this evaluates to 0.5
   in Druid it evaluates to 0
   
   `SELECT 1 / 0`
   fails in Vertica and Druid
   
   `SELECT 1 / null`
   evaluates to null in Vertica
   in Druid it throws an exception.
   
   With the above behaviour, it is possible to formulate divisions that are 
null safe and also evaluate to correct results in a consice way:
   
   SUM(nominator) / NVL(SUM(denominator))
   
   In Druid, it is cumbersome that divisions are by default integer divisions 
and furthermore, it takes CASE statements to make a query robust against 
division-by-zero cases.
   
   So the simple intention a/b turns into:
   
   `SELECT CASE(b = 0.0) THEN NULL ELSE ( a / CAST(b AS DOUBLE) END`
   
   This becomes much worse if the denominator itself is a complex expression. 
In that case, it needs to be repeated in the CASE test redundantly. Our 
expressions for normal real-world conversion rates look like this:
   
   ```
   CASE ( SUM(validInteractionCount) FILTER(WHERE activityTypeId IN('o', 'p', 
'f', 'm')) ) WHEN 0.0 THEN 0.0 ELSE SUM(validBidPriceSum) FILTER(WHERE 
activityTypeId IN('o', 'p', 'f', 'm')) / CAST(SUM(validInteractionCount) 
FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm')) AS DOUBLE) END
   ```
   
   This could be simplified to the following, if the behaviour of division and 
null handling was as with other SQL database systems:
   
   ```
   SUM(validBidPriceSum) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm') / 
NVL(SUM(validInteractionCount) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 
'm'))
   ```
   
   Ideally, one would just change the behaviour to be compatible with the above 
but as I anticipate that this would be rejected due to breaking backwards 
compatibility, I would alternatively propose to have a DIV() function that 
deals with division-by-zero and type conversion in a concise way
   
   ```
   DIV(nominator, denominator, null-fallback)
   
   e.g
   
   DIV(SUM(validBidPriceSum), SUM(validInteractionCount), 0.0) FILTER(WHERE 
activityTypeId IN('o', 'p', 'f', 'm')
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to