sascha-coenen opened a new issue #9417: SQL: improve division behaviour URL: https://github.com/apache/druid/issues/9417 ### Description Most databases implement the following behaviour to make it easy to formulate divisions and receive an expected outcome. Let me go over two or three differences between how Druid works relative to other SQL dialects and how this is leading to extremely long expressions even for simple cases of needing to compute a/b: `SELECT 1 / 2` in Vertica this evaluates to 0.5 in Druid it evaluates to 0 `SELECT 1 / 0` fails in Vertica and Druid `SELECT 1 / null` evaluates to null in Vertica in Druid it throws an exception. With the above behaviour, it is possible to formulate divisions that are null safe and also evaluate to correct results in a consice way: SUM(nominator) / NVL(SUM(denominator)) In Druid, it is cumbersome that divisions are by default integer divisions and furthermore, it takes CASE statements to make a query robust against division-by-zero cases. So the simple intention a/b turns into: `SELECT CASE(b = 0.0) THEN NULL ELSE ( a / CAST(b AS DOUBLE) END` This becomes much worse if the denominator itself is a complex expression. In that case, it needs to be repeated in the CASE test redundantly. Our expressions for normal real-world conversion rates look like this: ``` CASE ( SUM(validInteractionCount) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm')) ) WHEN 0.0 THEN 0.0 ELSE SUM(validBidPriceSum) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm')) / CAST(SUM(validInteractionCount) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm')) AS DOUBLE) END ``` This could be simplified to the following, if the behaviour of division and null handling was as with other SQL database systems: ``` SUM(validBidPriceSum) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm') / NVL(SUM(validInteractionCount) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm')) ``` Ideally, one would just change the behaviour to be compatible with the above but as I anticipate that this would be rejected due to breaking backwards compatibility, I would alternatively propose to have a DIV() function that deals with division-by-zero and type conversion in a concise way ``` DIV(nominator, denominator, null-fallback) e.g DIV(SUM(validBidPriceSum), SUM(validInteractionCount), 0.0) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm') ```
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
