Re: Implicit Casts for Arithmetic Operators

Benedict Elliott Smith Tue, 20 Nov 2018 08:02:36 -0800

FWIW, my meaning of arithmetic in this context extends to any features we have 
already released (such as aggregates, and perhaps other built-in functions) 
that operate on the same domain.  We should be consistent, after all.


Whether or not we need to revisit any existing functionality we can figure out 
after the fact, once we have agreed what our behaviour should be.

I will make this more explicit for the vote, but just to clarify the intention 
so that we are all discussing the same thing.


> On 20 Nov 2018, at 14:18, Ariel Weisberg <[email protected]> wrote:
> 
> Hi,
> 
> +1
> 
> This is a public API so we will be much better off if we get it right the 
> first time.
> 
> Ariel
> 
>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <[email protected]> wrote:
>> 
>> Sounds good to me.
>> 
>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <[email protected]>
>> wrote:
>> 
>>> So, this thread somewhat petered out.
>>> 
>>> There are still a number of unresolved issues, but to make progress I
>>> wonder if it would first be helpful to have a vote on ensuring we are ANSI
>>> SQL 92 compliant for our arithmetic?  This seems like a sensible baseline,
>>> since we will hopefully minimise surprise to operators this way.
>>> 
>>> If people largely agree, I will call a vote, and we can pick up a couple
>>> of more focused discussions afterwards on how we interpret the leeway it
>>> gives.
>>> 
>>> 
>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <[email protected]> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> From reading the spec. Precision is always implementation defined. The
>>> spec specifies scale in several cases, but never precision for any type or
>>> operation (addition/subtraction, multiplication, division).
>>>> 
>>>> So we don't implement anything remotely approaching precision and scale
>>> in CQL when it comes to numbers I think? So we aren't going to follow the
>>> spec for scale. We are already pretty far down that road so I would leave
>>> it alone.
>>>> 
>>>> I don't think the spec is asking for the most approximate type. It's
>>> just saying the result is approximate, and the precision is implementation
>>> defined. We could return either float or double. I think if one of the
>>> operands is a double we should return a double because clearly the schema
>>> thought a double was required to represent that number. I would also be in
>>> favor of returning a double all the time so that people can expect a
>>> consistent type from expressions involving approximate numbers.
>>>> 
>>>> I am a big fan of widening for arithmetic expressions in a database to
>>> avoid having to error on overflow. You can go to the trouble of only
>>> widening the minimum amount, but I think it's simpler if we always widen to
>>> bigint and double. This would be something the spec allows.
>>>> 
>>>> Definitely if we can make overflow not occur we should and the spec
>>> allows that. We should also not return different types for the same operand
>>> types just to work around overflow if we detect we need more precision.
>>>> 
>>>> Ariel
>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this
>>>>> out (and Mike for getting some empirical examples).
>>>>> 
>>>>> We still have to decide on the approximate data type to return; right
>>>>> now, we have float+bigint=double, but float+int=float.  I think this is
>>>>> fairly inconsistent, and either the approximate type should always win,
>>>>> or we should always upgrade to double for mixed operands.
>>>>> 
>>>>> The quoted spec also suggests that decimal+float=float, and decimal
>>>>> +double=double, whereas we currently have decimal+float=decimal, and
>>>>> decimal+double=decimal
>>>>> 
>>>>> If we’re going to go with an approximate operand implying an
>>> approximate
>>>>> result, I think we should do it consistently (and consistent with the
>>>>> SQL92 spec), and have the type of the approximate operand always be the
>>>>> return type.
>>>>> 
>>>>> This would still leave a decision for float+double, though.  The most
>>>>> consistent behaviour with that stated above would be to always take the
>>>>> most approximate type to return (i.e. float), but this would seem to me
>>>>> to be fairly unexpected for the user.
>>>>> 
>>>>> 
>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <[email protected]> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I agree with what's been said about expectations regarding expressions
>>> involving floating point numbers. I think that if one of the inputs is
>>> approximate then the result should be approximate.
>>>>>> 
>>>>>> One thing we could look at for inspiration is the SQL spec. Not to
>>> follow dogmatically necessarily.
>>>>>> 
>>>>>> From the SQL 92 spec regarding assignment
>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>>>>> "
>>>>>>      Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>>>>      FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>>> mutually
>>>>>>      comparable and mutually assignable. If an assignment would
>>> result
>>>>>>      in a loss of the most significant digits, an exception condition
>>>>>>      is raised. If least significant digits are lost, implementation-
>>>>>>      defined rounding or truncating occurs with no exception
>>> condition
>>>>>>      being raised. The rules for arithmetic are generally governed by
>>>>>>      Subclause 6.12, "<numeric value expression>".
>>>>>> "
>>>>>> 
>>>>>> Section 6.12 numeric value expressions:
>>>>>> "
>>>>>>      1) If the data type of both operands of a dyadic arithmetic
>>> opera-
>>>>>>         tor is exact numeric, then the data type of the result is
>>> exact
>>>>>>         numeric, with precision and scale determined as follows:
>>>>>> ...
>>>>>>      2) If the data type of either operand of a dyadic arithmetic op-
>>>>>>         erator is approximate numeric, then the data type of the re-
>>>>>>         sult is approximate numeric. The precision of the result is
>>>>>>         implementation-defined.
>>>>>> "
>>>>>> 
>>>>>> And this makes sense to me. I think we should only return an exact
>>> result if both of the inputs are exact.
>>>>>> 
>>>>>> I think we might want to look closely at the SQL spec and especially
>>> when the spec requires an error to be generated. Those are sometimes in the
>>> spec to prevent subtle paths to wrong answers. Any time we deviate from the
>>> spec we should be asking why is it in the spec and why are we deviating.
>>>>>> 
>>>>>> Another issue besides overflow handling is how we determine precision
>>> and scale for expressions involving two exact types.
>>>>>> 
>>>>>> Ariel
>>>>>> 
>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm not sure if I would prefer the Postgres way of doing things,
>>> which is
>>>>>>> returning just about any type depending on the order of operators.
>>>>>>> Considering it actually mentions in the docs that using
>>> numeric/decimal is
>>>>>>> slow and also multiple times that floating points are inexact. So
>>> doing
>>>>>>> some math with Postgres (9.6.5):
>>>>>>> 
>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>>> precision 2147483647
>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
>>> 2147483647
>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
>>> 2147483647
>>>>>>> 
>>>>>>> With + - we can get the same amount of mixture of returned types.
>>> There's
>>>>>>> no difference in those calculations, just some casting. To me
>>>>>>> floating-point math indicates inexactness and has errors and whoever
>>> mixes
>>>>>>> up two different types should understand that. If one didn't want
>>> exact
>>>>>>> numeric type, why would the server return such? The floating point
>>> value
>>>>>>> itself could be wrong already before the calculation - trying to say
>>> we do
>>>>>>> it lossless is just wrong.
>>>>>>> 
>>>>>>> Fun with 2.65:
>>>>>>> 
>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>> 
>>>>>>> SELECT round(2.65) returns numeric 4
>>>>>>> SELECT round(2.65::double precision) returns double 4
>>>>>>> 
>>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>> 
>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>>> 
>>>>>>> So as we're going to have silly values in any case, why pretend
>>> something
>>>>>>> else? Also, exact calculations are slow if we crunch large amount of
>>>>>>> numbers. I guess I slightly deviated towards Postgres' implemention
>>> in this
>>>>>>> case, but I wish it wasn't used as a benchmark in this case. And most
>>>>>>> importantly, I would definitely want the exact same type returned
>>> each time
>>>>>>> I do a calculation.
>>>>>>> 
>>>>>>> - Micke
>>>>>>> 
>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
>>> [email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> As far as I can tell we reached a relatively strong consensus that we
>>>>>>>> should implement lossless casts by default?  Does anyone have
>>> anything more
>>>>>>>> to add?
>>>>>>>> 
>>>>>>>> Looking at the emails, everyone who participated and expressed a
>>>>>>>> preference was in favour of the “Postgres approach” of upcasting to
>>> decimal
>>>>>>>> for mixed float/int operands?
>>>>>>>> 
>>>>>>>> I’d like to get a clear-cut decision on this, so we know what we’re
>>> doing
>>>>>>>> for 4.0.  Then hopefully we can move on to a collective decision on
>>> Ariel’s
>>>>>>>> concerns about overflow, which I think are also pressing -
>>> particularly for
>>>>>>>> tinyint and smallint.  This does also impact implicit casts for mixed
>>>>>>>> integer type operations, but an approach for these will probably
>>> fall out
>>>>>>>> of any decision on overflow.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> I think you're conflating two things here. There's the loss
>>> resulting
>>>>>>>> from
>>>>>>>>> using some operators, and loss involved in casting. Dividing an
>>> integer
>>>>>>>> by
>>>>>>>>> another integer to obtain an integer result can result in loss, but
>>>>>>>> there's
>>>>>>>>> no implicit casting there and no loss due to casting.  Casting an
>>> integer
>>>>>>>>> to a float can also result in loss. So dividing an integer by a
>>> float,
>>>>>>>> for
>>>>>>>>> example, with an implicit cast has an additional avenue for loss:
>>> the
>>>>>>>>> implicit cast for the operands so that they're of the same type. I
>>>>>>>> believe
>>>>>>>>> this discussion so far has been about the latter, not the loss from
>>> the
>>>>>>>>> operations themselves.
>>>>>>>>> 
>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
>>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I would like to try to clarify things a bit to help people to
>>> understand
>>>>>>>>>> the true complexity of the problem.
>>>>>>>>>> 
>>>>>>>>>> The *float *and *double *types are inexact numeric types. Not only
>>> at
>>>>>>>> the
>>>>>>>>>> operation level.
>>>>>>>>>> 
>>>>>>>>>> If you insert 676543.21 in a *float* column and then read it, you
>>> will
>>>>>>>>>> realize that the value has been truncated to 676543.2.
>>>>>>>>>> 
>>>>>>>>>> If you want accuracy the only way is to avoid those inexact types.
>>>>>>>>>> Using *decimals
>>>>>>>>>> *during operations will mitigate the problem but will not remove
>>> it.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I do not recall PostgreSQL behaving has described. If I am not
>>> mistaken
>>>>>>>> in
>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar to what
>>> MS SQL
>>>>>>>>>> server and Oracle do. So all thoses databases will lose precision
>>> if you
>>>>>>>>>> are not carefull.
>>>>>>>>>> 
>>>>>>>>>> If you truly need precision you can have it by using exact numeric
>>> types
>>>>>>>>>> for your data types. Of course it has a cost on performance,
>>> memory and
>>>>>>>>>> disk usage.
>>>>>>>>>> 
>>>>>>>>>> The advantage of the current approach is that it give you the
>>> choice.
>>>>>>>> It is
>>>>>>>>>> up to you to decide what you need for your application. It is also
>>> in
>>>>>>>> line
>>>>>>>>>> with the way CQL behave everywhere else.
>>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 
>>>>>>>>> Muru
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>> For additional commands, e-mail: [email protected]
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>> For additional commands, e-mail: [email protected]
>>>>>> 
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>> 
>>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: Implicit Casts for Arithmetic Operators

Reply via email to