Yep, that's a good approach.

  - Micke

On Tue, Nov 20, 2018 at 5:12 PM Ariel Weisberg <adwei...@fastmail.fm> wrote:

> Hi,
>
> +1
>
> This is a public API so we will be much better off if we get it right the
> first time.
>
> Ariel
>
> > On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> >
> > Sounds good to me.
> >
> > On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <bened...@apache.org>
> > wrote:
> >
> >> So, this thread somewhat petered out.
> >>
> >> There are still a number of unresolved issues, but to make progress I
> >> wonder if it would first be helpful to have a vote on ensuring we are
> >> ANSI SQL 92 compliant for our arithmetic?  This seems like a sensible
> >> baseline, since we will hopefully minimise surprise to operators this way.
> >>
> >> If people largely agree, I will call a vote, and we can pick up a couple
> >> of more focused discussions afterwards on how we interpret the leeway it
> >> gives.
> >>
> >>
> >>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>
> >>> Hi,
> >>>
> >>> From reading the spec, precision is always implementation defined. The
> >>> spec specifies scale in several cases, but never precision, for any type
> >>> or operation (addition/subtraction, multiplication, division).
> >>>
> >>> So I don't think we implement anything remotely approaching precision
> >>> and scale in CQL when it comes to numbers? So we aren't going to follow
> >>> the spec for scale. We are already pretty far down that road, so I would
> >>> leave it alone.
> >>>
> >>> I don't think the spec is asking for the most approximate type. It's
> >>> just saying the result is approximate, and the precision is
> >>> implementation defined. We could return either float or double. I think
> >>> if one of the operands is a double we should return a double, because
> >>> clearly the schema thought a double was required to represent that
> >>> number. I would also be in favor of returning a double all the time so
> >>> that people can expect a consistent type from expressions involving
> >>> approximate numbers.
> >>>
> >>> I am a big fan of widening for arithmetic expressions in a database to
> >>> avoid having to error on overflow. You can go to the trouble of only
> >>> widening the minimum amount, but I think it's simpler if we always widen
> >>> to bigint and double. This would be something the spec allows.
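> >>>
> >>> As a rough illustration (not the actual Cassandra code, just a minimal
> >>> Java sketch of the idea, ignoring decimal for brevity): widen every
> >>> integer operand to long and every floating-point operand to double
> >>> before applying the operator, so the result type never depends on how
> >>> narrow the inputs were.
> >>>
> >>>     // Hypothetical sketch: always widen before operating, so tinyint,
> >>>     // smallint and int additions cannot overflow their own range.
> >>>     static Number addWidened(Number left, Number right) {
> >>>         boolean approximate = left instanceof Float || left instanceof Double
> >>>                            || right instanceof Float || right instanceof Double;
> >>>         if (approximate)
> >>>             return left.doubleValue() + right.doubleValue(); // always double
> >>>         return left.longValue() + right.longValue();         // always long (bigint)
> >>>     }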
> >>>
> >>> Definitely if we can make overflow not occur we should, and the spec
> >>> allows that. We should also not return different types for the same
> >>> operand types just to work around overflow if we detect we need more
> >>> precision.
> >>>
> >>> Ariel
> >>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging
> >>>> this out (and Mike for getting some empirical examples).
> >>>>
> >>>> We still have to decide on the approximate data type to return; right
> >>>> now, we have float+bigint=double, but float+int=float.  I think this is
> >>>> fairly inconsistent, and either the approximate type should always win,
> >>>> or we should always upgrade to double for mixed operands.
> >>>>
> >>>> The quoted spec also suggests that decimal+float=float, and
> >>>> decimal+double=double, whereas we currently have decimal+float=decimal,
> >>>> and decimal+double=decimal.
> >>>>
> >>>> If we’re going to go with an approximate operand implying an
> >>>> approximate result, I think we should do it consistently (and
> >>>> consistent with the SQL92 spec), and have the type of the approximate
> >>>> operand always be the return type.
> >>>>
> >>>> This would still leave a decision for float+double, though.  The most
> >>>> consistent behaviour with that stated above would be to always take the
> >>>> most approximate type to return (i.e. float), but this would seem to me
> >>>> to be fairly unexpected for the user.
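> >>>>
> >>>> As a purely illustrative Java sketch (a hypothetical helper, not the
> >>>> actual resolution code) of one way the above could land: the
> >>>> approximate operand's type wins over any exact type, and for float
> >>>> +double the sketch assumes the wider double wins rather than the "most
> >>>> approximate" float.
> >>>>
> >>>>     // Hypothetical sketch: the approximate operand decides the result type.
> >>>>     enum Kind { TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE }
> >>>>
> >>>>     static Kind resultType(Kind a, Kind b) {
> >>>>         boolean aApprox = (a == Kind.FLOAT || a == Kind.DOUBLE);
> >>>>         boolean bApprox = (b == Kind.FLOAT || b == Kind.DOUBLE);
> >>>>         if (aApprox && bApprox)            // float+float, float+double, double+double
> >>>>             return (a == Kind.DOUBLE || b == Kind.DOUBLE) ? Kind.DOUBLE : Kind.FLOAT;
> >>>>         if (aApprox) return a;             // e.g. float + int     -> float
> >>>>         if (bApprox) return b;             // e.g. decimal + double -> double
> >>>>         // exact+exact is a separate discussion; widest operand shown only
> >>>>         // to keep this sketch self-contained
> >>>>         return a.ordinal() >= b.ordinal() ? a : b;
> >>>>     }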
> >>>>
> >>>>
> >>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I agree with what's been said about expectations regarding
> >>>>> expressions involving floating point numbers. I think that if one of
> >>>>> the inputs is approximate then the result should be approximate.
> >>>>>
> >>>>> One thing we could look at for inspiration is the SQL spec, not
> >>>>> necessarily to follow dogmatically.
> >>>>>
> >>>>> From the SQL 92 spec regarding assignment
> >>>>> (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), section 4.6:
> >>>>> "
> >>>>>       Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
> >>>>>       FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> >> mutually
> >>>>>       comparable and mutually assignable. If an assignment would
> >> result
> >>>>>       in a loss of the most significant digits, an exception
> condition
> >>>>>       is raised. If least significant digits are lost,
> implementation-
> >>>>>       defined rounding or truncating occurs with no exception
> >> condition
> >>>>>       being raised. The rules for arithmetic are generally governed
> by
> >>>>>       Subclause 6.12, "<numeric value expression>".
> >>>>> "
> >>>>>
> >>>>> Section 6.12, numeric value expressions:
> >>>>> "
> >>>>>       1) If the data type of both operands of a dyadic arithmetic opera-
> >>>>>          tor is exact numeric, then the data type of the result is exact
> >>>>>          numeric, with precision and scale determined as follows:
> >>>>> ...
> >>>>>       2) If the data type of either operand of a dyadic arithmetic op-
> >>>>>          erator is approximate numeric, then the data type of the re-
> >>>>>          sult is approximate numeric. The precision of the result is
> >>>>>          implementation-defined.
> >>>>> "
> >>>>>
> >>>>> And this makes sense to me. I think we should only return an exact
> >>>>> result if both of the inputs are exact.
> >>>>>
> >>>>> I think we might want to look closely at the SQL spec, and especially
> >>>>> at when the spec requires an error to be generated. Those requirements
> >>>>> are sometimes in the spec to prevent subtle paths to wrong answers. Any
> >>>>> time we deviate from the spec we should be asking why it is in the spec
> >>>>> and why we are deviating.
> >>>>>
> >>>>> Another issue besides overflow handling is how we determine precision
> >>>>> and scale for expressions involving two exact types.
> >>>>>
> >>>>> Ariel
> >>>>>
> >>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm not sure I would prefer the Postgres way of doing things, which
> >>>>>> is returning just about any type depending on the order of operators,
> >>>>>> especially considering its docs mention that numeric/decimal is slow
> >>>>>> and say multiple times that floating points are inexact. So, doing
> >>>>>> some math with Postgres (9.6.5):
> >>>>>>
> >>>>>> SELECT 2147483647::bigint*1.0::double precision returns double precision 2147483647
> >>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>>>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
> >>>>>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
> >>>>>>
> >>>>>> With + and - we can get the same mixture of returned types. There's
> >>>>>> no difference in those calculations, just some casting. To me,
> >>>>>> floating-point math implies inexactness and error, and whoever mixes
> >>>>>> two different types should understand that. If one didn't want an
> >>>>>> exact numeric type, why would the server return one? The floating
> >>>>>> point value itself could already be wrong before the calculation -
> >>>>>> claiming we do it losslessly is just wrong.
> >>>>>>
> >>>>>> Fun with 2.65:
> >>>>>>
> >>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>>>
> >>>>>> SELECT round(2.65) returns numeric 3
> >>>>>> SELECT round(2.65::double precision) returns double 3
> >>>>>>
> >>>>>> SELECT 2.65 * 1 returns double 2.65
> >>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>>>
> >>>>>> SELECT round(2.65) * 1 returns numeric 3
> >>>>>> SELECT round(2.65) * round(1) returns double 3
> >>>>>>
> >>>>>> So, as we're going to have silly values in any case, why pretend
> >>>>>> otherwise? Also, exact calculations are slow if we crunch large
> >>>>>> amounts of numbers. I guess I slightly deviated towards Postgres'
> >>>>>> implementation here, but I wish it wasn't used as the benchmark in
> >>>>>> this case. And most importantly, I would definitely want the exact
> >>>>>> same type returned each time I do a calculation.
> >>>>>>
> >>>>>> - Micke
> >>>>>>
> >>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <bened...@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> As far as I can tell we reached a relatively strong consensus that
> >>>>>>> we should implement lossless casts by default?  Does anyone have
> >>>>>>> anything more to add?
> >>>>>>>
> >>>>>>> Looking at the emails, everyone who participated and expressed a
> >>>>>>> preference was in favour of the “Postgres approach” of upcasting to
> >>>>>>> decimal for mixed float/int operands?
> >>>>>>>
> >>>>>>> I’d like to get a clear-cut decision on this, so we know what we’re
> >>>>>>> doing for 4.0.  Then hopefully we can move on to a collective
> >>>>>>> decision on Ariel’s concerns about overflow, which I think are also
> >>>>>>> pressing - particularly for tinyint and smallint.  This does also
> >>>>>>> impact implicit casts for mixed integer type operations, but an
> >>>>>>> approach for these will probably fall out of any decision on
> >>>>>>> overflow.
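> >>>>>>>
> >>>>>>> To make the overflow concern concrete, here is a small Java snippet
> >>>>>>> (illustrative only, not Cassandra code) showing why the narrow
> >>>>>>> integer types are the pressing case - adding two in-range tinyint
> >>>>>>> values and narrowing the result back silently wraps around:
> >>>>>>>
> >>>>>>>     byte a = 100, b = 100;           // both valid tinyint values
> >>>>>>>     int wide = a + b;                // 200: fine once widened to int
> >>>>>>>     byte narrow = (byte) (a + b);    // -56: silent wrap-around if we
> >>>>>>>                                      // insist on returning tinyint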
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <murukesh.moha...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> I think you're conflating two things here. There's the loss
> >>>>>>>> resulting from using some operators, and loss involved in casting.
> >>>>>>>> Dividing an integer by another integer to obtain an integer result
> >>>>>>>> can result in loss, but there's no implicit casting there and no
> >>>>>>>> loss due to casting.  Casting an integer to a float can also result
> >>>>>>>> in loss. So dividing an integer by a float, for example, with an
> >>>>>>>> implicit cast has an additional avenue for loss: the implicit cast
> >>>>>>>> for the operands so that they're of the same type. I believe this
> >>>>>>>> discussion so far has been about the latter, not the loss from the
> >>>>>>>> operations themselves.
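> >>>>>>>>
> >>>>>>>> A small Java illustration of the distinction (assumed values, just
> >>>>>>>> to make the two kinds of loss visible):
> >>>>>>>>
> >>>>>>>>     int a = 7, b = 2;
> >>>>>>>>     int quotient = a / b;        // 3: loss from the operator itself,
> >>>>>>>>                                  // no casting involved
> >>>>>>>>     float widened = 16777217;    // 1.6777216E7: loss from the cast alone,
> >>>>>>>>                                  // 16777217 is not representable as float
> >>>>>>>>     float mixed = a / 2.0f;      // 3.5: implicit cast of a to float,
> >>>>>>>>                                  // then an approximate operation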
> >>>>>>>>
> >>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <benjamin.le...@datastax.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I would like to try to clarify things a bit, to help people
> >>>>>>>>> understand the true complexity of the problem.
> >>>>>>>>>
> >>>>>>>>> The *float* and *double* types are inexact numeric types, not only
> >>>>>>>>> at the operation level.
> >>>>>>>>>
> >>>>>>>>> If you insert 676543.21 in a *float* column and then read it, you
> >>>>>>>>> will realize that the value has been truncated to 676543.2.
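> >>>>>>>>>
> >>>>>>>>> You can see the same thing with a couple of lines of Java (just an
> >>>>>>>>> illustration of IEEE 754 single precision, independent of
> >>>>>>>>> Cassandra):
> >>>>>>>>>
> >>>>>>>>>     float f = 676543.21f;
> >>>>>>>>>     System.out.println(f);           // prints 676543.2
> >>>>>>>>>     System.out.println((double) f);  // prints 676543.1875, the value
> >>>>>>>>>                                      // actually stored in the float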
> >>>>>>>>>
> >>>>>>>>> If you want accuracy, the only way is to avoid those inexact
> >>>>>>>>> types. Using *decimals* during operations will mitigate the problem
> >>>>>>>>> but will not remove it.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not
> >>>>>>>>> mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is
> >>>>>>>>> similar to what MS SQL Server and Oracle do. So all those databases
> >>>>>>>>> will lose precision if you are not careful.
> >>>>>>>>>
> >>>>>>>>> If you truly need precision you can have it by using exact numeric
> >>>>>>>>> types for your data types. Of course, it has a cost in performance,
> >>>>>>>>> memory and disk usage.
> >>>>>>>>>
> >>>>>>>>> The advantage of the current approach is that it gives you the
> >>>>>>>>> choice. It is up to you to decide what you need for your
> >>>>>>>>> application. It is also in line with the way CQL behaves everywhere
> >>>>>>>>> else.
> >>>>>>>>>
> >>>>>>>> --
> >>>>>>>>
> >>>>>>>> Muru
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade
>
>
>
>
