Re: Implicit Casts for Arithmetic Operators

Sylvain Lebresne Thu, 22 Nov 2018 01:25:15 -0800

I'm not saying "let's not do this no matter what and ever fix technical
debt", nor am I fearing decision.


But I *do* think decisions, technical ones at least, should be fact and
data driven. And I'm not even sure why we're talking of having a vote here.
The Apache Way is *not* meant to be primarily vote-driven, votes are
supposed to be a last resort when, after having debated facts and data, no
consensus can be reached. Can we have the debate on facts and data first?
Please.

At the of the day, I object to: "There are still a number of unresolved
issues, but to make progress I wonder if it would first be helpful to have
a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?". More
specifically, I disagree that such vote is a good starting point. Let's
identify and discuss the unresolved issues first. Let's check precisely
what getting our arithmetic ANSI SQL 92 compliant means and how we can get
it. I do support the idea of making such analysis btw, it would be good
data, but no vote is needed whatsoever to make it. Again, I object to
voting first and doing the analysis 2nd.

--
Sylvain


On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <j...@jonhaddad.com> wrote:

> I can’t agree more. We should be able to make changes in a manner that
> improves the DB In the long term, rather than live with the technical debt
> of arbitrary decisions made by a handful of people.
>
> I also agree that putting a knob in place to let people migrate over is a
> reasonable decision.
>
> Jon
>
> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
> bened...@apache.org>
> wrote:
>
> > The goal is simply to agree on a set of well-defined principles for how
> we
> > should behave.  If we don’t like the implications that arise, we’ll have
> > another vote?  A democracy cannot bind itself, so I never understood this
> > fear of a decision.
> >
> > A database also has a thousand toggles.  If we absolutely need to, we can
> > introduce one more.
> >
> > We should be doing this upfront a great deal more often.  Doing it
> > retrospectively sucks, but in my opinion it's a bad reason to bind
> > ourselves to whatever made it in.
> >
> > Do we anywhere define the principles of our current behaviour?  I
> couldn’t
> > find it.
> >
> >
> > > On 21 Nov 2018, at 21:08, Sylvain Lebresne <lebre...@gmail.com> wrote:
> > >
> > > On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
> > bened...@apache.org>
> > > wrote:
> > >
> > >> FWIW, my meaning of arithmetic in this context extends to any features
> > we
> > >> have already released (such as aggregates, and perhaps other built-in
> > >> functions) that operate on the same domain.  We should be consistent,
> > after
> > >> all.
> > >>
> > >> Whether or not we need to revisit any existing functionality we can
> > figure
> > >> out after the fact, once we have agreed what our behaviour should be.
> > >>
> > >
> > > I'm not sure I correctly understand the process suggested, but I don't
> > > particularly like/agree with what I understand. What I understand is a
> > > suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no
> > real
> > > evaluation of what that entails (at least I haven't seen one), and that
> > > this vote, if passed, would imply we'd then make any backward
> > incompatible
> > > change necessary to achieve compliance ("my meaning of arithmetic in
> this
> > > context extends to any features we have already released" and "Whether
> or
> > > not we need to revisit any existing functionality we can figure out
> after
> > > the fact, once we have agreed what our behaviour should be").
> > >
> > > This might make sense of a new product, but at our stage that seems
> > > backward to me. I think we owe our users to first make the effort of
> > > identifying what "inconsistencies" our existing arithmetic has[1] and
> > > _then_ consider what options we have to fix those, with their pros and
> > cons
> > > (including how bad they break backward compatibility). And if _then_
> > > getting ANSI SQL 92 compliant proves to not be disruptive (or at least
> > > acceptably so), then sure, that's great.
> > >
> > > [1]: one possibly efficient way to do that could actually be to compare
> > our
> > > arithmetic to ANSI SQL 92. Not that all differences found would imply
> > > inconsistencies/wrongness of our arithmetic, but still, it should be
> > > helpful. And I guess my whole point is that we should that analysis
> > first,
> > > and then maybe decide that being ANSI SQL 92 is a reasonable option,
> not
> > > decide first and live with the consequences no matter what they are.
> > >
> > > --
> > > Sylvain
> > >
> > >
> > >> I will make this more explicit for the vote, but just to clarify the
> > >> intention so that we are all discussing the same thing.
> > >>
> > >>
> > >>> On 20 Nov 2018, at 14:18, Ariel Weisberg <adwei...@fastmail.fm>
> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> +1
> > >>>
> > >>> This is a public API so we will be much better off if we get it right
> > >> the first time.
> > >>>
> > >>> Ariel
> > >>>
> > >>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <j...@jonhaddad.com>
> > >> wrote:
> > >>>>
> > >>>> Sounds good to me.
> > >>>>
> > >>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
> > >> bened...@apache.org>
> > >>>> wrote:
> > >>>>
> > >>>>> So, this thread somewhat petered out.
> > >>>>>
> > >>>>> There are still a number of unresolved issues, but to make
> progress I
> > >>>>> wonder if it would first be helpful to have a vote on ensuring we
> are
> > >> ANSI
> > >>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
> > >> baseline,
> > >>>>> since we will hopefully minimise surprise to operators this way.
> > >>>>>
> > >>>>> If people largely agree, I will call a vote, and we can pick up a
> > >> couple
> > >>>>> of more focused discussions afterwards on how we interpret the
> leeway
> > >> it
> > >>>>> gives.
> > >>>>>
> > >>>>>
> > >>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws>
> wrote:
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> From reading the spec. Precision is always implementation defined.
> > The
> > >>>>> spec specifies scale in several cases, but never precision for any
> > >> type or
> > >>>>> operation (addition/subtraction, multiplication, division).
> > >>>>>>
> > >>>>>> So we don't implement anything remotely approaching precision and
> > >> scale
> > >>>>> in CQL when it comes to numbers I think? So we aren't going to
> follow
> > >> the
> > >>>>> spec for scale. We are already pretty far down that road so I would
> > >> leave
> > >>>>> it alone.
> > >>>>>>
> > >>>>>> I don't think the spec is asking for the most approximate type.
> It's
> > >>>>> just saying the result is approximate, and the precision is
> > >> implementation
> > >>>>> defined. We could return either float or double. I think if one of
> > the
> > >>>>> operands is a double we should return a double because clearly the
> > >> schema
> > >>>>> thought a double was required to represent that number. I would
> also
> > >> be in
> > >>>>> favor of returning a double all the time so that people can expect
> a
> > >>>>> consistent type from expressions involving approximate numbers.
> > >>>>>>
> > >>>>>> I am a big fan of widening for arithmetic expressions in a
> database
> > to
> > >>>>> avoid having to error on overflow. You can go to the trouble of
> only
> > >>>>> widening the minimum amount, but I think it's simpler if we always
> > >> widen to
> > >>>>> bigint and double. This would be something the spec allows.
> > >>>>>>
> > >>>>>> Definitely if we can make overflow not occur we should and the
> spec
> > >>>>> allows that. We should also not return different types for the same
> > >> operand
> > >>>>> types just to work around overflow if we detect we need more
> > precision.
> > >>>>>>
> > >>>>>> Ariel
> > >>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> > >>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for
> digging
> > >> this
> > >>>>>>> out (and Mike for getting some empirical examples).
> > >>>>>>>
> > >>>>>>> We still have to decide on the approximate data type to return;
> > right
> > >>>>>>> now, we have float+bigint=double, but float+int=float.  I think
> > this
> > >> is
> > >>>>>>> fairly inconsistent, and either the approximate type should
> always
> > >> win,
> > >>>>>>> or we should always upgrade to double for mixed operands.
> > >>>>>>>
> > >>>>>>> The quoted spec also suggests that decimal+float=float, and
> decimal
> > >>>>>>> +double=double, whereas we currently have decimal+float=decimal,
> > and
> > >>>>>>> decimal+double=decimal
> > >>>>>>>
> > >>>>>>> If we’re going to go with an approximate operand implying an
> > >>>>> approximate
> > >>>>>>> result, I think we should do it consistently (and consistent with
> > the
> > >>>>>>> SQL92 spec), and have the type of the approximate operand always
> be
> > >> the
> > >>>>>>> return type.
> > >>>>>>>
> > >>>>>>> This would still leave a decision for float+double, though.  The
> > most
> > >>>>>>> consistent behaviour with that stated above would be to always
> take
> > >> the
> > >>>>>>> most approximate type to return (i.e. float), but this would seem
> > to
> > >> me
> > >>>>>>> to be fairly unexpected for the user.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws>
> > wrote:
> > >>>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> I agree with what's been said about expectations regarding
> > >> expressions
> > >>>>> involving floating point numbers. I think that if one of the inputs
> > is
> > >>>>> approximate then the result should be approximate.
> > >>>>>>>>
> > >>>>>>>> One thing we could look at for inspiration is the SQL spec. Not
> to
> > >>>>> follow dogmatically necessarily.
> > >>>>>>>>
> > >>>>>>>> From the SQL 92 spec regarding assignment
> > >>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section
> > 4.6:
> > >>>>>>>> "
> > >>>>>>>>     Values of the data types NUMERIC, DECIMAL, INTEGER,
> SMALLINT,
> > >>>>>>>>     FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> > >>>>> mutually
> > >>>>>>>>     comparable and mutually assignable. If an assignment would
> > >>>>> result
> > >>>>>>>>     in a loss of the most significant digits, an exception
> > >> condition
> > >>>>>>>>     is raised. If least significant digits are lost,
> > >> implementation-
> > >>>>>>>>     defined rounding or truncating occurs with no exception
> > >>>>> condition
> > >>>>>>>>     being raised. The rules for arithmetic are generally
> governed
> > >> by
> > >>>>>>>>     Subclause 6.12, "<numeric value expression>".
> > >>>>>>>> "
> > >>>>>>>>
> > >>>>>>>> Section 6.12 numeric value expressions:
> > >>>>>>>> "
> > >>>>>>>>     1) If the data type of both operands of a dyadic arithmetic
> > >>>>> opera-
> > >>>>>>>>        tor is exact numeric, then the data type of the result is
> > >>>>> exact
> > >>>>>>>>        numeric, with precision and scale determined as follows:
> > >>>>>>>> ...
> > >>>>>>>>     2) If the data type of either operand of a dyadic arithmetic
> > >> op-
> > >>>>>>>>        erator is approximate numeric, then the data type of the
> > re-
> > >>>>>>>>        sult is approximate numeric. The precision of the result
> is
> > >>>>>>>>        implementation-defined.
> > >>>>>>>> "
> > >>>>>>>>
> > >>>>>>>> And this makes sense to me. I think we should only return an
> exact
> > >>>>> result if both of the inputs are exact.
> > >>>>>>>>
> > >>>>>>>> I think we might want to look closely at the SQL spec and
> > especially
> > >>>>> when the spec requires an error to be generated. Those are
> sometimes
> > >> in the
> > >>>>> spec to prevent subtle paths to wrong answers. Any time we deviate
> > >> from the
> > >>>>> spec we should be asking why is it in the spec and why are we
> > >> deviating.
> > >>>>>>>>
> > >>>>>>>> Another issue besides overflow handling is how we determine
> > >> precision
> > >>>>> and scale for expressions involving two exact types.
> > >>>>>>>>
> > >>>>>>>> Ariel
> > >>>>>>>>
> > >>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> I'm not sure if I would prefer the Postgres way of doing
> things,
> > >>>>> which is
> > >>>>>>>>> returning just about any type depending on the order of
> > operators.
> > >>>>>>>>> Considering it actually mentions in the docs that using
> > >>>>> numeric/decimal is
> > >>>>>>>>> slow and also multiple times that floating points are inexact.
> So
> > >>>>> doing
> > >>>>>>>>> some math with Postgres (9.6.5):
> > >>>>>>>>>
> > >>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
> > >>>>>>>>> precision 2147483647
> > >>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> > >>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
> > >>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
> > >>>>> 2147483647
> > >>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
> > >>>>> 2147483647
> > >>>>>>>>>
> > >>>>>>>>> With + - we can get the same amount of mixture of returned
> types.
> > >>>>> There's
> > >>>>>>>>> no difference in those calculations, just some casting. To me
> > >>>>>>>>> floating-point math indicates inexactness and has errors and
> > >> whoever
> > >>>>> mixes
> > >>>>>>>>> up two different types should understand that. If one didn't
> want
> > >>>>> exact
> > >>>>>>>>> numeric type, why would the server return such? The floating
> > point
> > >>>>> value
> > >>>>>>>>> itself could be wrong already before the calculation - trying
> to
> > >> say
> > >>>>> we do
> > >>>>>>>>> it lossless is just wrong.
> > >>>>>>>>>
> > >>>>>>>>> Fun with 2.65:
> > >>>>>>>>>
> > >>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> > >>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> > >>>>>>>>>
> > >>>>>>>>> SELECT round(2.65) returns numeric 4
> > >>>>>>>>> SELECT round(2.65::double precision) returns double 4
> > >>>>>>>>>
> > >>>>>>>>> SELECT 2.65 * 1 returns double 2.65
> > >>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> > >>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
> > >>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> > >>>>>>>>>
> > >>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
> > >>>>>>>>> SELECT round(2.65) * round(1) returns double 3
> > >>>>>>>>>
> > >>>>>>>>> So as we're going to have silly values in any case, why pretend
> > >>>>> something
> > >>>>>>>>> else? Also, exact calculations are slow if we crunch large
> amount
> > >> of
> > >>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
> > implemention
> > >>>>> in this
> > >>>>>>>>> case, but I wish it wasn't used as a benchmark in this case.
> And
> > >> most
> > >>>>>>>>> importantly, I would definitely want the exact same type
> returned
> > >>>>> each time
> > >>>>>>>>> I do a calculation.
> > >>>>>>>>>
> > >>>>>>>>> - Micke
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> > >>>>> bened...@apache.org>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> As far as I can tell we reached a relatively strong consensus
> > >> that we
> > >>>>>>>>>> should implement lossless casts by default?  Does anyone have
> > >>>>> anything more
> > >>>>>>>>>> to add?
> > >>>>>>>>>>
> > >>>>>>>>>> Looking at the emails, everyone who participated and
> expressed a
> > >>>>>>>>>> preference was in favour of the “Postgres approach” of
> upcasting
> > >> to
> > >>>>> decimal
> > >>>>>>>>>> for mixed float/int operands?
> > >>>>>>>>>>
> > >>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
> > >> we’re
> > >>>>> doing
> > >>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective
> decision
> > >> on
> > >>>>> Ariel’s
> > >>>>>>>>>> concerns about overflow, which I think are also pressing -
> > >>>>> particularly for
> > >>>>>>>>>> tinyint and smallint.  This does also impact implicit casts
> for
> > >> mixed
> > >>>>>>>>>> integer type operations, but an approach for these will
> probably
> > >>>>> fall out
> > >>>>>>>>>> of any decision on overflow.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> > >>>>> murukesh.moha...@gmail.com>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> I think you're conflating two things here. There's the loss
> > >>>>> resulting
> > >>>>>>>>>> from
> > >>>>>>>>>>> using some operators, and loss involved in casting. Dividing
> an
> > >>>>> integer
> > >>>>>>>>>> by
> > >>>>>>>>>>> another integer to obtain an integer result can result in
> loss,
> > >> but
> > >>>>>>>>>> there's
> > >>>>>>>>>>> no implicit casting there and no loss due to casting.
> Casting
> > an
> > >>>>> integer
> > >>>>>>>>>>> to a float can also result in loss. So dividing an integer
> by a
> > >>>>> float,
> > >>>>>>>>>> for
> > >>>>>>>>>>> example, with an implicit cast has an additional avenue for
> > loss:
> > >>>>> the
> > >>>>>>>>>>> implicit cast for the operands so that they're of the same
> > type.
> > >> I
> > >>>>>>>>>> believe
> > >>>>>>>>>>> this discussion so far has been about the latter, not the
> loss
> > >> from
> > >>>>> the
> > >>>>>>>>>>> operations themselves.
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> > >>>>> benjamin.le...@datastax.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I would like to try to clarify things a bit to help people
> to
> > >>>>> understand
> > >>>>>>>>>>>> the true complexity of the problem.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The *float *and *double *types are inexact numeric types.
> Not
> > >> only
> > >>>>> at
> > >>>>>>>>>> the
> > >>>>>>>>>>>> operation level.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read
> it,
> > >> you
> > >>>>> will
> > >>>>>>>>>>>> realize that the value has been truncated to 676543.2.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact
> > >> types.
> > >>>>>>>>>>>> Using *decimals
> > >>>>>>>>>>>> *during operations will mitigate the problem but will not
> > remove
> > >>>>> it.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I do not recall PostgreSQL behaving has described. If I am
> not
> > >>>>> mistaken
> > >>>>>>>>>> in
> > >>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar to
> > >> what
> > >>>>> MS SQL
> > >>>>>>>>>>>> server and Oracle do. So all thoses databases will lose
> > >> precision
> > >>>>> if you
> > >>>>>>>>>>>> are not carefull.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> If you truly need precision you can have it by using exact
> > >> numeric
> > >>>>> types
> > >>>>>>>>>>>> for your data types. Of course it has a cost on performance,
> > >>>>> memory and
> > >>>>>>>>>>>> disk usage.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The advantage of the current approach is that it give you
> the
> > >>>>> choice.
> > >>>>>>>>>> It is
> > >>>>>>>>>>>> up to you to decide what you need for your application. It
> is
> > >> also
> > >>>>> in
> > >>>>>>>>>> line
> > >>>>>>>>>>>> with the way CQL behave everywhere else.
> > >>>>>>>>>>>>
> > >>>>>>>>>>> --
> > >>>>>>>>>>>
> > >>>>>>>>>>> Muru
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >> ---------------------------------------------------------------------
> > >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >>>>>>>>>> For additional commands, e-mail:
> dev-h...@cassandra.apache.org
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >> ---------------------------------------------------------------------
> > >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > ---------------------------------------------------------------------
> > >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > ---------------------------------------------------------------------
> > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> ---------------------------------------------------------------------
> > >>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>>>>
> > >>>>> --
> > >>>> Jon Haddad
> > >>>> http://www.rustyrazorblade.com
> > >>>> twitter: rustyrazorblade
> > >>>
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>>
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>
> > >>
> >
> > --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>

Re: Implicit Casts for Arithmetic Operators

Reply via email to