Re: Implicit Casts for Arithmetic Operators

Ariel Weisberg Fri, 12 Oct 2018 09:23:40 -0700

Hi,

I agree with what's been said about expectations regarding expressions 
involving floating point numbers. I think that if one of the inputs is 
approximate then the result should be approximate.
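
To illustrate why an approximate input taints the result, here is a minimal Python sketch (not Cassandra code). It assumes a 32-bit float column can be emulated by round-tripping a value through the struct module's "f" format; the helper name to_float32 is made up for this example.

```python
import struct
from decimal import Decimal


def to_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


stored = to_float32(2.65)     # what a 32-bit float column would hold
as_decimal = Decimal(stored)  # exact conversion of the stored binary value

# The conversion to decimal loses nothing, but the value it preserves
# already differs from the 2.65 that was written.
print(format(stored, ".15g"))
print(as_decimal == Decimal("2.65"))
```

The first line prints 2.65000009536743, the same digits as the 2.65::real result in Michael's Postgres output quoted below, and the comparison prints False. Upcasting to an exact type after the fact cannot recover the digits that were lost on the way in.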


One thing we could look at for inspiration is the SQL spec, though not 
necessarily to follow it dogmatically.

From the SQL 92 spec regarding assignment, 
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
"
         Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
         FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
         comparable and mutually assignable. If an assignment would result
         in a loss of the most significant digits, an exception condition
         is raised. If least significant digits are lost, implementation-
         defined rounding or truncating occurs with no exception condition
         being raised. The rules for arithmetic are generally governed by
         Subclause 6.12, "<numeric value expression>".
"

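A rough Python sketch of the assignment rule in section 4.6, under stated assumptions: assign_numeric is a hypothetical helper, the target is a NUMERIC(precision, scale) column, and half-up rounding stands in for the "implementation-defined rounding or truncating" that the spec leaves open.

```python
from decimal import Decimal, ROUND_HALF_UP


def assign_numeric(value: Decimal, precision: int, scale: int) -> Decimal:
    """Store value in a hypothetical NUMERIC(precision, scale) column."""
    # Losing least significant digits: round silently, no exception.
    # ROUND_HALF_UP is an assumption; the spec leaves the mode open.
    quantum = Decimal(1).scaleb(-scale)
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)

    # Losing most significant digits: raise an exception condition.
    limit = Decimal(10) ** (precision - scale)
    if abs(rounded) >= limit:
        raise OverflowError(f"{value} does not fit NUMERIC({precision}, {scale})")
    return rounded


print(assign_numeric(Decimal("123.456"), 5, 2))
```

Assigning 123.456 to NUMERIC(5, 2) rounds quietly to 123.46, while assigning 12345.6 to the same column raises, because the leading digits would not fit.
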
Section 6.12 numeric value expressions:
"
         1) If the data type of both operands of a dyadic arithmetic opera-
            tor is exact numeric, then the data type of the result is exact
            numeric, with precision and scale determined as follows:
...
         2) If the data type of either operand of a dyadic arithmetic op-
            erator is approximate numeric, then the data type of the re-
            sult is approximate numeric. The precision of the result is
            implementation-defined.
"

And this makes sense to me. I think we should only return an exact result if 
both of the inputs are exact.
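
The two rules quoted from section 6.12 can be sketched as a small lookup. This is an illustration only: the type names are CQL-style names chosen for the example, the grouping into exact and approximate sets is an assumption, and result_category is a hypothetical function, not an existing API.

```python
# CQL-style type names; grouping them this way is an assumption for illustration.
EXACT = frozenset({"tinyint", "smallint", "int", "bigint", "varint", "decimal"})
APPROXIMATE = frozenset({"float", "double"})


def result_category(left: str, right: str) -> str:
    """Category of the result of a dyadic arithmetic operator on two numeric types."""
    for t in (left, right):
        if t not in EXACT and t not in APPROXIMATE:
            raise ValueError(f"not a numeric type: {t}")
    # Rule 2: either operand approximate means the result is approximate.
    if left in APPROXIMATE or right in APPROXIMATE:
        return "approximate"
    # Rule 1: both operands exact means the result is exact.
    return "exact"


print(result_category("int", "decimal"))
print(result_category("bigint", "float"))
```

With these definitions int and decimal give "exact", while bigint and float give "approximate", regardless of operand order.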

I think we might want to look closely at the SQL spec and especially when the 
spec requires an error to be generated. Those are sometimes in the spec to 
prevent subtle paths to wrong answers. Any time we deviate from the spec we 
should be asking why is it in the spec and why are we deviating.

Another issue besides overflow handling is how we determine precision and scale 
for expressions involving two exact types.
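
As a starting point for that discussion, here is a hedged sketch of the scale rules as I read them in Subclause 6.12: the scale of a sum or difference is the larger operand scale, the scale of a product is the sum of the operand scales, and precision (and everything about division) is implementation-defined. The function result_scale is hypothetical, and Python's decimal module is used only as a cross-check because its arithmetic happens to follow the same scale rules for these two cases.

```python
from decimal import Decimal


def result_scale(op: str, s1: int, s2: int) -> int:
    """Scale of the result of an exact-numeric operation (hypothetical helper)."""
    if op in ("+", "-"):
        return max(s1, s2)
    if op == "*":
        return s1 + s2
    # Division is left out on purpose: its scale is implementation-defined.
    raise ValueError(f"scale for {op!r} is implementation-defined")


def scale_of(d: Decimal) -> int:
    """Number of digits after the decimal point in d."""
    return -d.as_tuple().exponent


a, b = Decimal("2.65"), Decimal("1.0")
print(a * b, result_scale("*", scale_of(a), scale_of(b)))
print(a + b, result_scale("+", scale_of(a), scale_of(b)))
```

The product keeps three fractional digits (2.650), which agrees with the numeric result of 2.65 * 1.0 in Michael's Postgres output quoted below.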

Ariel

On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> Hi,
> 
> I'm not sure I would prefer the Postgres way of doing things, which is
> returning just about any type depending on the order of operators.
> Consider that the docs actually mention that using numeric/decimal is
> slow, and mention multiple times that floating points are inexact. So doing
> some math with Postgres (9.6.5):
> 
> SELECT 2147483647::bigint*1.0::double precision returns double
> precision 2147483647
> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> SELECT 2147483647::bigint*1.0::real returns double
> SELECT 2147483647::double precision*1::bigint returns double 2147483647
> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
> 
> With + and - we get the same mixture of returned types. There's
> no difference in those calculations, just some casting. To me
> floating-point math indicates inexactness and has errors, and whoever mixes
> two different types should understand that. If one didn't ask for an exact
> numeric type, why would the server return one? The floating point value
> itself could be wrong already before the calculation - trying to say we do
> it lossless is just wrong.
> 
> Fun with 2.65:
> 
> SELECT 2.65::real * 1::int returns double 2.65000009536743
> SELECT 2.65::double precision * 1::int returns double 2.65
> 
> SELECT round(2.65) returns numeric 4
> SELECT round(2.65::double precision) returns double 4
> 
> SELECT 2.65 * 1 returns double 2.65
> SELECT 2.65 * 1::bigint returns numeric 2.65
> SELECT 2.65 * 1.0 returns numeric 2.650
> SELECT 2.65 * 1.0::double precision returns double 2.65
> 
> SELECT round(2.65) * 1 returns numeric 3
> SELECT round(2.65) * round(1) returns double 3
> 
> So as we're going to have silly values in any case, why pretend something
> else? Also, exact calculations are slow if we crunch large amounts of
> numbers. I guess I slightly deviated towards Postgres' implementation in this
> case, but I wish it wasn't used as a benchmark here. And most
> importantly, I would definitely want the exact same type returned each time
> I do a calculation.
> 
>   - Micke
> 
> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <bened...@apache.org>
> wrote:
> 
> > As far as I can tell we reached a relatively strong consensus that we
> > should implement lossless casts by default?  Does anyone have anything more
> > to add?
> >
> > Looking at the emails, everyone who participated and expressed a
> > preference was in favour of the “Postgres approach” of upcasting to decimal
> > for mixed float/int operands?
> >
> > I’d like to get a clear-cut decision on this, so we know what we’re doing
> > for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s
> > concerns about overflow, which I think are also pressing - particularly for
> > tinyint and smallint.  This does also impact implicit casts for mixed
> > integer type operations, but an approach for these will probably fall out
> > of any decision on overflow.
> >
> >
> >
> >
> >
> >
> > > > On 3 Oct 2018, at 11:38, Murukesh Mohanan <murukesh.moha...@gmail.com>
> > > > wrote:
> > > >
> > > > I think you're conflating two things here. There's the loss resulting
> > > > from using some operators, and loss involved in casting. Dividing an
> > > > integer by another integer to obtain an integer result can result in
> > > > loss, but there's no implicit casting there and no loss due to casting.
> > > > Casting an integer to a float can also result in loss. So dividing an
> > > > integer by a float, for example, with an implicit cast has an additional
> > > > avenue for loss: the implicit cast for the operands so that they're of
> > > > the same type. I believe this discussion so far has been about the
> > > > latter, not the loss from the operations themselves.
> > >
> > > On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <benjamin.le...@datastax.com>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I would like to try to clarify things a bit to help people to understand
> > >> the true complexity of the problem.
> > >>
> > > >> The *float* and *double* types are inexact numeric types. Not only at
> > > >> the operation level.
> > >>
> > >> If you insert 676543.21 in a *float* column and then read it, you will
> > >> realize that the value has been truncated to 676543.2.
> > >>
> > > >> If you want accuracy the only way is to avoid those inexact types.
> > > >> Using *decimals* during operations will mitigate the problem but will
> > > >> not remove it.
> > >>
> > >>
> > > >> I do not recall PostgreSQL behaving as described. If I am not mistaken,
> > > >> in PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS
> > > >> SQL Server and Oracle do. So all those databases will lose precision if
> > > >> you are not careful.
> > >>
> > >> If you truly need precision you can have it by using exact numeric types
> > >> for your data types. Of course it has a cost on performance, memory and
> > >> disk usage.
> > >>
> > > >> The advantage of the current approach is that it gives you the choice.
> > > >> It is up to you to decide what you need for your application. It is
> > > >> also in line with the way CQL behaves everywhere else.
> > >>
> > > --
> > >
> > > Muru
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >

