I'm not saying "let's not do this no matter what and never fix technical debt", nor am I afraid of making decisions.
But I *do* think decisions, technical ones at least, should be fact and data driven. And I'm not even sure why we're talking about having a vote here. The Apache Way is *not* meant to be primarily vote-driven; votes are supposed to be a last resort when, after having debated facts and data, no consensus can be reached. Can we have the debate on facts and data first? Please. At the end of the day, I object to: "There are still a number of unresolved issues, but to make progress I wonder if it would first be helpful to have a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?". More specifically, I disagree that such a vote is a good starting point. Let's identify and discuss the unresolved issues first. Let's check precisely what getting our arithmetic ANSI SQL 92 compliant means and how we can get it. I do support the idea of making such an analysis, btw; it would be good data, but no vote is needed whatsoever to make it. Again, I object to voting first and doing the analysis second. -- Sylvain On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <j...@jonhaddad.com> wrote: > I can’t agree more. We should be able to make changes in a manner that > improves the DB in the long term, rather than live with the technical debt > of arbitrary decisions made by a handful of people. > > I also agree that putting a knob in place to let people migrate over is a > reasonable decision. > > Jon > > On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith < > bened...@apache.org> > wrote: > > > The goal is simply to agree on a set of well-defined principles for how > we > > should behave. If we don’t like the implications that arise, we’ll have > > another vote? A democracy cannot bind itself, so I never understood this > > fear of a decision. > > > > A database also has a thousand toggles. If we absolutely need to, we can > > introduce one more. > > > > We should be doing this upfront a great deal more often. 
Doing it > > retrospectively sucks, but in my opinion it's a bad reason to bind > > ourselves to whatever made it in. > > > > Do we anywhere define the principles of our current behaviour? I > couldn’t > > find it. > > > > > > > On 21 Nov 2018, at 21:08, Sylvain Lebresne <lebre...@gmail.com> wrote: > > > > > > On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith < > > bened...@apache.org> > > > wrote: > > > > > >> FWIW, my meaning of arithmetic in this context extends to any features > > we > > >> have already released (such as aggregates, and perhaps other built-in > > >> functions) that operate on the same domain. We should be consistent, > > after > > >> all. > > >> > > >> Whether or not we need to revisit any existing functionality we can > > figure > > >> out after the fact, once we have agreed what our behaviour should be. > > >> > > > > > > I'm not sure I correctly understand the process suggested, but I don't > > > particularly like/agree with what I understand. What I understand is a > > > suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no > > real > > > evaluation of what that entails (at least I haven't seen one), and that > > > this vote, if passed, would imply we'd then make any backward > > incompatible > > > change necessary to achieve compliance ("my meaning of arithmetic in > this > > > context extends to any features we have already released" and "Whether > or > > > not we need to revisit any existing functionality we can figure out > after > > > the fact, once we have agreed what our behaviour should be"). > > > > > > This might make sense for a new product, but at our stage that seems > > > backward to me. I think we owe it to our users to first make the effort of > > > identifying what "inconsistencies" our existing arithmetic has[1] and > > > _then_ consider what options we have to fix those, with their pros and > > cons > > > (including how badly they break backward compatibility). 
And if _then_ > > > getting ANSI SQL 92 compliant proves to not be disruptive (or at least > > > acceptably so), then sure, that's great. > > > > > > [1]: one possibly efficient way to do that could actually be to compare > > our > > > arithmetic to ANSI SQL 92. Not that all differences found would imply > > > inconsistencies/wrongness of our arithmetic, but still, it should be > > > helpful. And I guess my whole point is that we should do that analysis > > first, > > > and then maybe decide that being ANSI SQL 92 compliant is a reasonable option, > not > > > decide first and live with the consequences no matter what they are. > > > > > > -- > > > Sylvain > > > > > > > > >> I will make this more explicit for the vote, but just to clarify the > > >> intention so that we are all discussing the same thing. > > >> > > >> > > >>> On 20 Nov 2018, at 14:18, Ariel Weisberg <adwei...@fastmail.fm> > wrote: > > >>> > > >>> Hi, > > >>> > > >>> +1 > > >>> > > >>> This is a public API so we will be much better off if we get it right > > >> the first time. > > >>> > > >>> Ariel > > >>> > > >>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <j...@jonhaddad.com> > > >> wrote: > > >>>> > > >>>> Sounds good to me. > > >>>> > > >>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith < > > >> bened...@apache.org> > > >>>> wrote: > > >>>> > > >>>>> So, this thread somewhat petered out. > > >>>>> > > >>>>> There are still a number of unresolved issues, but to make > progress I > > >>>>> wonder if it would first be helpful to have a vote on ensuring we > are > > >> ANSI > > >>>>> SQL 92 compliant for our arithmetic? This seems like a sensible > > >> baseline, > > >>>>> since we will hopefully minimise surprise to operators this way. > > >>>>> > > >>>>> If people largely agree, I will call a vote, and we can pick up a > > >> couple > > >>>>> of more focused discussions afterwards on how we interpret the > leeway > > >> it > > >>>>> gives. 
> > >>>>> > > >>>>> > > >>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> > wrote: > > >>>>>> > > >>>>>> Hi, > > >>>>>> > > >>>>>> From reading the spec. Precision is always implementation defined. > > The > > >>>>> spec specifies scale in several cases, but never precision for any > > >> type or > > >>>>> operation (addition/subtraction, multiplication, division). > > >>>>>> > > >>>>>> So we don't implement anything remotely approaching precision and > > >> scale > > >>>>> in CQL when it comes to numbers I think? So we aren't going to > follow > > >> the > > >>>>> spec for scale. We are already pretty far down that road so I would > > >> leave > > >>>>> it alone. > > >>>>>> > > >>>>>> I don't think the spec is asking for the most approximate type. > It's > > >>>>> just saying the result is approximate, and the precision is > > >> implementation > > >>>>> defined. We could return either float or double. I think if one of > > the > > >>>>> operands is a double we should return a double because clearly the > > >> schema > > >>>>> thought a double was required to represent that number. I would > also > > >> be in > > >>>>> favor of returning a double all the time so that people can expect > a > > >>>>> consistent type from expressions involving approximate numbers. > > >>>>>> > > >>>>>> I am a big fan of widening for arithmetic expressions in a > database > > to > > >>>>> avoid having to error on overflow. You can go to the trouble of > only > > >>>>> widening the minimum amount, but I think it's simpler if we always > > >> widen to > > >>>>> bigint and double. This would be something the spec allows. > > >>>>>> > > >>>>>> Definitely if we can make overflow not occur we should and the > spec > > >>>>> allows that. We should also not return different types for the same > > >> operand > > >>>>> types just to work around overflow if we detect we need more > > precision. 
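For concreteness, the widening Ariel describes (always promoting 32-bit integer operands to a 64-bit result, so a single operation cannot overflow) can be sketched outside any database. This is an illustrative Python sketch only, not Cassandra code:

```python
# Overflow vs. widening, sketched with plain Python ints (illustrative only).

def int32_wrap(x: int) -> int:
    """Reduce x to signed 32-bit two's complement, as fixed-width int arithmetic would."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x >= 0x8000_0000 else x

a, b = 2_000_000_000, 2_000_000_000   # both operands fit in a 32-bit int

# Keeping the 32-bit result type forces a choice: wrap around or raise an error.
print(int32_wrap(a + b))   # -294967296

# Widening the result to 64 bits (bigint) represents the sum exactly; no error needed.
print(a + b)               # 4000000000, well within signed 64-bit range
```

The same argument applies to float + float widening to double: the wider result type removes the need to signal overflow for any single operation.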
> > >>>>>> > > >>>>>> Ariel > > >>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote: > > >>>>>>> If it’s in the SQL spec, I’m fairly convinced. Thanks for > digging > > >> this > > >>>>>>> out (and Mike for getting some empirical examples). > > >>>>>>> > > >>>>>>> We still have to decide on the approximate data type to return; > > right > > >>>>>>> now, we have float+bigint=double, but float+int=float. I think > > this > > >> is > > >>>>>>> fairly inconsistent, and either the approximate type should > always > > >> win, > > >>>>>>> or we should always upgrade to double for mixed operands. > > >>>>>>> > > >>>>>>> The quoted spec also suggests that decimal+float=float, and > decimal > > >>>>>>> +double=double, whereas we currently have decimal+float=decimal, > > and > > >>>>>>> decimal+double=decimal > > >>>>>>> > > >>>>>>> If we’re going to go with an approximate operand implying an > > >>>>> approximate > > >>>>>>> result, I think we should do it consistently (and consistent with > > the > > >>>>>>> SQL92 spec), and have the type of the approximate operand always > be > > >> the > > >>>>>>> return type. > > >>>>>>> > > >>>>>>> This would still leave a decision for float+double, though. The > > most > > >>>>>>> consistent behaviour with that stated above would be to always > take > > >> the > > >>>>>>> most approximate type to return (i.e. float), but this would seem > > to > > >> me > > >>>>>>> to be fairly unexpected for the user. > > >>>>>>> > > >>>>>>> > > >>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> > > wrote: > > >>>>>>>> > > >>>>>>>> Hi, > > >>>>>>>> > > >>>>>>>> I agree with what's been said about expectations regarding > > >> expressions > > >>>>> involving floating point numbers. I think that if one of the inputs > > is > > >>>>> approximate then the result should be approximate. > > >>>>>>>> > > >>>>>>>> One thing we could look at for inspiration is the SQL spec. 
Not to follow dogmatically necessarily. > >>>>>>>> > > >>>>>>>> From the SQL 92 spec regarding assignment > > >>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section > > 4.6: > > >>>>>>>> " > > >>>>>>>> Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT, > > >>>>>>>> FLOAT, REAL, and DOUBLE PRECISION are numbers and are all > > >>>>> mutually > > >>>>>>>> comparable and mutually assignable. If an assignment would > > >>>>> result > > >>>>>>>> in a loss of the most significant digits, an exception > > >> condition > > >>>>>>>> is raised. If least significant digits are lost, implementation-defined > > >>>>>>>> rounding or truncating occurs with no exception > > >>>>> condition > > >>>>>>>> being raised. The rules for arithmetic are generally > governed > > >> by > > >>>>>>>> Subclause 6.12, "<numeric value expression>". > > >>>>>>>> " > > >>>>>>>> > > >>>>>>>> Section 6.12 numeric value expressions: > > >>>>>>>> " > > >>>>>>>> 1) If the data type of both operands of a dyadic arithmetic operator > > >>>>>>>> is exact numeric, then the data type of the result is > > >>>>> exact > > >>>>>>>> numeric, with precision and scale determined as follows: > > >>>>>>>> ... > > >>>>>>>> 2) If the data type of either operand of a dyadic arithmetic operator, > > >>>>>>>> then the data type of the result > > >>>>>>>> is approximate numeric. The precision of the result > is > > >>>>>>>> implementation-defined. > > >>>>>>>> " > > >>>>>>>> > > >>>>>>>> And this makes sense to me. I think we should only return an > exact > > >>>>> result if both of the inputs are exact. > > >>>>>>>> > > >>>>>>>> I think we might want to look closely at the SQL spec and > > especially > > >>>>> when the spec requires an error to be generated. Those are > sometimes > > >> in the > > >>>>> spec to prevent subtle paths to wrong answers. 
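The rule quoted from subclause 6.12 boils down to a very small decision table. A hypothetical sketch (the CQL-ish type names and the helper are assumptions for illustration, not an actual Cassandra API):

```python
# Hypothetical sketch of the SQL 92 6.12 result-kind rule for a dyadic
# arithmetic operator: exact op exact -> exact; any approximate operand
# makes the whole result approximate.
EXACT = {"tinyint", "smallint", "int", "bigint", "varint", "decimal"}
APPROXIMATE = {"float", "double"}

def result_kind(left: str, right: str) -> str:
    """Return 'exact' or 'approximate' per the quoted rule."""
    if left in APPROXIMATE or right in APPROXIMATE:
        return "approximate"   # precision is implementation-defined
    return "exact"             # precision and scale follow the spec's own rules

print(result_kind("int", "bigint"))     # exact
print(result_kind("decimal", "float"))  # approximate
```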
Any time we deviate > > >> from the > > >>>>> spec we should be asking why is it in the spec and why are we > > >> deviating. > > >>>>>>>> > > >>>>>>>> Another issue besides overflow handling is how we determine > > >> precision > > >>>>> and scale for expressions involving two exact types. > > >>>>>>>> > > >>>>>>>> Ariel > > >>>>>>>> > > >>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote: > > >>>>>>>>> Hi, > > >>>>>>>>> > > >>>>>>>>> I'm not sure if I would prefer the Postgres way of doing > things, > > >>>>> which is > > >>>>>>>>> returning just about any type depending on the order of > > operators. > > >>>>>>>>> Considering it actually mentions in the docs that using > > >>>>> numeric/decimal is > > >>>>>>>>> slow and also multiple times that floating points are inexact. > So > > >>>>> doing > > >>>>>>>>> some math with Postgres (9.6.5): > > >>>>>>>>> > > >>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double > > >>>>>>>>> precision 2147483647 > > >>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0 > > >>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double > > >>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double > > >>>>> 2147483647 > > >>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double > > >>>>> 2147483647 > > >>>>>>>>> > > >>>>>>>>> With + - we can get the same amount of mixture of returned > types. > > >>>>> There's > > >>>>>>>>> no difference in those calculations, just some casting. To me > > >>>>>>>>> floating-point math indicates inexactness and has errors and > > >> whoever > > >>>>> mixes > > >>>>>>>>> up two different types should understand that. If one didn't > want > > >>>>> exact > > >>>>>>>>> numeric type, why would the server return such? The floating > > point > > >>>>> value > > >>>>>>>>> itself could be wrong already before the calculation - trying > to > > >> say > > >>>>> we do > > >>>>>>>>> it lossless is just wrong. 
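The inexactness Micke points to is ordinary IEEE 754 single-precision rounding, reproducible outside Postgres. An illustrative Python check (not Cassandra code):

```python
import struct

# Round-trip 2.65 through a 4-byte IEEE 754 float, as storing it in a
# single-precision ('real'/float) column would.
stored = struct.unpack("f", struct.pack("f", 2.65))[0]

print(stored)      # 2.6500000953674316 -- already inexact before any arithmetic
print(stored * 1)  # same value; the multiplication itself added no new error
```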
> > >>>>>>>>> > > >>>>>>>>> Fun with 2.65: > > >>>>>>>>> > > >>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743 > > >>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65 > > >>>>>>>>> > > >>>>>>>>> SELECT round(2.65) returns numeric 3 > > >>>>>>>>> SELECT round(2.65::double precision) returns double 3 > > >>>>>>>>> > > >>>>>>>>> SELECT 2.65 * 1 returns double 2.65 > > >>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65 > > >>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650 > > >>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65 > > >>>>>>>>> > > >>>>>>>>> SELECT round(2.65) * 1 returns numeric 3 > > >>>>>>>>> SELECT round(2.65) * round(1) returns double 3 > > >>>>>>>>> > > >>>>>>>>> So as we're going to have silly values in any case, why pretend > > >>>>> something > > >>>>>>>>> else? Also, exact calculations are slow if we crunch large > amounts > > >> of > > >>>>>>>>> numbers. I guess I slightly deviated towards Postgres' > > implementation > > >>>>> in this > > >>>>>>>>> case, but I wish it wasn't used as a benchmark in this case. > And > > >> most > > >>>>>>>>> importantly, I would definitely want the exact same type > returned > > >>>>> each time > > >>>>>>>>> I do a calculation. > > >>>>>>>>> > > >>>>>>>>> - Micke > > >>>>>>>>> > > >>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith < > > >>>>> bened...@apache.org> > > >>>>>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>>> As far as I can tell we reached a relatively strong consensus > > >> that we > > >>>>>>>>>> should implement lossless casts by default? Does anyone have > > >>>>> anything more > > >>>>>>>>>> to add? > > >>>>>>>>>> > > >>>>>>>>>> Looking at the emails, everyone who participated and > expressed a > > >>>>>>>>>> preference was in favour of the “Postgres approach” of > upcasting > > >> to > > >>>>> decimal > > >>>>>>>>>> for mixed float/int operands? 
> > >>>>>>>>>> > > >>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what > > >> we’re > > >>>>> doing > > >>>>>>>>>> for 4.0. Then hopefully we can move on to a collective > decision > > >> on > > >>>>> Ariel’s > > >>>>>>>>>> concerns about overflow, which I think are also pressing - > > >>>>> particularly for > > >>>>>>>>>> tinyint and smallint. This does also impact implicit casts > for > > >> mixed > > >>>>>>>>>> integer type operations, but an approach for these will > probably > > >>>>> fall out > > >>>>>>>>>> of any decision on overflow. > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan < > > >>>>> murukesh.moha...@gmail.com> > > >>>>>>>>>> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>> I think you're conflating two things here. There's the loss > > >>>>> resulting > > >>>>>>>>>> from > > >>>>>>>>>>> using some operators, and loss involved in casting. Dividing > an > > >>>>> integer > > >>>>>>>>>> by > > >>>>>>>>>>> another integer to obtain an integer result can result in > loss, > > >> but > > >>>>>>>>>> there's > > >>>>>>>>>>> no implicit casting there and no loss due to casting. > Casting > > an > > >>>>> integer > > >>>>>>>>>>> to a float can also result in loss. So dividing an integer > by a > > >>>>> float, > > >>>>>>>>>> for > > >>>>>>>>>>> example, with an implicit cast has an additional avenue for > > loss: > > >>>>> the > > >>>>>>>>>>> implicit cast for the operands so that they're of the same > > type. > > >> I > > >>>>>>>>>> believe > > >>>>>>>>>>> this discussion so far has been about the latter, not the > loss > > >> from > > >>>>> the > > >>>>>>>>>>> operations themselves. 
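Murukesh's distinction between the two avenues for loss can be made concrete. An illustrative Python snippet (the 53-bit significand is standard IEEE 754 double, nothing Cassandra-specific):

```python
# Loss from an operator vs. loss from a cast (illustrative only).

# 1) Operator loss: integer division truncates; no casting is involved.
print(7 // 2)   # 3 -- the fractional part is simply dropped by the operator

# 2) Casting loss: a 64-bit integer can exceed a double's 53-bit significand,
#    so the implicit cast alone changes the value before any operation runs.
big = 2**63 - 1            # largest bigint value
print(float(big) == big)   # False -- precision was lost in the cast itself
```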
> > >>>>>>>>>>> > > >>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer < > > >>>>> benjamin.le...@datastax.com> > > >>>>>>>>>>> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>>> Hi, > > >>>>>>>>>>>> > > >>>>>>>>>>>> I would like to try to clarify things a bit to help people > to > > >>>>> understand > > >>>>>>>>>>>> the true complexity of the problem. > > >>>>>>>>>>>> > > >>>>>>>>>>>> The *float* and *double* types are inexact numeric types. > Not > > >> only > > >>>>> at > > >>>>>>>>>> the > > >>>>>>>>>>>> operation level. > > >>>>>>>>>>>> > > >>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read > it, > > >> you > > >>>>> will > > >>>>>>>>>>>> realize that the value has been truncated to 676543.2. > > >>>>>>>>>>>> > > >>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact > > >> types. > > >>>>>>>>>>>> Using *decimals* > > >>>>>>>>>>>> during operations will mitigate the problem but will not > remove > > >>>>> it. > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am > not > > >>>>> mistaken > > >>>>>>>>>> in > > >>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*, which is similar to > > >> what > > >>>>> MS SQL > > >>>>>>>>>>>> Server and Oracle do. So all those databases will lose > > >> precision > > >>>>> if you > > >>>>>>>>>>>> are not careful. > > >>>>>>>>>>>> > > >>>>>>>>>>>> If you truly need precision you can have it by using exact > > >> numeric > > >>>>> types > > >>>>>>>>>>>> for your data types. Of course it has a cost on performance, > > >>>>> memory and > > >>>>>>>>>>>> disk usage. > > >>>>>>>>>>>> > > >>>>>>>>>>>> The advantage of the current approach is that it gives you > the > > >>>>> choice. > > >>>>>>>>>> It is > > >>>>>>>>>>>> up to you to decide what you need for your application. It > is > > >> also > > >>>>> in > > >>>>>>>>>> line > > >>>>>>>>>>>> with the way CQL behaves everywhere else. 
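Benjamin's 676543.21 example is easy to verify: the value is altered by single-precision storage alone, before any arithmetic. An illustrative Python check (not Cassandra code):

```python
import struct

# Round-trip 676543.21 through a 4-byte IEEE 754 float, as storing it
# in a single-precision *float* column would.
stored = struct.unpack("f", struct.pack("f", 676543.21))[0]

print(stored)            # 676543.1875 -- the nearest representable float32
print(round(stored, 1))  # 676543.2    -- the truncated value you read back
```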
> > >>>>>>>>>>> -- > > >>>>>>>>>>> Muru > > >>>>>>>>>> > > >> --------------------------------------------------------------------- > > >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > >>>>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > > >>>>> -- > > >>>> Jon Haddad > > >>>> http://www.rustyrazorblade.com > > >>>> twitter: rustyrazorblade > -- > Jon Haddad > http://www.rustyrazorblade.com > twitter: rustyrazorblade