Overflow is an excellent point, but it also affects aggregations, which were 
introduced a long time ago.  They already inherit Java semantics (silent 
wrap-around) for all of the relevant types.  We probably want to be consistent, 
which means either changing aggregations (at the cost of changing an existing 
API) or continuing with the Java semantics here.
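
For anyone less familiar with the Java behaviour in question, here is a rough 
sketch of the three overflow strategies under discussion: silent wrap-around, 
throwing, and widening.  The class and method names are hypothetical and exist 
only for illustration; this is plain Java arithmetic, not the code path used by 
aggregations or the new operators.

```java
// Illustration only: names are hypothetical, not taken from the Cassandra codebase.
final class OverflowDemo {
    // Java's default: int addition silently wraps around on overflow.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Fail loudly instead: Math.addExact throws ArithmeticException on overflow.
    static int checkedAdd(int a, int b) {
        return Math.addExact(a, b);
    }

    // Widen before adding: the sum of two ints always fits in a long.
    static long wideningAdd(int a, int b) {
        return (long) a + b;
    }

    public static void main(String[] args) {
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1)); // wraps to Integer.MIN_VALUE
        System.out.println(wideningAdd(Integer.MAX_VALUE, 1)); // 2147483648
        try {
            checkedAdd(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow rejected");
        }
    }
}
```

Widening only postpones the problem for the widest integer type, so a long sum 
would still need either the wrapping or the throwing behaviour.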

This is why having these discussions explicitly in the community before a 
release is so critical, in my view.  It’s very easy for these semantic changes 
to go unnoticed on a JIRA, and then ossify.


> On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> Hi,
> 
> I think we should decide based on what is least surprising, as you mention, 
> provided that isn't overridden by some other concern.
> 
> It seems to me the priorities are
> 
> * Correctness
> * Performance
> * User visible complexity
> * Developer visible complexity
> 
> Defaulting to silent implicit data loss is not ideal from a correctness 
> standpoint.
> 
> Doing something better like using wider types doesn't seem like a performance 
> issue.
> 
> From a user standpoint, doing something less lossy doesn't look more complex 
> as long as it's consistent, documented, and doesn't change from version to 
> version.
> 
> There is some developer complexity, but this is a public API and we only get 
> one shot at this. 
> 
> I wonder about how overflow is handled as well. In VoltDB I think we threw on 
> overflow and tended to just do widening conversions to make that less common. 
> We didn't imitate another database (as far as I know); we just went with what 
> was least likely to silently corrupt data.
> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 
> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 
> 
> Ariel
> 
> On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
>> A JIRA ticket introduced arithmetic operators, and alongside these 
>> came implicit casts for their operands.  There is a semantic decision to 
>> be made, and I think the project would do well to explicitly raise this 
>> kind of question for wider input before release, since the project is 
>> bound by them forever more.
>> 
>> In this case, the choice is between lossy and lossless casts for 
>> operations involving integers and floating point numbers.  In essence, 
>> should:
>> 
>> (1) float + int = float, double + bigint = double; or
>> (2) float + int = double, double + bigint = decimal; or
>> (3) float + int = decimal, double + bigint = decimal
>> 
>> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
>> double.  Simply casting between these types changes the value.  This is 
>> what MS SQL Server does.
>> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
>> is what PostgreSQL does.
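
To make the loss in option 1 concrete, here is a small sketch in plain Java, 
whose int -> float and long -> double conversions are also implicit and also 
lossy.  It assumes CQL int/bigint/float/double/decimal correspond to Java 
int/long/float/double/BigDecimal; the class and method names are hypothetical, 
and this is not how the operators are implemented.

```java
import java.math.BigDecimal;

// Illustration only: names are hypothetical, not taken from the Cassandra codebase.
final class LossyCastDemo {
    // float has a 24-bit significand, so not every int is representable.
    // Both int -> double and float -> double are exact, so comparing as doubles is safe.
    static boolean intToFloatIsExact(int v) {
        return (double) (float) v == (double) v;
    }

    // double has a 53-bit significand, so not every long is representable.
    // new BigDecimal(double) captures the exact binary value of the double.
    static boolean longToDoubleIsExact(long v) {
        return new BigDecimal((double) v).compareTo(BigDecimal.valueOf(v)) == 0;
    }

    // Option 3 style: promote both operands to decimal so nothing is lost.
    // Note: new BigDecimal(d) throws NumberFormatException for NaN or infinity.
    static BigDecimal addLossless(double d, long l) {
        return new BigDecimal(d).add(BigDecimal.valueOf(l));
    }

    public static void main(String[] args) {
        long big = (1L << 53) + 1;                          // 9007199254740993
        System.out.println(intToFloatIsExact(16_777_217));  // false: 2^24 + 1 does not fit in a float
        System.out.println(longToDoubleIsExact(big));       // false: 2^53 + 1 does not fit in a double
        System.out.println(0.5 + big);                      // option 1: big is rounded before the add
        System.out.println(addLossless(0.5, big));          // option 3: 9007199254740993.5
    }
}
```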
>> 
>> The question I’m interested in is not just which is the right decision, 
>> but how the right decision should be arrived at.  My view is that we 
>> should primarily aim for least surprise to the user, but I’m keen to 
>> hear from others.
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
>> <mailto:dev-unsubscr...@cassandra.apache.org>
>> For additional commands, e-mail: dev-h...@cassandra.apache.org 
>> <mailto:dev-h...@cassandra.apache.org>
>> 
> 
