alamb commented on code in PR #14223:
URL: https://github.com/apache/datafusion/pull/14223#discussion_r1926121097


##########
datafusion/expr-common/src/type_coercion/binary.rs:
##########
@@ -777,29 +777,19 @@ pub fn binary_numeric_coercion(
         (_, Float32) | (Float32, _) => Some(Float32),
         // The following match arms encode the following logic: Given the two
         // integral types, we choose the narrowest possible integral type that
-        // accommodates all values of both types. Note that some information
-        // loss is inevitable when we have a signed type and a `UInt64`, in
-        // which case we use `Int64`;i.e. the widest signed integral type.
-
-        // TODO: For i64 and u64, we can use decimal or float64
-        // Postgres has no unsigned type :(
-        // DuckDB v.0.10.0 has double (double precision floating-point number 
(8 bytes))
-        // for largest signed (signed sixteen-byte integer) and unsigned 
integer (unsigned sixteen-byte integer)
+        // accommodates all values of both types. Note that to avoid 
information
+        // loss when combining UInt64 with signed integers we use 
Decimal128(20, 0).
+        (UInt64, Int64 | Int32 | Int16 | Int8)

Review Comment:
   I think this has potentially (large) performance implications. 
   
   My understanding is that this means that Int64+Int64 will result in (always) 
a 128bit result? 
   
   So even though performing int64+int64 will never overflow, all queries will 
pay the price of 2x space (and some time) overhead?
   
   



##########
datafusion/sqllogictest/test_files/insert.slt:
##########
@@ -433,3 +433,30 @@ drop table test_column_defaults
 
 statement error DataFusion error: Error during planning: Column reference is 
not allowed in the DEFAULT expression : Schema error: No field named a.
 create table test_column_defaults(a int, b int default a+1)
+
+
+# test inserting UInt64 and signed integers into a bigint unsigned column
+statement ok
+create table unsigned_bigint_test (v bigint unsigned)
+
+query I
+insert into unsigned_bigint_test values (10000000000000000000), 
(18446744073709551615)
+----
+2
+
+query I
+insert into unsigned_bigint_test values (10000000000000000001), (1), 
(10000000000000000002)
+----
+3
+
+query I rowsort
+select * from unsigned_bigint_test

Review Comment:
   I agree the case of overflow on `*` of 2 64 bit numbers is more likely and 
automatically coercing to `Decimal128` may make sense.
   
   However, I would argue that if the user cares about avoiding overflows when 
doing Intger arithmetic they should use `Decimal128` in their input types



##########
datafusion/sqllogictest/test_files/union.slt:
##########
@@ -413,23 +413,23 @@ explain SELECT c1, c9 FROM aggregate_test_100 UNION ALL 
SELECT c1, c3 FROM aggre
 logical_plan
 01)Sort: aggregate_test_100.c9 DESC NULLS FIRST, fetch=5
 02)--Union
-03)----Projection: aggregate_test_100.c1, CAST(aggregate_test_100.c9 AS Int64) 
AS c9
+03)----Projection: aggregate_test_100.c1, CAST(aggregate_test_100.c9 AS 
Decimal128(20, 0)) AS c9
 04)------TableScan: aggregate_test_100 projection=[c1, c9]
-05)----Projection: aggregate_test_100.c1, CAST(aggregate_test_100.c3 AS Int64) 
AS c9
+05)----Projection: aggregate_test_100.c1, CAST(aggregate_test_100.c3 AS 
Decimal128(20, 0)) AS c9

Review Comment:
   this seems like a regression to me (there is now 2x the space needed)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to