discord9 commented on issue #22142:
URL: https://github.com/apache/datafusion/issues/22142#issuecomment-4725787864

   > Ok, I have found this whole issue quite confusing as it is not clear to me 
what the scope of the issue is from this ticket as it describes a code problem, 
rather than the wrong results that can result. Specifically it is not clear to 
me from this issue, nor the various PRs that have been proposed, if this is 
something specific to timestamps or if it is a more general problem
   > 
   > -- Two DISTINCT nanosecond timestamps that both truncate to the SAME 
millisecond.
   > CREATE TABLE t AS
   > SELECT arrow_cast(1704067200001000000, 'Timestamp(Nanosecond, None)') AS 
ts_ns
   > UNION ALL
   > SELECT arrow_cast(1704067200001234567, 'Timestamp(Nanosecond, None)');
   > 
   > 
   > -- WRONG: should return 2 rows, returns 1.
   > SELECT * FROM t
   > WHERE arrow_cast(ts_ns, 'Timestamp(Millisecond, None)') = 
arrow_cast(1704067200001, 'Timestamp(Millisecond, None)');
   > 
   > Elapsed 0.002 seconds.
   > 
   > +-------------------------+
   > | ts_ns                   |
   > +-------------------------+
   > | 2024-01-01T00:00:00.001 |
   > +-------------------------+
   > 1 row(s) fetched.
   > Elapsed 0.000 seconds.
   > 
   > You can see this is incorrect as you both rows have the truncated values
   > 
   > SELECT arrow_cast(ts_ns, 'Timestamp(Millisecond, None)') from t;
   > +----------------------------------------------------------+
   > | arrow_cast(t.ts_ns,Utf8("Timestamp(Millisecond, None)")) |
   > +----------------------------------------------------------+
   > | 2024-01-01T00:00:00.001                                  |
   > | 2024-01-01T00:00:00.001                                  |
   > +----------------------------------------------------------+
   > 2 row(s) fetched.
   > Elapsed 0.000 seconds.
   > 
   > The same thing happens with decimals:
   > 
   > CREATE TABLE t(d DECIMAL(20,4)) AS VALUES (1.2345), (1.2399), (1.2500);
   > -- 1.2345 -> 1.23 at scale 2, so this MUST return 1 row:
   > SELECT * FROM t
   > WHERE arrow_cast(d, 'Decimal128(20, 2)') = arrow_cast(1.23, 
'Decimal128(20, 2)');
   > -- predicate rewritten to `d = 1.2300` (exact)  ->  returns 0 rows.  WRONG.
   > +---+
   > | d |
   > +---+
   > +---+
   > 0 row(s) fetched.
   > Elapsed 0.001 seconds.
   
   sorry I didn't make it more clear, it's not just timestamp but all kind of 
types cast could have similar issues, I just want to keep pr small and fix them 
one by one without one massive pr blocking all wrong cast, the block list one I 
will get to merge first, then maybe could look into the preimage cast one which 
allow a wider range of correct cast unwrap


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to