spenczar commented on issue #38050:
URL: https://github.com/apache/arrow/issues/38050#issuecomment-1753283976

   I see! Thanks for the thorough explanation.
   
   My opinion is that logical data types like date64 should be a semantic layer 
on top of the physical data. I think that PyArrow should accept the possibility 
that the physical data doesn't conform to its semantic expectations, so it 
should be able to work with date64 values whose milliseconds are not a whole 
number of days, especially if they come from some foreign, non-PyArrow source.
   
   I think that means that equality should be changed, as you say, since 
that's a semantic statement. But always truncating the physical data seems too 
extreme - I'd prefer that PyArrow preserve whatever it was given. Maybe 
constructors from "raw" sources (Python lists, maybe Pandas Series) should 
truncate, though?
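
   To make the distinction concrete, here is a rough sketch in plain Python 
(not PyArrow code). It only assumes that a date64 value is an int64 count of 
milliseconds since the UNIX epoch; the helper names `day_of`, `truncate` and 
`semantic_eq` are made up for illustration and are not PyArrow APIs.

```python
MS_PER_DAY = 86_400_000  # milliseconds in one day


def day_of(ms: int) -> int:
    """Whole days since the epoch for a date64-style millisecond value."""
    # Floor division, so values before the epoch round toward the earlier day.
    return ms // MS_PER_DAY


def truncate(ms: int) -> int:
    """Drop the sub-day part (what a 'raw' constructor might do)."""
    return day_of(ms) * MS_PER_DAY


def semantic_eq(a: int, b: int) -> bool:
    """Compare as dates, leaving the stored milliseconds untouched."""
    return day_of(a) == day_of(b)


midnight = 19_000 * MS_PER_DAY      # a value that is exactly on a day boundary
with_time = midnight + 3_600_000    # same day, one hour in (hypothetical foreign data)

print(midnight == with_time)            # physical comparison
print(semantic_eq(midnight, with_time)) # semantic comparison
print(truncate(with_time) == midnight)  # truncation recovers the day boundary
```

   The point is that `semantic_eq` never modifies its inputs, so the original 
physical values stay available to whoever produced them.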
   
   Anyway - I think I agree that the compute logic should change. It seems 
likely that _many_ compute operations would need to change, though. For 
example, all the hash operations - would we need to always truncate before any 
compute operator is applied?
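
   As a sketch of what that could look like for a hash-based operation, again 
in plain Python with a hypothetical `group_by_day` helper and the same 
assumption that values are milliseconds since the epoch: the key is derived 
from the day, while the stored values are kept as given.

```python
from collections import defaultdict

MS_PER_DAY = 86_400_000  # milliseconds in one day


def group_by_day(values):
    """Group millisecond values by the day they fall on.

    The hash key is the day number, so two values on the same day land in
    the same bucket even if their sub-day milliseconds differ. The values
    themselves are stored unmodified.
    """
    groups = defaultdict(list)
    for ms in values:
        groups[ms // MS_PER_DAY].append(ms)
    return dict(groups)


values = [0, 1, MS_PER_DAY, MS_PER_DAY + 5]
groups = group_by_day(values)
print(groups)
```

   Whether the real kernels would normalize at hash time like this, or 
truncate once up front, is exactly the open question.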


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
