spenczar commented on issue #38050:
URL: https://github.com/apache/arrow/issues/38050#issuecomment-1753283976
I see! Thanks for the thorough explanation.

My opinion is that logical data types like date64 should be a semantic layer on top of the physical data. I think that PyArrow should accept the possibility that the physical data doesn't conform to its semantic expectations, so it should be able to work with data with sub-day milliseconds, especially if they come from some foreign, non-pyarrow source.

I think that means that equality should be changed, like you say, since that's a semantic statement. But always truncating the physical data seems too extreme - I'd prefer that PyArrow preserve whatever it was given. Maybe constructors from "raw" sources (Python lists, maybe Pandas Series) should truncate, though?

Anyway - I think I agree that the compute logic should change. It seems likely that _many_ compute operations would need to change, though. For example, all the hash operations - would we need to always truncate before any compute operator is applied?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
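
To make the equality and hashing point in the comment above concrete, here is a minimal pure-Python sketch of "preserve the physical value, compare semantically". It is not PyArrow code and does not reflect how Arrow implements date64: `MS_PER_DAY`, `truncate_to_day`, and `Date64Value` are hypothetical names invented for illustration, and the only assumption carried over from the discussion is that date64 stores milliseconds since the UNIX epoch.

```python
# Illustrative model only; none of these names exist in PyArrow.
MS_PER_DAY = 86_400_000  # milliseconds in one day


def truncate_to_day(ms: int) -> int:
    """Floor a millisecond timestamp to the start of its day.

    Python's // floors toward negative infinity, so pre-epoch
    values also land on the start of their day.
    """
    return (ms // MS_PER_DAY) * MS_PER_DAY


class Date64Value:
    """Hypothetical wrapper: keeps the raw milliseconds untouched,
    but compares and hashes by calendar day."""

    def __init__(self, ms: int) -> None:
        self.ms = ms  # physical value, preserved exactly as given

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Date64Value):
            return NotImplemented
        return truncate_to_day(self.ms) == truncate_to_day(other.ms)

    def __hash__(self) -> int:
        # Must agree with __eq__: semantically equal values share a hash,
        # which means truncating before hashing.
        return hash(truncate_to_day(self.ms))


a = Date64Value(MS_PER_DAY)      # exactly one day after the epoch
b = Date64Value(MS_PER_DAY + 1)  # same day, plus one stray millisecond
```

In this model `a == b` holds and both land in the same hash bucket, yet `a.ms` and `b.ms` still differ, so nothing was lost. It also shows why the hash question matters: once equality ignores the sub-day part, every hash-based operation has to ignore it too, or equal values could end up in different groups.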
