[ https://issues.apache.org/jira/browse/ARROW-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420946#comment-17420946 ]
Weston Pace commented on ARROW-14122:
-------------------------------------
> For datafusion, we will go with postgres's approach because it aims to be
> postgres compatible. This is not a problem for the datafusion SQL interface
> because we never said the SQL types map one to one to Arrow types. In other
> words, Arrow interval type semantics are an implementation detail that's
> hidden from the users. The consequence of postgres's behavior is we won't be
> able to simply hash interval types by their physical bytes. We will need to
> normalize them first, i.e. "1 day 24 hours" and "2 days" should result in the
> same hash key in hash aggregate and hash join compute kernels. Or maybe we
> could even make this compute semantic configurable in datafusion if different
> users need different behavior.
> Regardless of which way we go, I think it would be good for all Arrow compute
> implementations to have the same consistent behavior.
A Postgres SQL query will still need to map down to some kind of IR, so even if
we don't define the interval semantics at the "Arrow data type" level I think
they would need to be defined at some level.
What if we were to phrase it this way:
* The Interval type has no ordering (it looks like partial ordering is up for
debate, but I don't actually know what that buys us)
* There is an extension type "Postgres Interval" (I don't think it matters
whether we call it an Arrow extension type, an Arrow Compute IR type, or a
substrait type) which has a total ordering based on 24-hour days, 30-day
months, and 360-day years
* There is a cast from Arrow Interval to Postgres Interval
Query plan producers that want to maintain Postgres compatibility can insert
the cast.
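To make the proposal concrete, here is a minimal Python sketch of what such a cast could compute. It assumes the Arrow interval is the month/day/nanosecond variant, and the names `ArrowInterval` and `to_postgres_key` are hypothetical, not part of any Arrow or DataFusion API.

```python
from typing import NamedTuple

NANOS_PER_DAY = 24 * 60 * 60 * 1_000_000_000  # assumption: every day is 24 hours
DAYS_PER_MONTH = 30                            # assumption: every month is 30 days


class ArrowInterval(NamedTuple):
    """Hypothetical stand-in for an Arrow month/day/nanosecond interval."""
    months: int
    days: int
    nanos: int


def to_postgres_key(iv: ArrowInterval) -> int:
    """Collapse the three fields into one integer that is totally ordered."""
    total_days = iv.months * DAYS_PER_MONTH + iv.days
    return total_days * NANOS_PER_DAY + iv.nanos


one_day_24h = ArrowInterval(months=0, days=1, nanos=NANOS_PER_DAY)
two_days = ArrowInterval(months=0, days=2, nanos=0)
one_year = ArrowInterval(months=12, days=0, nanos=0)

# "1 day 24 hours" and "2 days" compare equal under this ordering.
print(to_postgres_key(one_day_24h) == to_postgres_key(two_days))  # True
# 12 months of 30 days gives the 360-day year.
print(to_postgres_key(one_year) == 360 * NANOS_PER_DAY)  # True
```

Under these assumptions the key is a plain integer, so comparison kernels on the "Postgres Interval" type would reduce to integer comparison, while the raw Arrow Interval stays unordered.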
So then, if I understand correctly, the point on hashing comes down to whether
the cast from Arrow Interval to Postgres Interval is a zero-copy,
metadata-only cast or whether the bytes need to be mutated for consistent
hashing. I don't know enough about the design of either system's hashing
implementation to answer that.
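A rough Python sketch of why the answer matters. It assumes the physical layout of a month/day/nanosecond interval is a little-endian int32, int32, int64 triple, and `physical_bytes` and `normalized_bytes` are hypothetical helpers, not how either Arrow C++ or datafusion actually hashes values.

```python
import struct

NANOS_PER_DAY = 24 * 60 * 60 * 1_000_000_000  # assumption: 24-hour days
DAYS_PER_MONTH = 30                            # assumption: 30-day months


def physical_bytes(months: int, days: int, nanos: int) -> bytes:
    """Bytes as stored: int32 months, int32 days, int64 nanoseconds."""
    return struct.pack("<iiq", months, days, nanos)


def normalized_bytes(months: int, days: int, nanos: int) -> bytes:
    """Bytes after folding all fields into one total nanosecond count."""
    total = (months * DAYS_PER_MONTH + days) * NANOS_PER_DAY + nanos
    # 16 bytes is wide enough for any int32/int32/int64 input.
    return total.to_bytes(16, "little", signed=True)


one_day_24h = (0, 1, NANOS_PER_DAY)
two_days = (0, 2, 0)

# Hashing the stored bytes would put these two values in different buckets.
print(physical_bytes(*one_day_24h) == physical_bytes(*two_days))      # False
# After normalization the bytes match, so any byte-wise hash agrees.
print(normalized_bytes(*one_day_24h) == normalized_bytes(*two_days))  # True
```

If this sketch reflects the real layouts, a metadata-only cast would not be enough on its own: either the cast rewrites the bytes, or the hash kernel normalizes on the fly.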
> [C++] interval comparison kernels
> ---------------------------------
>
> Key: ARROW-14122
> URL: https://issues.apache.org/jira/browse/ARROW-14122
> Project: Apache Arrow
> Issue Type: Sub-task
> Reporter: Phillip Cloud
> Priority: Major
> Labels: kernel
>
> Subtask for tracking interval comparison kernels