[
https://issues.apache.org/jira/browse/ARROW-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428457#comment-17428457
]
Weston Pace commented on ARROW-14290:
-------------------------------------
Hmm, fair point. A similar conversation happened in the discussion of
intervals. In general, we decided we do not want to allow Postgres intervals
to be comparable. However, some frontends may want to emulate Postgres, which
does allow comparable intervals. The hope was that those frontends could cast
their intervals to "Postgres" intervals for comparability. So it might be nice
if the solution could extend beyond just strings (but maybe two instances do
not make a pattern and we can solve the interval thing elsewhere).

One challenge is that the comparison functions are used implicitly by quite a
few kernels (e.g. max, min, partition, select_k). In addition, if these
custom rules affect equality (this might be the case given we are talking about
Unicode), then we have to worry about hashing and all of the kernels / nodes
that rely on hashing internally (dictionary encode, group by, etc.).
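
To illustrate the hashing concern, here is a minimal sketch in plain C++ (it does not use any Arrow API). It assumes a hypothetical custom rule, ASCII case folding, as a stand-in for real Unicode rules, which are far more involved. All names (FoldAscii, FoldedEqual, FoldedHash, CountDistinct) are made up for this example. The point is only that once equality changes, the hash function has to change with it, or hash-based operations stop agreeing with comparisons.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical custom rule: fold ASCII letters to lower case.
// This is an assumption for illustration, not a real collation.
std::string FoldAscii(const std::string& s) {
  std::string out = s;
  for (char& c : out) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return out;
}

// Equality under the custom rule.
struct FoldedEqual {
  bool operator()(const std::string& a, const std::string& b) const {
    return FoldAscii(a) == FoldAscii(b);
  }
};

// Hash that is consistent with FoldedEqual: it hashes the folded form.
struct FoldedHash {
  std::size_t operator()(const std::string& s) const {
    return std::hash<std::string>{}(FoldAscii(s));
  }
};

// The invariant any custom rule must keep: equal values hash identically.
void CheckHashAgreesWithEquality(const std::string& a, const std::string& b) {
  if (FoldedEqual{}(a, b)) {
    assert(FoldedHash{}(a) == FoldedHash{}(b));
  }
}

// Counts distinct values under the custom equality, roughly what a
// group-by or dictionary-encode step has to do internally.
std::size_t CountDistinct(const std::vector<std::string>& values) {
  std::unordered_set<std::string, FoldedHash, FoldedEqual> seen(
      values.begin(), values.end());
  return seen.size();
}
```

Pairing FoldedEqual with the default std::hash<std::string> instead would let "Arrow" and "ARROW" land in different buckets, so a hash-based group by could report them as separate groups even though the comparison treats them as equal.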
> [C++] String comparison in between ternary kernel
> -------------------------------------------------
>
> Key: ARROW-14290
> URL: https://issues.apache.org/jira/browse/ARROW-14290
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Benson Muite
> Assignee: Benson Muite
> Priority: Minor
>
> String comparisons in C++ order strings by Unicode code point. This may not
> be suitable for many language applications, for example text containing
> characters outside ASCII. Sorting algorithms often accept a custom
> comparison function. It would be helpful to allow this for the between
> kernel as well. Initial work on the between kernel is being tracked in
> https://issues.apache.org/jira/browse/ARROW-9843
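
The request in the description can be sketched in plain C++ as a "between" check that takes a comparator, in the same way std::sort does. This is not the Arrow kernel or its API. Between and AccentInsensitiveLess are hypothetical names, inclusive bounds are an assumption, and the comparator handles only UTF-8 "é" to keep the example short.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Generic "between" check parameterized on a strict weak ordering.
// Inclusive bounds are an assumption made for this sketch.
template <typename Less = std::less<std::string>>
bool Between(const std::string& value, const std::string& lower,
             const std::string& upper, Less less = Less{}) {
  return !less(value, lower) && !less(upper, value);
}

// Hypothetical comparator: treats UTF-8 "é" (bytes 0xC3 0xA9) as "e"
// before comparing. A real implementation would use a collation library.
struct AccentInsensitiveLess {
  static std::string Fold(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
      if (i + 1 < s.size() && s[i] == '\xC3' && s[i + 1] == '\xA9') {
        out.push_back('e');
        ++i;  // skip the second byte of the two-byte sequence
      } else {
        out.push_back(s[i]);
      }
    }
    return out;
  }

  bool operator()(const std::string& a, const std::string& b) const {
    return Fold(a) < Fold(b);
  }
};
```

With the default comparator, a value beginning with "é" falls outside the range "a" to "z", because its first byte (0xC3) sorts after 'z' (0x7A). Passing AccentInsensitiveLess{} as the fourth argument places the same value inside that range.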