[ https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398355#comment-16398355 ]
Antoine Pitrou commented on ARROW-640:
--------------------------------------

I don't think we're concerned about particular workloads for now. Something like {{%timeit hash(x)}} (in IPython syntax) is a good micro-benchmark for this.

Integers are the main type that I think might be used in a hashing context, so you may want to write a native hash implementation for them, while letting other types defer to {{as_py}}. Also, in some cases (such as StructValue), the {{as_py}} fallback won't work. We may or may not care about this immediately (i.e. if you only want to implement numbers, we can open an issue for the other types).

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-640
>                 URL: https://issues.apache.org/jira/browse/ARROW-640
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Miki Tebeka
>            Assignee: Alex Hagerman
>            Priority: Major
>             Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]:
> <pyarrow.array.Int64Array object at 0x7f8c8c739e08>
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
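
For illustration only, here is a rough pure-Python sketch of the split suggested in the comment: a dedicated hash for integers, with other types deferring to as_py. The classes ScalarValue and Int64Value below are hypothetical stand-ins, not pyarrow's actual implementation, and the assumption that a struct's as_py returns a dict (and is therefore unhashable) is mine rather than something stated in the issue.

```python
import timeit


class ScalarValue:
    """Hypothetical generic scalar: hash and equality defer to as_py()."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        return self._value

    def __eq__(self, other):
        # Compare by underlying Python value, unwrapping other scalars.
        if isinstance(other, ScalarValue):
            other = other.as_py()
        return self.as_py() == other

    def __hash__(self):
        # Fallback path: raises TypeError when as_py() is unhashable,
        # e.g. a dict (assumed here to be what a struct converts to).
        return hash(self.as_py())


class Int64Value(ScalarValue):
    """Hypothetical integer scalar with its own hash path."""

    def __hash__(self):
        # Stand-in for a native implementation: hash the stored integer
        # directly instead of going through as_py().
        return hash(self._value)


values = [Int64Value(v) for v in [1, 1, 1, 2]]
unique = set(values)          # equal scalars now collapse in a set
same = values[0] == values[1]  # and compare equal

# Plain-Python counterpart of the IPython micro-benchmark `%timeit hash(x)`.
x = values[0]
seconds = timeit.timeit(lambda: hash(x), number=100_000)
print(f"{len(unique)} unique values, equal={same}, 100k hashes in {seconds:.4f}s")
```

With this sketch, the set built from the four values holds two elements, and hashing a ScalarValue that wraps a dict raises TypeError, which is one way the fallback could fail for struct-like values.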