[ 
https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398355#comment-16398355
 ] 

Antoine Pitrou commented on ARROW-640:
--------------------------------------

I don't think we're concerned about particular workloads for now. Something 
like {{%timeit hash(x)}} (in IPython syntax) is a good micro-benchmark for 
this.
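
Outside IPython, roughly the same measurement can be sketched with the standard {{timeit}} module. The {{IntValue}} class and the {{bench_hash}} helper below are hypothetical stand-ins for illustration, not pyarrow's actual classes.

```python
import timeit


class IntValue:
    """Hypothetical stand-in for an Arrow integer scalar (not pyarrow's class)."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        # Convert to the equivalent plain Python object.
        return self._value

    def __hash__(self):
        # Fallback strategy: hash whatever as_py() returns.
        return hash(self.as_py())


def bench_hash(x, number=100_000, repeat=5):
    """Return the best observed time of a single hash(x) call, in seconds."""
    timer = timeit.Timer("hash(x)", globals={"x": x})
    # Take the minimum over several runs to reduce scheduling noise.
    return min(timer.repeat(repeat=repeat, number=number)) / number


per_call = bench_hash(IntValue(42))
print(f"hash(x): {per_call * 1e9:.1f} ns per call")
```

The absolute number depends on the machine, so it is mostly useful for comparing two implementations of {{__hash__}} on the same value.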

Integers are the main type that I think might be used in a hashing context, so 
you may want to write a native hash implementation for them, while letting 
other types defer to {{as_py}}. 
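
As a rough pure-Python sketch of that split (the real implementation would presumably live in Cython, and the class names and constructors here are assumptions for illustration only):

```python
class ScalarValue:
    """Hypothetical base class: hash and compare via the Python value."""

    def as_py(self):
        raise NotImplementedError

    def __hash__(self):
        # Generic fallback: defer to the hash of the converted Python object.
        return hash(self.as_py())

    def __eq__(self, other):
        if isinstance(other, ScalarValue):
            return self.as_py() == other.as_py()
        return self.as_py() == other


class Int64Value(ScalarValue):
    """Hypothetical integer scalar with a dedicated hash path."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        return self._value

    def __hash__(self):
        # "Native" path: hash the stored integer directly, skipping as_py().
        return hash(self._value)


class StringValue(ScalarValue):
    """Hypothetical non-integer scalar that relies on the fallback."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        return self._value


values = [Int64Value(v) for v in [1, 1, 1, 2]]
unique = sorted(v.as_py() for v in set(values))
print(unique)
```

With value-based {{__hash__}} and {{__eq__}}, equal scalars collapse to a single set entry, which is the behaviour the issue asks for.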

Also in some cases (such as StructValue), the {{as_py}} fallback won't work. We 
may or may not care about this immediately (i.e. if you only want to implement 
numbers, we can open an issue for the other types).
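
To illustrate the failure mode, here is a minimal sketch assuming {{as_py}} on a struct returns a plain {{dict}}. The {{StructValue}} class below is a hypothetical stand-in, not the pyarrow class.

```python
class StructValue:
    """Hypothetical struct scalar whose as_py() returns a dict."""

    def __init__(self, fields):
        self._fields = dict(fields)

    def as_py(self):
        return dict(self._fields)

    def __hash__(self):
        # Same fallback as for other types: hash the Python object.
        return hash(self.as_py())


fallback_failed = False
try:
    hash(StructValue({"a": 1, "b": "x"}))
except TypeError:
    # dict is mutable and therefore unhashable, so the fallback raises.
    fallback_failed = True

print("as_py fallback failed:", fallback_failed)
```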

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-640
>                 URL: https://issues.apache.org/jira/browse/ARROW-640
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Miki Tebeka
>            Assignee: Alex Hagerman
>            Priority: Major
>             Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]: 
> <pyarrow.array.Int64Array object at 0x7f8c8c739e08>
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}



