[
https://issues.apache.org/jira/browse/ARROW-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Oluwalana updated ARROW-3002:
---------------------------------
Description:
{code:python}
>>> x = pa.field('record', pa.struct([pa.field('x', pa.int32(),
...     nullable=False)]))
>>> y = pa.field('record', pa.struct([pa.field('x', pa.int32(),
...     nullable=True)]))
>>> z = pa.field('record', pa.struct([pa.field('x', pa.int32(),
...     nullable=True)]))
>>> x.__hash__()
-9223372036569171727
>>> y.__hash__()
285604054
>>> z.__hash__()
285604076
>>> x.type
StructType(struct<x: int32>)
>>> x.type.__hash__()
429437081997812647
>>> y.type.__hash__()
429437081997812647
>>> x
pyarrow.Field<record: struct<x: int32>>
>>> y
pyarrow.Field<record: struct<x: int32>>
{code}
Expected:
y.__hash__() should equal z.__hash__(), because y and z are constructed identically.
x.type.__hash__() should differ from y.type.__hash__(), because the nullability of the child field differs.
was:
{code:python}
>>> x = pa.field('record', pa.struct([pa.field('x', pa.int32(),
...     nullable=False)]))
>>> y = pa.field('record', pa.struct([pa.field('x', pa.int32(),
...     nullable=True)]))
>>> x.__hash__()
-9223372036569171727
>>> y.__hash__()
285604054
>>> x.type
StructType(struct<x: int32>)
>>> x.type.__hash__()
429437081997812647
>>> y.type.__hash__()
429437081997812647
>>> x
pyarrow.Field<record: struct<x: int32>>
>>> y
pyarrow.Field<record: struct<x: int32>>
{code}
The StructType should take nullable fields into account when generating the
hash.
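For illustration only, here is a minimal sketch of the requested behaviour in plain Python. It is not Arrow's implementation and does not use pyarrow: the `Int32`, `Field`, and `StructType` classes below are hypothetical stand-ins, and the sketch assumes that hashing should simply follow structural equality, with nullability included.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Int32:
    """Hypothetical stand-in for a primitive type; all instances compare equal."""


@dataclass(frozen=True)
class Field:
    """Hypothetical field: name, type and nullability all feed __eq__ and __hash__."""
    name: str
    type: object
    nullable: bool = True


@dataclass(frozen=True)
class StructType:
    """Hypothetical struct type whose hash is derived from its child fields."""
    fields: Tuple[Field, ...]


# Mirror the x, y, z values from the report above.
x = Field("record", StructType((Field("x", Int32(), nullable=False),)))
y = Field("record", StructType((Field("x", Int32(), nullable=True),)))
z = Field("record", StructType((Field("x", Int32(), nullable=True),)))

# Identically constructed fields compare equal and hash equal.
print(y == z, hash(y) == hash(z))

# Struct types that differ only in child nullability compare unequal.
print(x.type == y.type)
```

Because frozen dataclasses generate `__hash__` from the same attributes as `__eq__`, equal values always hash equally, and a difference in `nullable` propagates from the child field up to the enclosing struct type.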
> Inconsistent DataType Hashing
> -----------------------------
>
> Key: ARROW-3002
> URL: https://issues.apache.org/jira/browse/ARROW-3002
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Sam Oluwalana
> Priority: Minor