I don't think that rules about "semantic" equality (i.e. two values
being semantically "equal" -- like two different NaN bit patterns --
even though the memory is different) belong in the specification
documents.
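
To illustrate the distinction, here is a minimal Python sketch (not part of
the specification; the two bit patterns are chosen only for illustration).
Both patterns decode to NaN, yet their bytes differ, so a bit/byte-level
comparison reports them as different:

```python
import math
import struct

# Two distinct 64-bit patterns that both encode a quiet NaN
# (illustrative values: same exponent, different payload bits).
a_bytes = struct.pack("<Q", 0x7FF8000000000000)
b_bytes = struct.pack("<Q", 0x7FF8000000000001)

# Reinterpret the same bytes as little-endian float64 values.
a = struct.unpack("<d", a_bytes)[0]
b = struct.unpack("<d", b_bytes)[0]

bytes_equal = a_bytes == b_bytes            # byte-level comparison
both_nan = math.isnan(a) and math.isnan(b)  # "semantic" view: both are NaN
ieee_equal = a == b                         # IEEE 754: NaN never equals NaN
```

Here the byte-level comparison and the IEEE 754 comparison both say "not
equal", while a "semantic" rule that treats all NaNs alike would say "equal".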

On Fri, Nov 13, 2020 at 12:19 PM Jorge Cardoso Leitão
<jorgecarlei...@gmail.com> wrote:
>
> Hi Wes,
>
> Could you clarify? By "the logical data type", do you mean Arrow's logical
> data type? The semantics of the logical data type are the only ones that
> could, IMO, justify a clarification: in particular, given a data type, how
> do we agree that slot i from array "a" and slot j from array "b" are equal?
>
> Best,
> Jorge
>
>
>
>
> On Fri, Nov 13, 2020 at 3:27 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > On Fri, Nov 13, 2020 at 1:19 AM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> > >
> > > Hi Jorge,
> > > I think it would make sense to add some clarifications to the document
> > > per Wes's comments. Do you want to maybe try to make a PR?
> > >
> > > One small edge case to consider is how NaN float values are compared.
> >
> > I think at the specification level, it should only be bit/byte-level
> > binary equality without respect to the semantics of the logical data
> > type.
> >
> > > -Micah
> > >
> > > On Thu, Nov 12, 2020 at 8:44 PM Jorge Cardoso Leitão <
> > > jorgecarlei...@gmail.com> wrote:
> > >
> > > > Hi Wes,
> > > >
> > > > Thanks a lot. I agree. My question is whether we should make it
> > > > explicit in the specification. AFAIK, "if the data represented in the
> > > > slot is equal" depends on the datatype: for variable sized arrays with
> > > > offsets (e.g. strings), the equality of slot i is something along the
> > > > lines of:
> > > >
> > > > start = lhs.buffer[0][(lhs.offset + i) * size_of<T>] as T
> > > > end = lhs.buffer[0][(lhs.offset + i + 1) * size_of<T>] as T
> > > > lhs_value = lhs.buffer[1][start..end]
> > > > # same for rhs
> > > > lhs_value == rhs_value
> > > >
> > > > This logic is also tricky for any type with children, where we need to
> > > > compare the slot of the child through recursion.
> > > > These things are not really implementation specific, yet they are
> > > > really important when implementations inter-operate.
> > > >
> > > > Best,
> > > > Jorge
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Nov 5, 2020 at 3:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > > hi Jorge,
> > > > >
> > > > > The intent when authoring the specification was as follows
> > > > >
> > > > > * If two array slots being compared are both null, then they are equal
> > > > > * If one is null and the other is not, they are not equal
> > > > > * If they are both not null, then they are equal if the data
> > > > > represented in the slot is equal (and if dictionary indices reference
> > > > > the same dictionary value, even if the dictionaries are different,
> > > > > then they are equal because the data they represent is the same)
> > > > >
> > > > > - Wes
> > > > >
> > > > > On Thu, Nov 5, 2020 at 1:13 AM Jorge Cardoso Leitão
> > > > > <jorgecarlei...@gmail.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Recently, I revisited the code for array equality in Rust. While
> > > > > > going through it, I observed some assumptions about how we conclude
> > > > > > that two elements of an arrow array are equal, and when two arrays
> > > > > > are equal.
> > > > > >
> > > > > > The notion of equality is also used throughout the document e.g.
> > > > > > when we offer examples using "unspecified", we are implicitly arguing
> > > > > > that we should not care about that value when comparing arrays. It is
> > > > > > also used when we use the wording "unique values" in the
> > > > > > dictionary-encoded arrays.
> > > > > >
> > > > > > The notion of array equality is important when we want to verify
> > > > > > interoperability between languages, where we often need to compare
> > > > > > arrays (e.g. after a round-trip), as some implementations may change
> > > > > > the data of the "unspecified" slots and e.g. offsets.
> > > > > >
> > > > > > More fundamentally, IMO the specification offers a physical
> > > > > > representation (buffers, children, offsets, etc.) of a logical asset
> > > > > > (lists, structs, int8, int32), but currently does not say when two
> > > > > > logical assets are considered equal.
> > > > > >
> > > > > > Would it make sense to systematize the notion of equality in the
> > > > > > specification, to align the different implementations on when
> > > > > > they should consider two arrays to be equal?
> > > > > >
> > > > > > Best,
> > > > > > Jorge
> > > > >
> > > >
> >

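For reference, the three slot-equality rules quoted above (both null: equal;
exactly one null: not equal; otherwise compare the data the slots represent,
resolving dictionary indices first) can be sketched in Python. This is only
an illustration under stated assumptions: `slot_equal` is a hypothetical
helper, arrays are modelled as plain lists with `None` for null slots, and
nothing here is Arrow API.

```python
from typing import Any, Optional, Sequence


def slot_equal(
    lhs: Sequence[Any],
    i: int,
    rhs: Sequence[Any],
    j: int,
    lhs_dict: Optional[Sequence[Any]] = None,
    rhs_dict: Optional[Sequence[Any]] = None,
) -> bool:
    """Compare slot i of lhs with slot j of rhs (hypothetical model)."""
    a, b = lhs[i], rhs[j]
    # Rule 1: two null slots are equal.
    if a is None and b is None:
        return True
    # Rule 2: a null slot never equals a non-null slot.
    if a is None or b is None:
        return False
    # Rule 3: compare the represented data. For dictionary-encoded
    # arrays the slot holds an index, so look up the value it refers to.
    if lhs_dict is not None:
        a = lhs_dict[a]
    if rhs_dict is not None:
        b = rhs_dict[b]
    return a == b


# Different dictionaries, same represented value "y" in both slots.
same_value = slot_equal([0, 1], 1, [0, 1], 0, ["x", "y"], ["y", "z"])
```

Note that the final `a == b` uses Python's own equality, so it sidesteps the
NaN question discussed at the top of the thread rather than answering it.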