On Fri, Nov 13, 2020 at 1:19 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> Hi Jorge,
> I think it would make sense to add some clarifications to the document per
> Wes's comments. Do you want to maybe try to make a PR?
>
> One small edge case to consider is how NaN float values are compared.

I think at the specification level, it should only be bit/byte-level
binary equality without regard to the semantics of the logical data
type.
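
For illustration only, here is a minimal Rust sketch of what bit-level equality means for a single float slot, compared with IEEE 754 `==`. The helper `slot_eq_bitwise` is hypothetical and is not taken from any Arrow implementation.

```rust
/// Hypothetical helper: compares two float slots by their raw IEEE 754
/// bit patterns instead of by floating-point `==`.
fn slot_eq_bitwise(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}

fn main() {
    // Two quiet NaNs with the same bit pattern.
    let a = f64::from_bits(0x7ff8_0000_0000_0000);
    let b = f64::from_bits(0x7ff8_0000_0000_0000);

    // Floating-point semantics: NaN never compares equal, even to itself.
    assert!(a != b);
    // Bit-level semantics: identical NaN bit patterns compare equal.
    assert!(slot_eq_bitwise(a, b));

    // A NaN with a different payload is still NaN, but not bit-equal.
    let other_nan = f64::from_bits(0x7ff8_0000_0000_0001);
    assert!(other_nan.is_nan());
    assert!(!slot_eq_bitwise(a, other_nan));

    // Conversely, 0.0 and -0.0 are equal as floats but differ bitwise.
    assert!(0.0_f64 == -0.0_f64);
    assert!(!slot_eq_bitwise(0.0, -0.0));
}
```

Bit-level equality therefore also distinguishes `0.0` from `-0.0`, which follows from comparing representations rather than values.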

> -Micah
>
> On Thu, Nov 12, 2020 at 8:44 PM Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com> wrote:
>
> > Hi Wes,
> >
> > Thanks a lot. I agree. My question is whether we should make it explicit in
> > the specification. AFAIK, "if the data represented in the slot is equal"
> > depends on the datatype: for variable-sized arrays with offsets (e.g.
> > strings), the equality of slot i is something along the lines of:
> >
> > start = lhs.buffer[0][(lhs.offset + i) * size_of<T>] as T
> > end = lhs.buffer[0][(lhs.offset + i + 1) * size_of<T>] as T
> > lhs_value = lhs.buffer[1][start..end]
> > # same for rhs
> > lhs_value == rhs_value
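
The pseudocode above can be sketched as runnable Rust. This is a hypothetical, simplified model, not the arrow crate's API: `BinaryArray` and `logically_equal` are invented names, validity is a `Vec<bool>` rather than a bitmap, and the offsets and value bytes (`buffer[0]` and `buffer[1]` in the pseudocode) are plain vectors. Null handling follows the rules Wes lists further down in this thread.

```rust
/// Hypothetical, simplified variable-size binary (e.g. string) array.
/// `offset` is the logical start of the array (non-zero after slicing).
struct BinaryArray {
    offset: usize,
    len: usize,
    validity: Vec<bool>, // one flag per physical slot
    offsets: Vec<i32>,   // physical offsets into `values`
    values: Vec<u8>,
}

impl BinaryArray {
    /// Returns the bytes of logical slot `i`, or `None` if it is null.
    fn value(&self, i: usize) -> Option<&[u8]> {
        let j = self.offset + i;
        if !self.validity[j] {
            return None;
        }
        let start = self.offsets[j] as usize;
        let end = self.offsets[j + 1] as usize;
        Some(&self.values[start..end])
    }
}

/// Logical equality: same length and, slot by slot, both null or both
/// valid with identical bytes. Absolute offsets and bytes outside the
/// ranges referenced by valid slots are ignored.
fn logically_equal(lhs: &BinaryArray, rhs: &BinaryArray) -> bool {
    lhs.len == rhs.len && (0..lhs.len).all(|i| lhs.value(i) == rhs.value(i))
}

fn main() {
    // ["ab", null, "c"] laid out compactly.
    let lhs = BinaryArray {
        offset: 0,
        len: 3,
        validity: vec![true, false, true],
        offsets: vec![0, 2, 2, 3],
        values: b"abc".to_vec(),
    };
    // The same logical values as a slice starting at physical slot 1, whose
    // null slot spans junk bytes ("ZZZ") and whose unused prefix is "xxxx".
    let rhs = BinaryArray {
        offset: 1,
        len: 3,
        validity: vec![true, true, false, true],
        offsets: vec![0, 4, 6, 9, 10],
        values: b"xxxxabZZZc".to_vec(),
    };
    assert!(logically_equal(&lhs, &rhs));
}
```

The second array differs in every buffer, yet it compares equal because only the byte ranges referenced by valid slots participate.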
> >
> > This logic is also tricky for any type with children, where we need to
> > compare the corresponding child slots recursively.
> > These things are not really implementation-specific, yet they are really
> > important when implementations interoperate.
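
As a hypothetical sketch of that recursion, the Rust below models only int32 and list types with an invented `Array` enum (not the arrow crate's types). A list slot is equal when both sides are null, or both are valid with the same length and pairwise-equal child slots; that last step is where `slot_eq` calls itself, so the same function also handles list<list<int32>>.

```rust
/// Hypothetical, simplified array model: only int32 and list types.
/// Validity is one bool per slot rather than a bitmap.
enum Array {
    Int32 { validity: Vec<bool>, values: Vec<i32> },
    List { validity: Vec<bool>, offsets: Vec<i32>, child: Box<Array> },
}

fn is_valid(a: &Array, i: usize) -> bool {
    match a {
        Array::Int32 { validity, .. } | Array::List { validity, .. } => validity[i],
    }
}

/// Compares slot `i` of `lhs` with slot `j` of `rhs`.
fn slot_eq(lhs: &Array, i: usize, rhs: &Array, j: usize) -> bool {
    match (is_valid(lhs, i), is_valid(rhs, j)) {
        (false, false) => return true, // both null
        (true, true) => {}             // both valid: compare the data below
        _ => return false,             // exactly one is null
    }
    match (lhs, rhs) {
        (Array::Int32 { values: l, .. }, Array::Int32 { values: r, .. }) => l[i] == r[j],
        (
            Array::List { offsets: lo, child: lc, .. },
            Array::List { offsets: ro, child: rc, .. },
        ) => {
            let (ls, le) = (lo[i] as usize, lo[i + 1] as usize);
            let (rs, re) = (ro[j] as usize, ro[j + 1] as usize);
            // Same list length, then recurse into each pair of child slots.
            le - ls == re - rs && (0..le - ls).all(|k| slot_eq(lc, ls + k, rc, rs + k))
        }
        _ => false, // different types are never equal
    }
}

fn main() {
    // [[1, 2], null, [3]]
    let lhs = Array::List {
        validity: vec![true, false, true],
        offsets: vec![0, 2, 2, 3],
        child: Box::new(Array::Int32 { validity: vec![true; 3], values: vec![1, 2, 3] }),
    };
    // Same logical lists, but the null list spans a junk child value (99).
    let rhs = Array::List {
        validity: vec![true, false, true],
        offsets: vec![0, 2, 3, 4],
        child: Box::new(Array::Int32 { validity: vec![true; 4], values: vec![1, 2, 99, 3] }),
    };
    assert!((0..3).all(|i| slot_eq(&lhs, i, &rhs, i)));
}
```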
> >
> > Best,
> > Jorge
> >
> >
> >
> >
> > On Thu, Nov 5, 2020 at 3:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > hi Jorge,
> > >
> > > The intent when authoring the specification was as follows:
> > >
> > > * If two array slots being compared are both null, then they are equal
> > > * If one is null and the other is not, they are not equal
> > > * If they are both not null, then they are equal if the data
> > > represented in the slot is equal (and if dictionary indices reference
> > > the same dictionary value, even if the dictionaries are different,
> > > then they are equal because the data they represent is the same)
> > >
> > > - Wes
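
A minimal Rust sketch of these three rules applied to a dictionary-encoded string array. `DictArray` and `dict_equal` are hypothetical names, and nullability is modelled as `Option` indices rather than a validity bitmap. Each slot is decoded to the value it represents before comparison, so `None == None` covers the first rule, `Some` versus `None` the second, and comparing decoded strings the third, regardless of how each dictionary is laid out.

```rust
/// Hypothetical dictionary-encoded string array: nullable indices into a
/// dictionary of values.
struct DictArray {
    indices: Vec<Option<u32>>,
    dictionary: Vec<&'static str>,
}

impl DictArray {
    /// Decodes slot `i` to the value it represents (`None` for a null slot).
    fn value(&self, i: usize) -> Option<&'static str> {
        self.indices[i].map(|k| self.dictionary[k as usize])
    }
}

/// Slot-wise equality on decoded values: both null is equal, one null is
/// unequal, and two valid slots compare the strings they represent.
fn dict_equal(lhs: &DictArray, rhs: &DictArray) -> bool {
    lhs.indices.len() == rhs.indices.len()
        && (0..lhs.indices.len()).all(|i| lhs.value(i) == rhs.value(i))
}

fn main() {
    // ["foo", null, "bar", "foo"] with dictionary ["foo", "bar"].
    let lhs = DictArray {
        indices: vec![Some(0), None, Some(1), Some(0)],
        dictionary: vec!["foo", "bar"],
    };
    // The same values with a different dictionary (reordered, plus an
    // unused entry), hence different indices.
    let rhs = DictArray {
        indices: vec![Some(2), None, Some(0), Some(2)],
        dictionary: vec!["bar", "baz", "foo"],
    };
    assert_ne!(lhs.indices, rhs.indices);
    assert!(dict_equal(&lhs, &rhs));
}
```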
> > >
> > > On Thu, Nov 5, 2020 at 1:13 AM Jorge Cardoso Leitão
> > > <jorgecarlei...@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Recently, I revisited the code for array equality in Rust. While going
> > > > through it, I observed some assumptions about how we conclude that two
> > > > elements of an Arrow array are equal, and when two arrays are equal.
> > > >
> > > > The notion of equality is also used implicitly throughout the
> > > > document. For example, when we show "unspecified" values in examples,
> > > > we are implying that those values should not matter when comparing
> > > > arrays. It is also used in the wording "unique values" for
> > > > dictionary-encoded arrays.
> > > >
> > > > The notion of array equality is important when we want to verify
> > > > interoperability between languages, where we often need to compare
> > > > arrays (e.g. after a round-trip), as some implementations may change
> > > > the contents of "unspecified" slots or rewrite offsets.
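
To illustrate the round-trip case, here is a hypothetical Rust sketch (an invented `Int32Array` type, not any real implementation's API) contrasting a naive buffer comparison with logical equality. The "after" array merely stands in for what some implementation might produce after a round-trip; it is not output from any real library.

```rust
/// Hypothetical, simplified int32 array: validity and values are indexed by
/// physical position; `offset` is the logical start (non-zero after slicing).
struct Int32Array {
    offset: usize,
    len: usize,
    validity: Vec<bool>,
    values: Vec<i32>,
}

impl Int32Array {
    /// Returns logical slot `i`, or `None` if it is null.
    fn value(&self, i: usize) -> Option<i32> {
        let j = self.offset + i;
        if self.validity[j] { Some(self.values[j]) } else { None }
    }
}

/// Logical equality: ignores the array offset and the unspecified values
/// stored under null slots.
fn logically_equal(lhs: &Int32Array, rhs: &Int32Array) -> bool {
    lhs.len == rhs.len && (0..lhs.len).all(|i| lhs.value(i) == rhs.value(i))
}

/// Naive physical comparison of the buffers exactly as stored.
fn buffers_equal(lhs: &Int32Array, rhs: &Int32Array) -> bool {
    lhs.offset == rhs.offset
        && lhs.len == rhs.len
        && lhs.validity == rhs.validity
        && lhs.values == rhs.values
}

fn main() {
    // [1, null, 3] as a slice of a larger array, with 0 under the null slot.
    let before = Int32Array {
        offset: 1,
        len: 3,
        validity: vec![true, true, false, true],
        values: vec![42, 1, 0, 3],
    };
    // The same logical array, compacted to offset 0 and with a different
    // unspecified value under the null slot.
    let after = Int32Array {
        offset: 0,
        len: 3,
        validity: vec![true, false, true],
        values: vec![1, -7, 3],
    };
    assert!(!buffers_equal(&before, &after));
    assert!(logically_equal(&before, &after));
}
```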
> > > >
> > > > More fundamentally, IMO the specification defines the physical
> > > > representation (buffers, children, offsets, etc.) of logical values
> > > > (lists, structs, int8, int32), but currently does not say when two
> > > > logical values are considered equal.
> > > >
> > > > Would it make sense to systematize the notion of equality in the
> > > > specification, so that the different implementations agree on when
> > > > two arrays should be considered equal?
> > > >
> > > > Best,
> > > > Jorge
> > >
> >
