On Fri, Nov 13, 2020 at 1:19 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> Hi Jorge,
> I think it would make sense to add some clarifications to the document per
> Wes's comments. Do you want to maybe try to make a PR?
>
> One small edge case to consider is how NaN float values are compared.
> I think at the specification level, it should only be bit/byte-level binary
> equality without respect to the semantics of the logical data type.
> -Micah
>
> On Thu, Nov 12, 2020 at 8:44 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
>
> > Hi Wes,
> >
> > Thanks a lot. I agree. My question is whether we should make it explicit in
> > the specification. AFAIK, "if the data represented in the slot is equal"
> > depends on the datatype: for variable-sized arrays with offsets (e.g.
> > strings), the equality of slot i is something along the lines of:
> >
> > start = lhs.buffer[0][(lhs.offset + i) * size_of<T>] as T
> > end = lhs.buffer[0][(lhs.offset + i + 1) * size_of<T>] as T
> > lhs_value = lhs.buffer[1][start..end]
> > # same for rhs
> > lhs_value == rhs_value
> >
> > This logic is also tricky for any type with children, where we need to
> > compare the slot of the child through recursion.
> > These things are not really implementation specific, yet they are really
> > important when implementations inter-operate.
> >
> > Best,
> > Jorge
> >
> > On Thu, Nov 5, 2020 at 3:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > Hi Jorge,
> > >
> > > The intent when authoring the specification was as follows:
> > >
> > > * If two array slots being compared are both null, then they are equal
> > > * If one is null and the other is not, they are not equal
> > > * If they are both not null, then they are equal if the data
> > > represented in the slot is equal (and if dictionary indices reference
> > > the same dictionary value, even if the dictionaries are different,
> > > then they are equal because the data they represent is the same)
> > >
> > > - Wes
> > >
> > > On Thu, Nov 5, 2020 at 1:13 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Recently, I revisited the code for array equality in Rust.
> > > > While going through it, I observed some assumptions about how we
> > > > conclude that two elements of an Arrow array are equal, and when two
> > > > arrays are equal.
> > > >
> > > > The notion of equality is also used throughout the document, e.g. when
> > > > we offer examples using "unspecified", we are implicitly arguing that we
> > > > should not care about that value when comparing arrays. It is also used
> > > > when we use the wording "unique values" in the dictionary-encoded arrays.
> > > >
> > > > The notion of array equality is important when we want to verify
> > > > interoperability between languages, where we often need to compare arrays
> > > > (e.g. after a round-trip), as some implementations may change the data of
> > > > the "unspecified" slots and e.g. offsets.
> > > >
> > > > More fundamentally, IMO the specification offers a physical
> > > > representation (buffers, children, offsets, etc.) of a logical asset
> > > > (lists, structs, int8, int32), but currently does not say when two
> > > > logical assets are considered equal.
> > > >
> > > > Would it make sense to systematize the notion of equality in the
> > > > specification, to align the different implementations on when they
> > > > should consider two arrays to be equal?
> > > >
> > > > Best,
> > > > Jorge
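
The slot-equality rules discussed in this thread (Wes's three null cases, Jorge's offset-based lookup for variable-sized arrays, the dictionary remark, and Micah's suggested bit-level comparison for floats) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not Arrow's API: `StringArray`, `slots_equal`, `arrays_equal`, `dictionary_slots_equal`, and `float64_bits_equal` are hypothetical names, the validity bitmap is modelled as a list of booleans, offsets are plain Python integers rather than a typed buffer, and a null dictionary index is modelled as `None`.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class StringArray:
    """Hypothetical, simplified stand-in for a variable-sized binary array."""
    validity: List[bool]          # one flag per physical slot; False means null
    offsets: List[int]            # len(validity) + 1 positions into `data`
    data: bytes                   # concatenated values (may hold unused bytes)
    offset: int = 0               # logical start, as in a sliced array
    length: Optional[int] = None  # logical length; defaults to the remainder

    def __post_init__(self) -> None:
        if self.length is None:
            self.length = len(self.validity) - self.offset

    def is_valid(self, i: int) -> bool:
        return self.validity[self.offset + i]

    def value(self, i: int) -> bytes:
        # Mirrors the pseudocode in the thread: read start/end, then slice.
        start = self.offsets[self.offset + i]
        end = self.offsets[self.offset + i + 1]
        return self.data[start:end]


def slots_equal(lhs: StringArray, i: int, rhs: StringArray, j: int) -> bool:
    l_valid, r_valid = lhs.is_valid(i), rhs.is_valid(j)
    if not l_valid and not r_valid:
        return True               # both null: equal, stored bytes are ignored
    if l_valid != r_valid:
        return False              # exactly one null: not equal
    return lhs.value(i) == rhs.value(j)


def arrays_equal(lhs: StringArray, rhs: StringArray) -> bool:
    return lhs.length == rhs.length and all(
        slots_equal(lhs, i, rhs, i) for i in range(lhs.length)
    )


def dictionary_slots_equal(l_idx: Sequence[Optional[int]], l_dict: Sequence,
                           i: int,
                           r_idx: Sequence[Optional[int]], r_dict: Sequence,
                           j: int) -> bool:
    # Compare the values the indices point to, not the indices themselves.
    l, r = l_idx[i], r_idx[j]
    if l is None or r is None:
        return l is None and r is None
    return l_dict[l] == r_dict[r]


def float64_bits_equal(a: float, b: float) -> bool:
    # Byte-level comparison of the IEEE 754 encoding, ignoring float semantics.
    return struct.pack("<d", a) == struct.pack("<d", b)


# Same logical content, different physical layout (slice, padding, null bytes).
a = StringArray([True, False, True], [0, 2, 2, 5], b"hiyou")
b = StringArray([True, True, False, True], [0, 3, 5, 8, 11],
                b"xxxhi???you", offset=1)
print(arrays_equal(a, b))
```

Here `a` and `b` differ in offsets, in the bytes behind the null slot, and in the slice start, yet compare as equal under the rules above. Under the bit-level rule, a NaN compares equal to a bit-identical NaN, while `0.0` and `-0.0` compare unequal even though `0.0 == -0.0` holds in ordinary float semantics.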