On 1/6/2016 5:17 PM, Richard Eckart de Castilho wrote: > On 06.01.2016, at 22:49, Marshall Schor <[email protected]> wrote: >> In pseudocode - what's done for these changes is to send >> (1) an int : the number of changes following >> (2) for each change: an int representing the address into the aux heap of >> the item >> (3) for each change: a byte/short/int represent the value >> >> The bug is in line 2: for the short and long arrays, this is sent as a >> "short" >> instead of as an int; for the byte (also used for boolean arrays), this is >> sent >> as an "int" which I think is correct. >> >> This means that serialization will give wrong results if there's a change to >> some item in the short or long aux heaps which is indexed beyond 32767 items. > Which form are we talking about? Form 4? no - it's only for form 0 (just kidding - it's for **non-compressed** binary serialization. > Will it break all serialized data, only form 4 data (i.e. not form 6), or > only form 4 with delta tracking enabled (assuming this is something that > must be specifically enabled - never used delta CAS). None of these, only plain non-compressed binary serialization used in "delta cas" mode. > >> I think this should be fixed; but it will "break" compatibility with any >> stored >> existing serialized form, and furthermore, for UIMA-AS transport use, both >> the >> client and the server will need coordinated updates. > If it is a bug, +1 for fixing. > >> As a minimum, this should probably check to see if the error would be >> occurring >> (trying to serialize some change at slot > 32767, and throw an exception. >> >> If we change this to use write an "int" (instead of short), we could add a >> global configuration flag to disable this, too, if needed for some backward >> compatibility purpose. > I am starting to get suspicious of global flags for backwards compatibility. > E.g. since ALLOW_DUP_ADD_TO_INDEXES was introduced, we have people complaining > about a performance drop. ALLOW_DUP_ADD_TO_INDEXES can only be > enabled/disabled > globally, but not specifically for individual indexes. Neither can it be > temporarily disabled, e.g. during deserialization or other bulk operations. > I wonder if local getters/setters or ThreadLocal variables initialized by > a global setting wouldn't be a more appropriate option.
I was unaware of the performance issue; I may have missed some emails... Can you say how significant it is? If there were no performance issue, would the additional function be needed? I assume the performance drop is when duplicates are not allowed (the new default), and some users are wanting to restore the previous performance by turning on ALLOW_DUP .... Is this correct? Cheers. -Marshall > Best, > > -- Richard
