>> I am starting to get suspicious of global flags for backwards compatibility.
>> E.g. since ALLOW_DUP_ADD_TO_INDEXES was introduced, people have been
>> complaining about a performance drop. ALLOW_DUP_ADD_TO_INDEXES can only be
>> enabled/disabled globally, not for individual indexes. Nor can it be
>> temporarily disabled, e.g. during deserialization or other bulk operations.
>> I wonder whether local getters/setters, or ThreadLocal variables initialized
>> from a global setting, would be a more appropriate option.
>
> I was unaware of the performance issue; I may have missed some emails... Can
> you say how significant it is? If there were no performance issue, would the
> additional function be needed?
>
> I assume the performance drop occurs when duplicates are not allowed (the new
> default), and some users want to restore the previous performance by
> turning on ALLOW_DUP .... Is this correct?
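
The ThreadLocal idea from the quoted text could look roughly like the sketch below. It is hypothetical: `DupAddPolicy`, `withAllowDup` and the system property name `example.allowDupAddToIndexes` are made up for illustration and are not part of the UIMA API. Each thread starts from the global default and can override it for the duration of a bulk operation.

```java
import java.util.function.Supplier;

/**
 * Hypothetical per-thread override of an "allow duplicate add to indexes" flag.
 * Not a UIMA class; names and the property key are invented for this sketch.
 */
final class DupAddPolicy {
    // Global default, read once. Boolean.getBoolean returns false if the
    // (made-up) system property is not set.
    private static final boolean GLOBAL_DEFAULT =
        Boolean.getBoolean("example.allowDupAddToIndexes");

    // Every thread is initialized from the global default and may override it.
    private static final ThreadLocal<Boolean> ALLOW_DUP =
        ThreadLocal.withInitial(() -> GLOBAL_DEFAULT);

    private DupAddPolicy() {}

    /** Current value for the calling thread. */
    static boolean allowDup() {
        return ALLOW_DUP.get();
    }

    /** Runs the action with the flag temporarily set for this thread only, then restores it. */
    static <T> T withAllowDup(boolean allow, Supplier<T> action) {
        boolean previous = ALLOW_DUP.get();
        ALLOW_DUP.set(allow);
        try {
            return action.get();
        } finally {
            ALLOW_DUP.set(previous); // restore even if the action throws
        }
    }
}
```

Because the previous value is restored in a `finally` block, a deserializer could turn ALLOW_DUP on (skipping the duplicate check) for its own thread without affecting other threads or the global default.
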
I didn't track it in detail, but apparently some time back Peter noticed a
drop in XMI deserialization performance and, more recently, also in compressed
binary CAS deserialization. Later, someone claimed in a private mail that
deserialization was O(n^2) in the size of the CAS.
At that point, I had a look at the code, and it appears that in the worst
case the duplicate check degrades to a linear scan of the CAS
(cf. FSIndexRepositoryImpl line 98ff and FSIntArrayIndex line 101ff).
That would happen if the CAS contains only items that are equal with respect
to the index criteria but are not actually identical.
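
To illustrate why key-equal but non-identical items are the bad case, here is a simplified model of a sorted index with a duplicate check. It is not the actual FSIntArrayIndex code; `SortedIdentityIndex` and `identityChecks` are hypothetical names. A binary search finds the start of the run of key-equal elements, and the duplicate check then walks that run comparing identity. If every element is key-equal, the run is the whole index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Simplified model (not UIMA code) of a sorted index that rejects adding the
 * same object twice. Distinct objects may compare equal under the comparator.
 */
final class SortedIdentityIndex<T> {
    private final List<T> items = new ArrayList<>();
    private final Comparator<? super T> cmp;
    private long identityChecks; // number of == comparisons made by the duplicate check

    SortedIdentityIndex(Comparator<? super T> cmp) {
        this.cmp = cmp;
    }

    long identityChecks() { return identityChecks; }

    int size() { return items.size(); }

    /** Adds fs unless this exact object is already indexed; returns false on duplicate. */
    boolean add(T fs) {
        // Binary search for the first element not less than fs: O(log n).
        int lo = 0, hi = items.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cmp.compare(items.get(mid), fs) < 0) lo = mid + 1; else hi = mid;
        }
        // Linear walk over the run of key-equal elements, checking identity.
        int i = lo;
        while (i < items.size() && cmp.compare(items.get(i), fs) == 0) {
            identityChecks++;
            if (items.get(i) == fs) return false; // same object already present
            i++;
        }
        items.add(i, fs); // insert after the equal run, keeping the order
        return true;
    }
}
```

In this model, when all elements compare equal, the k-th add (counting from 0) performs k identity checks; with distinct keys added in ascending order it performs none.
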
Consider a hypothetical annotation type:

    Metadata extends Annotation {
      String key;
      String value;
    }
where begin and end are always set to 0 and documentLength(), and
key/value hold arbitrary values. I haven't tried it, but if I
understood the code correctly, a CAS containing only such
annotations would suffer heavily in addToIndexes(): each call would
have to scan all previously indexed annotations, so adding n of them
would cost O(n^2) overall.
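
The arithmetic behind that estimate, assuming (as suspected above) that the i-th addToIndexes() call has to check all i-1 Metadata annotations already in the index because they all share the same begin/end:

```latex
T(n) = \sum_{i=1}^{n} (i - 1) = \frac{n(n-1)}{2} \in O(n^2)
```

For n = 100,000 such annotations that would be roughly 5 × 10^9 identity checks, compared with about n log n comparisons if the check were not needed.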
Cheers,
-- Richard