Marshall Schor created UIMA-4003:
------------------------------------

             Summary: add alternative int - int maps, and int sets for better 
space/time performance
                 Key: UIMA-4003
                 URL: https://issues.apache.org/jira/browse/UIMA-4003
             Project: UIMA
          Issue Type: Improvement
          Components: Core Java Framework
    Affects Versions: 2.6.0SDK
            Reporter: Marshall Schor
            Assignee: Marshall Schor
            Priority: Minor
             Fix For: 2.6.1SDK


int - int maps and int sets are implemented in UIMA as either red-black trees 
(size = 5 words (4 words for sets) + 1 bit per item, search time = log 2 size 
(binary search), insert /removal can cause rebalancing tree), or as intVectors 
(like ArrayList<Integers> but doesn't wrap ints as Integers).

For int - int maps, add a hash version (loses key "ordering"), which takes 3 - 
6 words per item (avg 4.5 words - slightly smaller), and has O(1) performance 
(based on existing JCasHashMap impl, but without concurrency support).

For int sets, add 2 styles: one based on the same hash map (space = 1.5 to 3 
words per item vs 4 words), the other based on BitSets.  The BitSet one would 
be faster, would have a conventional key ordering, and would be smaller, if the 
max int stored < 128 * number of items stored.  (This is often the case in uses 
where it is used to track Feature Structure addresses).

Finally, add one adaptable int set that is usually is implemented as a BitSet, 
but if the items being stored are such that the size would be significantly 
larger than the IntHashSet implementation, switch to that alternative 
representation.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to