Marshall Schor created UIMA-2460:
------------------------------------

             Summary: Binary deserialization inefficient
                 Key: UIMA-2460
                 URL: https://issues.apache.org/jira/browse/UIMA-2460
             Project: UIMA
          Issue Type: Improvement
          Components: Core Java Framework
            Reporter: Marshall Schor
            Assignee: Marshall Schor
            Priority: Minor
             Fix For: 2.4.1SDK


The CAS binary deserialization code can be made (much) more space efficient.
Currently, the char data used by the strings is read into a single char array;
each string is represented as an offset into this char array plus a length; and
the Java strings are created using new String(chararray, offset, length).  This
works, but each call allocates a new char array for the string being created
and copies the characters into it from the original char array, so every
deserialized string ends up with its own char array object.
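
A minimal sketch of the current, copying approach, for comparison.  The class
name, the deserialize method, and the charData/offsets/lengths parameters are
hypothetical illustrations, not the actual UIMA deserializer code:

```java
import java.util.Arrays;

public class CopyingStringDeserializer {

    // Hypothetical helper: builds one String per (offset, length) pair.
    // Each new String(char[], int, int) call copies the requested range
    // into a fresh char array owned by the new String.
    static String[] deserialize(char[] charData, int[] offsets, int[] lengths) {
        String[] strings = new String[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            strings[i] = new String(charData, offsets[i], lengths[i]);
        }
        return strings;
    }

    public static void main(String[] args) {
        char[] charData = "helloworldhello".toCharArray();
        String[] strings = deserialize(charData, new int[] {0, 5, 10}, new int[] {5, 5, 5});
        System.out.println(Arrays.toString(strings)); // [hello, world, hello]
        // Equal values, but two distinct String objects, each with its own char array.
        System.out.println(strings[0] == strings[2]); // false
    }
}
```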

The alternative is to have all the strings share one char array instead of
allocating a separate char array per string.  On JDKs where String.substring()
shares the parent string's char array (Java 6 and Java 7 before update 6),
this can be done by:
* making a temporary string from the entire char array, and then
* making each new string using tempString.substring(offset, offset + length)
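
The two steps above could be sketched as follows.  Again, the class name, the
deserialize method, and its parameters are hypothetical.  Note that whether
the substrings actually share the temporary string's char array is a JDK
implementation detail: they do on Java 6 and early Java 7, but since Java 7u6
substring() copies, so on newer JDKs this sketch gives no saving:

```java
import java.util.Arrays;

public class SharedBufferStringDeserializer {

    // Hypothetical helper: one temporary String wraps the whole buffer
    // (new String(char[]) copies charData once), then each result is a
    // substring of it. On Java 6 / early Java 7 the substrings share the
    // temporary String's char array; from Java 7u6 on, each one is copied.
    static String[] deserialize(char[] charData, int[] offsets, int[] lengths) {
        String tempString = new String(charData);
        String[] strings = new String[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            strings[i] = tempString.substring(offsets[i], offsets[i] + lengths[i]);
        }
        return strings;
    }

    public static void main(String[] args) {
        char[] charData = "helloworldhello".toCharArray();
        String[] strings = deserialize(charData, new int[] {0, 5, 10}, new int[] {5, 5, 5});
        System.out.println(Arrays.toString(strings)); // [hello, world, hello]
    }
}
```

One trade-off of sharing: on JDKs where substrings share the array, the whole
buffer stays reachable as long as any one of the substrings is still alive.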

For 1000 strings, this saves the overhead of 999 char array objects (probably
about 16 bytes each), or roughly 16 KB in total (999 x 16 = 15,984 bytes).

An additional space savings is possible by reusing the same string object for 
equal strings.
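
One way to reuse the same string object for equal strings is a canonicalizing
map that is local to a single deserialization.  This is a sketch under that
assumption (a HashMap chosen here for illustration, rather than
String.intern(), which uses a JVM-wide pool); the class and method names are
hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class DedupingStringDeserializer {

    // Hypothetical helper: substring-based as before, but equal values are
    // mapped to the first String object seen with that value, so duplicates
    // share one object.
    static String[] deserialize(char[] charData, int[] offsets, int[] lengths) {
        String tempString = new String(charData);
        Map<String, String> canonical = new HashMap<String, String>();
        String[] strings = new String[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            String s = tempString.substring(offsets[i], offsets[i] + lengths[i]);
            String existing = canonical.get(s);
            if (existing == null) {
                // First occurrence of this value: remember it.
                canonical.put(s, s);
                existing = s;
            }
            // The temporary substring s becomes garbage for duplicates.
            strings[i] = existing;
        }
        return strings;
    }

    public static void main(String[] args) {
        char[] charData = "helloworldhello".toCharArray();
        String[] strings = deserialize(charData, new int[] {0, 5, 10}, new int[] {5, 5, 5});
        // Equal values now resolve to the same String object.
        System.out.println(strings[0] == strings[2]); // true
    }
}
```

Each duplicate still allocates a short-lived substring for the lookup; only
the retained strings are deduplicated.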

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to