Once you're done, I'd be interested to know if this has any measurable effect beyond the noise level.

On 27.08.2012 16:29, Marshall Schor (JIRA) wrote:
Marshall Schor created UIMA-2460:
------------------------------------

              Summary: Binary deserialization inefficient
                  Key: UIMA-2460
                  URL: https://issues.apache.org/jira/browse/UIMA-2460
              Project: UIMA
           Issue Type: Improvement
           Components: Core Java Framework
             Reporter: Marshall Schor
             Assignee: Marshall Schor
             Priority: Minor
              Fix For: 2.4.1SDK


The CAS binary deserialization code can be made (much) more space efficient.
Currently, the char data used in the strings is read into a single char array;
each string is represented as an offset into this char array plus a length; and
new Java strings are created using new String(chararray, offset, length).  This
works, but each call allocates a new char array and copies the string's
characters into it from the original char array, so every string object ends
up with its own char array object.

The alternative is to reuse the original char array object, and not allocate 
any other char array objects.  This can be done by:
* making a temporary string from the entire char array object, and then
* making the new strings using tempString.substring(offset, offset + length)
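
The two approaches can be sketched side by side as below. This is only an
illustration, not the actual UIMA deserialization code: the class and method
names (SubstringDecode, decodeByCopy, decodeBySubstring) and the offsets/lengths
arrays are made up for the example. Note that whether substring() really shares
the backing char array depends on the JDK's String implementation: JDKs before
7u6 share it, while 7u6 and later copy the characters, so the saving only
applies on the older JVMs.

```java
class SubstringDecode {

    // Current approach: each string gets its own freshly allocated char[] copy.
    static String[] decodeByCopy(char[] chars, int[] offsets, int[] lengths) {
        String[] out = new String[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            out[i] = new String(chars, offsets[i], lengths[i]);
        }
        return out;
    }

    // Proposed approach: build one temporary string over all the char data,
    // then carve each string out of it with substring(begin, end).
    static String[] decodeBySubstring(char[] chars, int[] offsets, int[] lengths) {
        String whole = new String(chars); // one copy of the entire char data
        String[] out = new String[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            // end index is exclusive, hence offset + length
            out[i] = whole.substring(offsets[i], offsets[i] + lengths[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        char[] chars = "helloworld".toCharArray();
        int[] offsets = {0, 5};
        int[] lengths = {5, 5};
        for (String s : decodeBySubstring(chars, offsets, lengths)) {
            System.out.println(s);
        }
    }
}
```

Both methods return equal strings; they differ only in how many char arrays
may be allocated behind the scenes.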

For 1000 strings, this will save 999 char array object overheads (probably
about 16 bytes each).

Additional space savings are possible by reusing the same string object for
equal strings.
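
One way to reuse a single object for equal strings is a lookup table kept for
the duration of one deserialization. The sketch below assumes a HashMap-based
table; StringDeduplicator and canonical() are hypothetical names, not part of
UIMA.

```java
import java.util.HashMap;
import java.util.Map;

class StringDeduplicator {

    // Maps each distinct string value to the first instance seen with that value.
    private final Map<String, String> seen = new HashMap<String, String>();

    // Returns the previously seen instance equal to s, or s itself if it is new.
    String canonical(String s) {
        String prior = seen.get(s);
        if (prior != null) {
            return prior;
        }
        seen.put(s, s);
        return s;
    }

    // Number of distinct string values seen so far.
    int size() {
        return seen.size();
    }
}
```

Each deserialized string would be passed through canonical() before being
stored, so duplicates become garbage immediately. The table itself costs
memory, so this pays off only when the data contains many repeated strings.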


