[
https://issues.apache.org/jira/browse/UIMA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marshall Schor updated UIMA-2460:
---------------------------------
Affects Version/s: 2.4.0SDK
> Binary deserialization inefficient
> ----------------------------------
>
> Key: UIMA-2460
> URL: https://issues.apache.org/jira/browse/UIMA-2460
> Project: UIMA
> Issue Type: Improvement
> Components: Core Java Framework
> Affects Versions: 2.4.0SDK
> Reporter: Marshall Schor
> Assignee: Marshall Schor
> Priority: Minor
> Fix For: 2.4.1SDK
>
>
> The CAS binary deserialization code can be made (much) more space efficient.
> Currently, the char data used in the strings is read into a single char
> array, and each string is represented as an offset into this array plus a
> length. New Java strings are created using new String(chararray, offset,
> length). This works, but each call allocates a new char array and copies
> the characters from the original array, so every string object ends up
> with its own char array object.
> The alternative is to reuse the original char array object, and not allocate
> any other char array objects. This can be done by:
> * making a temporary string from the entire char array object, and then
> * making the new strings using tempString.substring(offset, offset + length)
> For 1000 strings, this will save the overhead of 999 char array objects
> (probably about 16 bytes each).
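
The two approaches described above can be sketched side by side. This is only an illustration, not the actual UIMA deserialization code: the class StringTableDecoder, its method names, and the parallel offsets/lengths arrays are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of UIMA.
final class StringTableDecoder {

    // Current approach: each new String(char[], int, int) call copies
    // its characters out of the shared buffer into a fresh array.
    static List<String> decodeByCopy(char[] chars, int[] offsets, int[] lengths) {
        List<String> out = new ArrayList<>(offsets.length);
        for (int i = 0; i < offsets.length; i++) {
            out.add(new String(chars, offsets[i], lengths[i]));
        }
        return out;
    }

    // Proposed approach: build one temporary string from the whole buffer,
    // then take substrings of it.
    static List<String> decodeBySubstring(char[] chars, int[] offsets, int[] lengths) {
        String whole = new String(chars);
        List<String> out = new ArrayList<>(offsets.length);
        for (int i = 0; i < offsets.length; i++) {
            out.add(whole.substring(offsets[i], offsets[i] + lengths[i]));
        }
        return out;
    }
}
```

Both methods return equal strings; they differ only in how the backing storage is allocated. Whether substring actually shares the backing array depends on the JVM: releases before Java 7 update 6 share it, while later releases copy the characters, so the saving applies only to the older behavior.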
> Additional space savings are possible by reusing the same string object for
> equal strings.
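
Reusing one string object for equal strings could look roughly like the following. It is a minimal sketch, assuming a map that lives only for the duration of one deserialization; the class StringDeduplicator and its canonical method are hypothetical names, not UIMA API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-deserialization cache; not part of UIMA.
final class StringDeduplicator {
    private final Map<String, String> seen = new HashMap<>();

    // Returns the first instance seen for this string value, so that
    // equal strings end up sharing a single object.
    String canonical(String s) {
        String prior = seen.putIfAbsent(s, s);
        return prior != null ? prior : s;
    }
}
```

Each decoded string would be passed through canonical() before being stored. A local map is used here rather than String.intern() so the cache can be dropped when deserialization finishes; that is a design choice of this sketch, not something the issue specifies.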