Once you're done, I'd be interested to know if this has any measurable
effect beyond the noise level.
On 27.08.2012 16:29, Marshall Schor (JIRA) wrote:
Marshall Schor created UIMA-2460:
------------------------------------
Summary: Binary deserialization inefficient
Key: UIMA-2460
URL: https://issues.apache.org/jira/browse/UIMA-2460
Project: UIMA
Issue Type: Improvement
Components: Core Java Framework
Reporter: Marshall Schor
Assignee: Marshall Schor
Priority: Minor
Fix For: 2.4.1SDK
The CAS binary deserialization code can be made (much) more space efficient.
Currently, the char data that is used in the strings is read into a char array;
each string is represented as an offset into this char array + a length; and
new Java strings are created using new String(chararray, offset, length). This
works, but each call allocates a new char array and copies the characters into
it from the original array, so every string object ends up with its own char
array object.
The alternative is to reuse the original char array object, and not allocate
any other char array objects. This can be done by:
* making a temporary string from the entire char array object, and then
* making the new strings using tempString.substring(offset, offset + length)
For 1000 strings, this saves the object overhead of 999 char arrays (probably
about 16 bytes each).
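
To make the comparison concrete, here is a minimal sketch of the two
approaches. The class and method names are hypothetical and are not taken
from the UIMA code base; the offsets and lengths are assumed to arrive as
parallel int arrays. Two caveats: new String(char[]) itself copies the array
once, and whether substring() shares the backing array depends on the JDK.
Older JDKs share it, while newer ones (starting with Java 7 update 6, as far
as I know) copy the characters, in which case the saving would not
materialize.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the actual UIMA deserialization code.
final class StringDecodeSketch {

    /** Current approach: every String is built from its own copy of the chars. */
    static List<String> decodeByCopy(char[] chars, int[] offsets, int[] lengths) {
        List<String> result = new ArrayList<>(offsets.length);
        for (int i = 0; i < offsets.length; i++) {
            // Allocates a fresh char array and copies 'lengths[i]' chars into it.
            result.add(new String(chars, offsets[i], lengths[i]));
        }
        return result;
    }

    /** Proposed approach: one temporary String, then one substring per entry. */
    static List<String> decodeBySubstring(char[] chars, int[] offsets, int[] lengths) {
        // Single copy of the whole char array into the temporary String.
        String all = new String(chars);
        List<String> result = new ArrayList<>(offsets.length);
        for (int i = 0; i < offsets.length; i++) {
            // On JDKs where substring shares the backing array, no copy happens here.
            result.add(all.substring(offsets[i], offsets[i] + lengths[i]));
        }
        return result;
    }
}
```

Both methods return equal strings, so the difference only shows up in
allocation behaviour, which is why measuring on the target JDK matters.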
Additional space savings are possible by reusing the same string object for
equal strings.
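
One way to reuse a single object for equal strings is a small map that lives
only for the duration of one deserialization. The sketch below is an
assumption about how this could look, with hypothetical names; String.intern()
would be another option, with different memory trade-offs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the UIMA API.
final class StringDedupSketch {
    // Maps each distinct string value to the first instance seen with that value.
    private final Map<String, String> seen = new HashMap<>();

    /** Returns the first instance seen so far that equals s (possibly s itself). */
    String canonical(String s) {
        String prior = seen.get(s);
        if (prior != null) {
            return prior; // reuse the earlier object; the caller can drop s
        }
        seen.put(s, s);
        return s;
    }
}
```

Each decoded string would be passed through canonical() before being stored,
so duplicates become garbage right away. The map itself costs memory, so this
only pays off when equal strings are common.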