[ 
https://issues.apache.org/jira/browse/UIMA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marshall Schor updated UIMA-2460:
---------------------------------

    Affects Version/s: 2.4.0SDK
    
> Binary deserialization inefficient
> ----------------------------------
>
>                 Key: UIMA-2460
>                 URL: https://issues.apache.org/jira/browse/UIMA-2460
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.4.0SDK
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 2.4.1SDK
>
>
> The CAS binary deserialization code can be made (much) more space efficient.
> Currently, the char data used in the strings is read into a char array;
> each string is represented as an offset into this char array plus a
> length; and new Java strings are created using new String(chararray, offset,
> length).  This works, but each call allocates a new char array for the string
> being created and copies the characters from the original char array, so
> every string object ends up with its own char array object.
> The alternative is to reuse the original char array object, and not allocate 
> any other char array objects.  This can be done by:
> * making a temporary string from the entire char array object, and then
> * making the new strings using tempString.substring(offset, offset + length)
> For 1000 strings, this will save 999 char array object overheads (probably
> about 16 bytes each).
> An additional space saving is possible by reusing the same string object for
> equal strings.
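
The two ideas in the description can be sketched as follows. This is only an illustration, not the UIMA code: the class name StringHeapSketch, the method decode, and the offsets/lengths arrays are hypothetical. Whether substring actually shares the backing char array depends on the JDK. It did so before Java 7 update 6; later JDKs copy the characters in substring, so on those only the reuse of equal strings saves space.

```java
import java.util.HashMap;
import java.util.Map;

public class StringHeapSketch {

    /**
     * Builds strings from (offset, length) pairs that index into one char array.
     * Hypothetical helper for illustration; not part of the UIMA API.
     */
    public static String[] decode(char[] chars, int[] offsets, int[] lengths) {
        // One temporary string over the entire char array, as the issue proposes.
        String whole = new String(chars);
        // Second idea from the issue: hand out one String object per distinct value.
        Map<String, String> seen = new HashMap<>();
        String[] result = new String[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            // On JDKs before 7u6 this shares whole's char array; later JDKs copy.
            String s = whole.substring(offsets[i], offsets[i] + lengths[i]);
            String prior = seen.putIfAbsent(s, s);
            result[i] = (prior != null) ? prior : s;
        }
        return result;
    }

    public static void main(String[] args) {
        char[] chars = "uimacasuima".toCharArray();
        String[] out = decode(chars, new int[] {0, 4, 7}, new int[] {4, 3, 4});
        System.out.println(String.join(",", out)); // uima,cas,uima
        System.out.println(out[0] == out[2]);      // true: equal strings share one object
    }
}
```

The map costs memory of its own while deserialization runs, so in this sketch it would be dropped once all strings have been created.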

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
