I posted a Jira for a proposed change in how 0-length UIMA arrays and lists are
managed.  These are immutable objects, and (theoretically) one instance (per
CAS) could be shared.

In the current implementation, this is managed explicitly by the user: they can
call a set of new APIs to get the shared instances.

I think a better approach is to make this happen automatically and remove the
new APIs (a smaller API set for essentially the same functionality is a good
thing, IMHO).  The implementation would change so that the calls that create
"new" 0-length arrays/lists would create an instance only if none already
existed for that CAS, and would otherwise return the existing one.
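
To make the proposed behavior concrete, here is a minimal sketch.  The classes
Cas and IntArrayFS below are hypothetical stand-ins written only for this
illustration; they are not the UIMA classes, and the real change might be
structured quite differently.

```java
// Hypothetical stand-ins for illustration only; these are NOT the UIMA classes.
class SharedEmptyArrays {

    /** Minimal stand-in for an integer array feature structure. */
    static final class IntArrayFS {
        private final int[] values;

        IntArrayFS(int length) {
            this.values = new int[length];
        }

        int size() {
            return values.length;
        }
    }

    /** Minimal stand-in for a CAS that owns at most one 0-length instance. */
    static final class Cas {
        private IntArrayFS emptyIntArray; // created lazily on first request

        IntArrayFS createIntArray(int length) {
            if (length != 0) {
                // Non-empty arrays are mutable, so each call gets its own.
                return new IntArrayFS(length);
            }
            if (emptyIntArray == null) {
                emptyIntArray = new IntArrayFS(0);
            }
            return emptyIntArray; // shared within this CAS only
        }
    }

    public static void main(String[] args) {
        Cas cas = new Cas();
        Cas otherCas = new Cas();
        // Same CAS, length 0: the same instance comes back both times.
        System.out.println(cas.createIntArray(0) == cas.createIntArray(0));
        // Different CASes do not share their 0-length instance.
        System.out.println(cas.createIntArray(0) == otherCas.createIntArray(0));
        // Non-empty arrays stay distinct.
        System.out.println(cas.createIntArray(2) == cas.createIntArray(2));
    }
}
```

The caller keeps using the ordinary "create" call; the sharing is invisible
unless the code compares references.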

This follows Java's general direction for immutable objects, like Strings and
Integer values, which can be shared.
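
As a small illustration of that Java behavior (plain Java, nothing
UIMA-specific; the class and helper names are made up for this sketch), the
following compares references for cached Integers, interned Strings, and, for
contrast, freshly allocated Objects:

```java
class ImmutableSharing {

    /** True when both references point at the very same object. */
    static boolean same(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        // Integer.valueOf is documented to cache values from -128 to 127,
        // so both calls return the same instance.
        System.out.println(same(Integer.valueOf(0), Integer.valueOf(0)));

        // Interned strings with equal contents are the same instance.
        System.out.println(same(new String("abc").intern(), "abc"));

        // new Object() always allocates a distinct instance.
        System.out.println(same(new Object(), new Object()));
    }
}
```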

For cases where people wanted/needed a CAS value "marker" that was tiny, but
unique (like you get with Java's new Object()), we would keep "new TOP(aCas)" as
something that generated unique instances.  What do others think?

I've seen large-scale implementations of UIMA pipelines with lots of defaulted
0-length arrays in them; this change has the potential to improve space/time
performance by a reasonable amount for these.

-Marshall
