[ https://issues.apache.org/jira/browse/UIMA-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Eckart de Castilho resolved UIMA-4329.
----------------------------------------------
Resolution: Done
[~schor] this work eventually graduated into the new CAS implementation of
UIMAv3, right? Since UIMAv3 is done now, this can be resolved.
> Object-based CAS implementation proposal/prototype
> --------------------------------------------------
>
> Key: UIMA-4329
> URL: https://issues.apache.org/jira/browse/UIMA-4329
> Project: UIMA
> Issue Type: Brainstorming
> Components: Core Java Framework
> Reporter: Nick Hill
> Priority: Minor
> Attachments: uima-core_obj-0.5.jar, uima-core_obj-0.5.tar.gz
>
>
> I have been experimenting with a simplified CAS implementation where each
> feature structure is an object and the indices are based on standard Java SDK
> concurrent collection classes. This replaces the complex custom array-based
> heaps and index implementations.
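A minimal sketch of the idea follows. It is not code from the attached prototype. Each feature structure is an ordinary Java object, and an annotation index is a JDK `ConcurrentSkipListSet` with a comparator. The class names `Annotation` and `AnnotationIndex`, the `id` tie-breaker, and the ordering are illustrative assumptions. The ordering is begin ascending, then end descending, which resembles UIMA's built-in annotation index but omits type priorities.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;

public class ObjectCasSketch {

    // Hypothetical feature structure: a plain Java object, not a slot in an int[] heap.
    public static class Annotation {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();
        final int id = NEXT_ID.getAndIncrement(); // tie-breaker so distinct FSs never compare equal
        final String type;
        final int begin;
        final int end;

        public Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "]";
        }
    }

    // Begin ascending, end descending (longer spans first), then creation order.
    static final Comparator<Annotation> ANNOTATION_ORDER =
            Comparator.<Annotation>comparingInt(a -> a.begin)
                    .thenComparing(Comparator.<Annotation>comparingInt(a -> a.end).reversed())
                    .thenComparingInt(a -> a.id);

    // Hypothetical index backed by a JDK concurrent sorted set instead of a custom heap structure.
    public static class AnnotationIndex {
        private final ConcurrentSkipListSet<Annotation> set =
                new ConcurrentSkipListSet<>(ANNOTATION_ORDER);

        public boolean add(Annotation a) { return set.add(a); }
        public boolean remove(Annotation a) { return set.remove(a); }
        public Iterable<Annotation> all() { return set; }
        public int size() { return set.size(); }
    }

    public static void main(String[] args) {
        AnnotationIndex index = new AnnotationIndex();
        index.add(new Annotation("Token", 6, 11));
        index.add(new Annotation("Sentence", 0, 11));
        index.add(new Annotation("Token", 0, 5));
        for (Annotation a : index.all()) {
            System.out.println(a); // Sentence[0,11], Token[0,5], Token[6,11]
        }
    }
}
```

The `id` tie-breaker matters because a sorted set treats comparator-equal elements as duplicates. Without it, two distinct annotations with the same span would collapse into one entry. When two spans start at the same offset, the longer one sorts first, so the sentence precedes the first token.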
> The primary motivation was to make the CAS threadsafe so that multiple
> annotators could process the same CAS concurrently, but I think there are a
> number of other benefits.
> Summary of advantages:
> - Drastic simplification of code - most proprietary data structure impls
> removed, many other classes removed, index/index repo impls are about 25% of
> the size of the heap versions (good for future enhancements/maintainability)
> - Thread safety - multiple logically independent annotators can work on the
> same CAS concurrently - reading, writing and iterating over feature
> structures. Opens up a lot of parallelism possibilities
> - No need for heap resizing or wasted space in fixed size CAS backing arrays,
> no large up-front memory cost for CASes - pooling them should no longer be
> necessary
> - Unlike the current heap impl, when an FS is removed from the CAS indices its
> space is actually freed (it can be GC'd once nothing else references it)
> - Unification of CAS and JCas - cover class instance (if it exists) "is" the
> feature structure
> - Significantly better performance (speed) for many use-cases, especially
> where there is heavy access of CAS data
> - Usage of standard Java data structure classes means it can benefit more
> "for free" from ongoing improvements in the Java SDK and from hardware
> optimizations targeted at these classes
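The following sketch illustrates the thread-safety point. The names are hypothetical and are not the prototype's classes. Several writer tasks add to one shared `ConcurrentSkipListSet`-backed index while a reader task iterates it. JDK concurrent collections have weakly consistent iterators, so the reader never gets a `ConcurrentModificationException`. A pass may or may not see elements added during that pass, but whatever it sees arrives in sorted order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentAnnotatorsDemo {

    // Hypothetical minimal FS: offsets plus a unique id so equal spans stay distinct.
    static final class Span {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();
        final int id = NEXT_ID.getAndIncrement();
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    static final Comparator<Span> ORDER =
            Comparator.<Span>comparingInt(s -> s.begin)
                    .thenComparing(Comparator.<Span>comparingInt(s -> s.end).reversed())
                    .thenComparingInt(s -> s.id);

    /**
     * Starts `writers` tasks that each add `perWriter` spans to one shared index,
     * plus one reader task that repeatedly iterates the index while it is being
     * modified. Returns the final index size.
     */
    public static int run(int writers, int perWriter) throws Exception {
        ConcurrentSkipListSet<Span> index = new ConcurrentSkipListSet<>(ORDER);
        ExecutorService pool = Executors.newFixedThreadPool(writers + 1);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (int w = 0; w < writers; w++) {
                final int base = w * perWriter; // each writer annotates its own region
                tasks.add(pool.submit(() -> {
                    for (int i = 0; i < perWriter; i++) {
                        index.add(new Span(base + i, base + i + 1));
                    }
                }));
            }
            // Reader: weakly consistent iteration, always ascending, never a CME.
            tasks.add(pool.submit(() -> {
                for (int pass = 0; pass < 100; pass++) {
                    int prevBegin = Integer.MIN_VALUE;
                    for (Span s : index) {
                        if (s.begin < prevBegin) {
                            throw new IllegalStateException("index order violated");
                        }
                        prevBegin = s.begin;
                    }
                }
            }));
            for (Future<?> task : tasks) {
                task.get(); // rethrows (wrapped) any exception thrown by a task
            }
        } finally {
            pool.shutdown();
        }
        return index.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(4, 10_000)); // prints 40000
    }
}
```

A plain `TreeSet` would not be safe in this program. The writers could corrupt its internal tree, and the reader's fail-fast iterator could throw `ConcurrentModificationException`.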
> I was hoping to see if there's interest from the community in taking this
> further, maybe even as a replacement for the current impl in a future version
> of uima-core. There has already been some discussion on the mailing list
> under the subject "Alternate CAS implementation".
> I'm attaching the current prototype, which should support most existing UIMA
> functionality with the exception of:
> - Binary serialization/deserialization
> - C/C++ framework (requires binary serialization)
> - "Delta" CAS related function including CAS markers
> - Index "auto protection" (recent 2.7 feature)
> Note I don't mean to imply these things can't be supported, just that they
> aren't yet.
> Where these things aren't used, it should be possible to try out the attached
> uima-core.jar as a drop-in replacement with existing apps/frameworks. An
> important caveat, though, is that any existing JCas cover classes will need
> recompiling against the new jar (but not re-JCasGenning).
> I'll also attach the code. I started by basically ripping out the CAS heaps,
> so there's a lot of code which is just commented out (e.g. in CASImpl.java).
> Lots of cleanup/tidyup is still needed, and there are various places that
> still need fixing for thread safety (e.g. synchronization around some existing
> create-on-first-access logic, which is separate from the indices). But
> those things shouldn't affect existing usage. A convention I followed was not
> to rename modified classes (e.g. CASImpl), but where an equivalent impl was
> created from scratch I did give it a new name starting with "CC" (e.g.
> FeatureStructureImpl is now CCFeatureStructure). The "CC" stands for "concurrent
> CAS". I have kept it in sync with the latest compatible changes in the
> uima-core stream, apart from those related to the non-impl'd functions
> mentioned above.
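The create-on-first-access issue mentioned above is a check-then-act race. Two threads can both find no entry and both create one, and one thread's writes can end up in an instance that is then discarded. One standard JDK remedy is `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key. The registry below (`LazyIndexRegistry`, `indexFor`) is a hypothetical example, not code taken from `CASImpl`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyIndexRegistry {

    // Counts how many indexes were actually constructed, to make "at most once" visible.
    public final AtomicInteger created = new AtomicInteger();

    private final ConcurrentHashMap<String, ConcurrentSkipListSet<Integer>> indexesByType =
            new ConcurrentHashMap<>();

    /**
     * Returns the index for a type, creating it on first access. Unlike a
     * get-then-put sequence, computeIfAbsent on a ConcurrentHashMap runs the
     * factory at most once per key, so every caller gets the same instance.
     */
    public ConcurrentSkipListSet<Integer> indexFor(String type) {
        return indexesByType.computeIfAbsent(type, t -> {
            created.incrementAndGet();
            return new ConcurrentSkipListSet<>();
        });
    }

    public static void main(String[] args) throws InterruptedException {
        LazyIndexRegistry registry = new LazyIndexRegistry();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            final int n = i;
            threads[i] = new Thread(() -> registry.indexFor("Token").add(n));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // One index was created, and no thread's addition was lost to a discarded instance.
        System.out.println(registry.created.get() + " " + registry.indexFor("Token").size()); // prints "1 8"
    }
}
```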
> Most of the "valid" unit tests work. Some are tied to the internals and no
> longer apply, many don't compile because they use binary serialization and/or
> delta CAS related classes which I removed for the time being. Others I had to
> generalize slightly, for example because they assumed a specific order where
> the order should be arbitrary, or for similar reasons.
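Some tests assume an order that the index does not guarantee, such as iteration over a bag index. One way to generalize them is to compare contents as multisets, as in the sketch below. The helper `sameElementsIgnoringOrder` is hypothetical. Assertion libraries such as Hamcrest (`containsInAnyOrder`) provide an equivalent check.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderInsensitiveAssert {

    /** True if both lists contain the same elements with the same multiplicities, in any order. */
    public static <T> boolean sameElementsIgnoringOrder(List<T> expected, List<T> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        Map<T, Integer> counts = new HashMap<>();
        for (T e : expected) {
            counts.merge(e, 1, Integer::sum);
        }
        for (T a : actual) {
            Integer c = counts.get(a);
            if (c == null) {
                return false; // element not expected, or already matched too often
            }
            if (c == 1) {
                counts.remove(a);
            } else {
                counts.put(a, c - 1);
            }
        }
        return counts.isEmpty();
    }

    public static void main(String[] args) {
        // E.g. type names read back from a bag index whose iteration order is unspecified.
        List<String> expected = Arrays.asList("Token", "Token", "Sentence");
        System.out.println(sameElementsIgnoringOrder(expected, Arrays.asList("Sentence", "Token", "Token"))); // true
        System.out.println(sameElementsIgnoringOrder(expected, Arrays.asList("Token", "Sentence", "Sentence"))); // false
    }
}
```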
> md5 checksums:
> {{69f8e01eda8576960a3e6324a0d03d77 *uima-core_obj-0.5.jar}}
> {{3b90ebc78035c68c8c86b31abc8b3b68 *uima-core_obj-0.5.tar.gz}}
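The attachments can be checked against these checksums from Java with the JDK's `MessageDigest`, as sketched below. `Md5Check` is a hypothetical name, and on most systems `md5sum` does the same job. The leading `*` in the lines above is `md5sum`'s binary-mode marker, not part of the file name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    public static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        return toHex(md.digest(data));
    }

    /** Streams a file through MD5 so large archives need not fit in memory. */
    public static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Md5Check uima-core_obj-0.5.jar 69f8e01eda8576960a3e6324a0d03d77
        if (args.length != 2) {
            System.err.println("usage: java Md5Check <file> <expected-md5>");
            return;
        }
        String actual = md5Hex(Paths.get(args[0]));
        System.out.println(actual.equalsIgnoreCase(args[1]) ? "OK" : "MISMATCH: " + actual);
    }
}
```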