Re: Alternate CAS implementation

Nick Hill Sat, 18 Apr 2015 15:29:19 -0700

Thanks Marshall, have added some responses inline below



Quoting Marshall Schor <[email protected]>:

Re: supporting multiple implementations of the CAS.

The original implementation of the CAS picked a set of space/performance
trade-offs. Some of these were motivated by earlier frameworks built in C++, by
the state-of-affairs of early Java implementations, etc. The goal was to create
something that would be attractive to use by the community of people working in
the field of NLP, who were often concerned with these kinds of things.

Later, JCas was added, to make things easier for people comfortable in Java.
And later still, uimaFIT added more convenience things from the world of Java
and related technologies (e.g. Spring, dependency injection, running without
XML, etc.).

Still later, the platform has started paying more attention to optimizations
around multi-core and L1/L2/L3 memory hierarchies, as those technologies became
much more prevalent.

All along, there was close attention paid to backwards compatibility; a main
reason was to create an "investable" platform - one where developers could
"invest" work in, and expect their work to have a long, useful life, even as the
framework might evolve to keep up with hardware and software changes.

Another part of making UIMA an attractive place to invest work in annotator
development was the possibility of first developing your annotator in an
easy-to-use paradigm, and later "optimizing" it for speed / space.  This, for
example, recently happened in version 2.7.0 with the release of an "upgrade" to
the CasCopier.  The first version was completed using normal Java APIs to the
CAS, and served for several years. When some applications began noticing this
was becoming a bottleneck, an optimization was done which a) greatly speeded it
up, and b) used much less Java heap space in the process, principally by
replacing CAS and index access and manipulations with their so-called
"low-level" equivalents.  These low-level equivalents are there just for this
reason; they typically create and use no Java objects at all, and can be much
faster.

I'm not questioning historical design choices, I understand various things were different in the past and that it has been an evolution; rather I'm suggesting that the design as it stands now doesn't make sense.

I'm proposing that the need for developers to use the complex low-level APIs to get better performance is primarily an artifact of the custom heap implementation itself.

In other words, we're providing an implementation which is more complex and slower than a simple object-based one and then saying if you want comparable performance you need to rewrite your code using the cumbersome low level APIs.


Would it not be better if the easy-to-use paradigm was already fast enough?

The CASCopier low-level rewrite example is a very good illustration - where better performance was achieved *despite* the underlying impl, not because of it. The obj-based impl still uses the normal Java APIs (basically the pre-rewrite CASCopier logic), and the "as-is" raw copying speed is very close to the super-optimized low level one (*). In a practical context where copying is done in conjunction with other CAS access, the obj impl appears much faster on aggregate.

On the backwards compatibility point - as already discussed in the thread what I'm proposing should have minimal if any impact on existing developer investment. It's now been tested with various apps/frameworks without code changes being needed (uimaFIT, Ruta, hopefully DKPro once binary serialization compatibility is there, ...)

(*) This statement applies to copying with a shared typesystem. At first glance it looks like some of the CASCopier optimization done was aimed specifically at speeding up aspects of cross-typesystem copying which are independent of the LL CAS aspects and I assume should be simple to transfer over to the new impl.


I think Nick has several interesting ideas, all bundled up together in a
particular set of design choices, for an alternative CAS design.  At a high
level, I think these are:

1) having the main data storage be within individual sets of Java Objects
(multiples for each Feature Structure Instance), and letting Java manage the
space allocation and reclamation (via garbage collection) of these.

2) having the indexes index these Java structures (vs in the core: they index
offsets in the heap, represented as "ints").

3) having the index structures themselves do a different trade-off of space,
time, and concurrency support. In general, the trend seems to be less concern
for space (as the cost per bit has dropped faster than the cost per
computation), and towards supporting more concurrency (as the ability to run
multiple threads in parallel has grown).

4) Making more use of "standard" (but possibly new/evolving) capabilities in
core Java, instead of doing lots of custom one-of-a-kind Java code. In general,
I think this is a very good idea :-). I foresee shifts in this direction where
possible, but probably incrementally, in the core UIMA implementation.

I think these high level concepts have valuable ideas to consider augmenting
core UIMA with. And the whole package might be, together, an interesting design
point for some users.
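
To make points 1) and 2) above concrete, here is a minimal sketch of the two storage styles being compared. It is an illustration only and does not reproduce the real CASImpl heap layout: the class names (StorageSketch, IntHeap, ObjFs), the one-slot type header and the int-only features are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch; the layout is simplified and is NOT the real CAS heap.
final class StorageSketch {

    // Heap style: every feature structure lives in one shared int[] and is
    // referred to by its offset ("address"). Slot 0 is reserved as "null".
    static final class IntHeap {
        private int[] heap = new int[8];
        private int next = 1;

        // Allocates 1 header slot (the type code) plus nFeatures value slots.
        int create(int typeCode, int nFeatures) {
            int size = 1 + nFeatures;
            if (next + size > heap.length) {
                // The backing array must be grown and copied when it fills up.
                heap = Arrays.copyOf(heap, Math.max(heap.length * 2, next + size));
            }
            int addr = next;
            heap[addr] = typeCode;
            next += size;
            return addr;
        }

        int typeOf(int addr) { return heap[addr]; }
        int getInt(int addr, int feature) { return heap[addr + 1 + feature]; }
        void setInt(int addr, int feature, int value) { heap[addr + 1 + feature] = value; }
    }

    // Object style: each feature structure is its own Java object, so the JVM
    // allocates it and reclaims it once nothing references it any more.
    static final class ObjFs {
        final int typeCode;
        private final int[] values;

        ObjFs(int typeCode, int nFeatures) {
            this.typeCode = typeCode;
            this.values = new int[nFeatures];
        }

        int getInt(int feature) { return values[feature]; }
        void setInt(int feature, int value) { values[feature] = value; }
    }

    public static void main(String[] args) {
        IntHeap heap = new IntHeap();
        int addr = heap.create(7, 2);      // an index would store this int
        heap.setInt(addr, 0, 10);

        ObjFs fs = new ObjFs(7, 2);        // an index would store this reference
        fs.setInt(0, 10);

        System.out.println(heap.getInt(addr, 0) + " " + fs.getInt(0));
    }
}
```

In the heap style an index holds ints and the array only ever grows; in the object style an index holds references and an unreferenced structure becomes eligible for garbage collection.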

I don't think the ideas are so independent though. If you start with the assumption that standard objects are a better choice than custom heaps, then simple index implementations based on standard Java collection impls are also a natural thing to do.

Given that, I don't really understand how the existing impl would be "augmented", could you elaborate on this? Which bit of what you summarized would be considered disadvantageous and excluded? In my view all of these things are advantageous, so why limit the improvement/simplification?

I also think that many existing (and future) users like the idea of a platform
which supports a sliding scale of space / time / optimization tradeoffs as
represented by the current UIMA design, so I don't currently think it's a good
idea to drop the current UIMA internal design in favor of this new design point.

Could you expand on the space / time / optimization tradeoffs you have in mind? I think it's a big mis-assumption that the current custom heaps and index implementations provide any meaningful benefit in terms of speed/memory usage over a simpler object impl.

In my experiments so far with some real-life usage, the obj-based impl appears to be better in terms of both speed and space. I'm also highly skeptical that there exist any real-life use cases where the magnitude of the "space" difference is meaningful (whether that's a net increase or reduction).

In other words, what benefits would be "given up" by abandoning the custom heaps/indices? If the object based impl satisfies (or can be easily made to satisfy) the vast majority of practical use cases better, what's really being traded off and why would one choose to retain all of the complex custom baggage?

In particular, what is the sliding scale of tradeoffs which the current impl provides? I actually think the object based approach makes it simpler to build/plug in different index datastructures that could provide usecase specific tradeoffs (grouping by type within sorted indices being a prime example). It's also trivial to swap concurrent collections with their non-concurrent counterparts where threadsafety isn't required, etc.
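
As a rough illustration of the swap described above (a sketch only, not code from the prototype), a sorted index over plain objects can take its backing collection as a parameter. The names SortedIndexSketch, Span and Index are hypothetical, and the ordering shown (begin ascending, end descending, then type name) is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch; none of these names are UIMA API.
final class SortedIndexSketch {

    // Stand-in for an annotation-like feature structure held as a plain object.
    static final class Span {
        final int begin;
        final int end;
        final String type;

        Span(int begin, int end, String type) {
            this.begin = begin;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    // begin ascending, then end descending, then type name. Entries that
    // compare equal collapse into one, which is a limitation of this sketch.
    static final Comparator<Span> ORDER = (a, b) -> {
        if (a.begin != b.begin) return Integer.compare(a.begin, b.begin);
        if (a.end != b.end) return Integer.compare(b.end, a.end);
        return a.type.compareTo(b.type);
    };

    // The index logic is identical whichever NavigableSet is plugged in.
    static final class Index {
        private final NavigableSet<Span> backing;

        Index(NavigableSet<Span> backing) { this.backing = backing; }

        void add(Span s) { backing.add(s); }
        boolean remove(Span s) { return backing.remove(s); }
        List<Span> snapshot() { return new ArrayList<>(backing); }
    }

    // Cheaper, but must only be used from one thread at a time.
    static Index singleThreaded() { return new Index(new TreeSet<>(ORDER)); }

    // Safe for concurrent add/remove/iterate, at some cost per operation.
    static Index threadSafe() { return new Index(new ConcurrentSkipListSet<>(ORDER)); }

    public static void main(String[] args) {
        Index idx = threadSafe();
        idx.add(new Span(5, 9, "Token"));
        idx.add(new Span(0, 9, "Sentence"));
        idx.add(new Span(0, 4, "Token"));
        System.out.println(idx.snapshot());
    }
}
```

Grouping by type within a sorted index could be tried the same way, by changing only the comparator or the backing structure.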

There are other design points/choices that could be considered.  For example,
with today's technology, I think it is quite feasible to create Feature
Structures as Java objects where the features are "fields" in the Java object.
This is enabled by the ability to compile Java classes as part of the startup of
application instance. I'm thinking along these lines: the current approach to
UIMA Type merging would be followed by a similar JCas cover class (optional
creation) and merging, followed by compiling the JCas cover classes during
startup. This could be a kind of just-in-time (JIT) running of JCasGen at the
start of every run, on the fully merged type system. (I'm sure there's issues I
haven't thought of; this is just the beginnings of an idea :-) ).

This is an interesting idea, but introducing on-the-fly code generation and compilation as a standard runtime step sounds a bit precarious (but maybe I'm wrong). Would user code also require recompilation using the dynamically generated classes?

I also can't envisage how it would retain backwards compatibility with existing code (which I've learnt is a pretty big deal!), particularly since as you know there are some contexts where custom modifications to generated JCas classes are used extensively.

Furthermore it's also not clear how much benefit it would give beyond the obj-based impl already done. From a CAS access point of view there would be slightly less indirection by removing the arrays, but in generic APIs reflection would be required. So I think we'd want to be confident this net speed advantage was non-negligible before paying the above penalties. As mentioned earlier, I'm fairly sure any additional "space saving" wouldn't be significant in practical contexts.

-Marshall

On 4/2/2015 3:55 PM, Nick Hill wrote:

Thanks Richard, more replies below...

Quoting Richard Eckart de Castilho <[email protected]>:

Hi Nick,

On 02.04.2015, at 01:37, Nick Hill <[email protected]> wrote:

From my point of view, it would be nice if it was possible to configure the
UIMA framework to produce either this new kind of CAS or the old one
without having to exchange a JAR - doing so statically at initialization
time or even dynamically at runtime. E.g. to allow easily running test
cases against both implementations.


When you say "produce", there shouldn't be any visible difference in
anything output or persisted, the impl is just how the CAS is stored
internally in memory while processing is happening.

It won't be possible to switch the impl being used at runtime. There are
classes for example with the same names but different impls (e.g. CASImpl).

I know this isn't ideal for tests/comparisons between the two impls but
quite a lot of things are currently tightly-coupled to the heap internals
and so switching a jar doesn't seem too big a price to pay given no other
code changes are needed.

What do you plan to be the ultimate goal of this experiment? Is it to support
different CAS implementations or is it to replace the existing CAS
implementation with a totally different one?

Most things in UIMA are created through factories (not the CAS so far). So
theoretically, one could replace most classes by custom classes by
reconfiguring the framework to use different factory classes or having the
factories produce different implementations. Can you imagine that as well for
the CAS?


For users the implementation shouldn't matter. They shouldn't observe any
functional difference and therefore shouldn't really care if the impl changes
underneath. All consuming code should work as-is, with the exception of code
which accesses 'internals' directly - but I'd see this as analogous to
accessing private fields in some java SDK class, which breaks when those
fields change in a newer SDK version.

As such I don't think it would make sense (or be very practical from a
maintenance pov) to support two implementations concurrently or to have a
factory.

Does it mean that the UIMA-C++ implementation is going to be discontinued
officially?


No, just to clarify no agreements or plans have been made. I just wanted to
initiate a discussion around this as a possible idea.

If we were to pursue this alternate implementation, I don't know of any reason
why the C++ impl would be discontinued. I had just listed C++ AEs as one of
the things which don't yet work with my current prototype.

Having to recompile the JCas classes is a bit of a blocker to me - but I
remember that Marshall was contemplating about a way to generate JCas
classes at runtime, so this might just be a temporary blocker.

When I say recompile, I don't mean regenerate using JCasGen, just recompile
.class files from the existing jcas .java files. I would expect that you
would typically only be using one version (other than for comparison
purposes - to validate functional equivalence and/or compare performance),
and so this isn't something that would need to be done often.


Compiled JCas classes tend to be shipped as part of frameworks. This means
that it will not be possible to switch to a new CAS impl just by replacing a
JAR. It will also mean that components from different UIMA-based frameworks
cannot be mixed and matched anymore unless some broker like UIMA-AS is used.


The current JCas cover class format is quite old and tightly-coupled to the
heap-based CAS internals. Saying that all new versions of UIMA must be
binary-compatible with these therefore imposes a (somewhat crippling)
restriction on possible internal improvements. You might say that the current
JCas classes break standard abstraction/encapsulation principles if the
expectation is they will be forever forwards binary-compatible.

It would not be hard on the UIMA side to move to a simpler and more abstract
JCas cover class format that should avoid this problem in future, but the
actual move to such a format would be even more disruptive than requiring a
recompilation (would require a re-JCasGen), and would have the same issues you
mention above.

I managed to make this object-based impl at least source-compatible with
existing jcas cover classes, by 'converting' the impl of methods called that
were intended to make CAS heap changes to actually be manipulating the FS
objects directly.

In one context, we also rely heavily on CAS addresses serving as unique
identifiers of feature structures in the CAS. Does your implementation
provide any stable feature structure IDs, preferably ones that are part of
the system and not actually declared as features?

Yes, there are various cases where an 'equivalent' of an FS address is
required (for example if the LL API is being used). In this case the id gets
allocated on the fly and will subsequently be unique to that FS within the
CAS. In many cases an FS might never have such an ID allocated (it's not
really part of the non-LL "public" APIs), but you can always 'request' one.
I imagine that IDs would be necessary to implement stuff like delta-CAS later
on too.
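
One way such on-demand IDs could work is sketched below. This is a guess at the general technique, not the prototype's actual code: the names LazyFsIdSketch, FeatureStructure, CasView, idOf and lookup are hypothetical, and the reverse map from ID to object is an assumption added so that an int "address" can be resolved again.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names are UIMA API.
final class LazyFsIdSketch {

    // A feature structure object that starts out without any id.
    static final class FeatureStructure {
        private final AtomicInteger id = new AtomicInteger(0); // 0 = not allocated

        boolean hasId() { return id.get() != 0; }
    }

    // Per-CAS id allocator. Ids are handed out only when first requested.
    static final class CasView {
        private final AtomicInteger nextId = new AtomicInteger(1);
        // Strong references: an fs with an id stays reachable through this
        // map, so a real implementation would need some way to release it.
        private final Map<Integer, FeatureStructure> byId = new ConcurrentHashMap<>();

        // Returns the id of fs, allocating one on first use. Safe to call
        // from several threads; every caller sees the same id for one fs.
        int idOf(FeatureStructure fs) {
            int existing = fs.id.get();
            if (existing != 0) {
                return existing;
            }
            int candidate = nextId.getAndIncrement();
            byId.put(candidate, fs);               // publish before the id is visible
            if (fs.id.compareAndSet(0, candidate)) {
                return candidate;
            }
            byId.remove(candidate);                // another thread won; id is skipped
            return fs.id.get();
        }

        // Resolves an id back to its feature structure, or null if unknown.
        FeatureStructure lookup(int id) { return byId.get(id); }
    }

    public static void main(String[] args) {
        CasView cas = new CasView();
        FeatureStructure a = new FeatureStructure();
        FeatureStructure b = new FeatureStructure();
        int idA = cas.idOf(a);
        System.out.println(idA + " " + cas.idOf(a) + " " + b.hasId());
    }
}
```

A feature structure that is only ever reached through the object APIs never pays for an ID, which matches the point that many FSs might never have one allocated.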

Are any of the changes so far in any way related to potentially allowing
additions to the type system at runtime?

Not directly related; my goal was just to make the implementation functionally
equivalent but threadsafe (and simpler, faster).
But it's possible (not certain) this new impl may impose fewer barriers to
enabling such capability.

What would be the incentive/benefit for the developer of a UIMA-based
framework/applications or for the users of such frameworks/applications to
switch to the new implementation?


That was the "summary of advantages" I had in the original email, I've
included it again below. The primary "external" benefits I think are the CAS
being thread-safe and faster to manipulate. I understand that many
users/developers might not care about these things, just as they likely
wouldn't care about the code footprint or complexity of the internals, but it
also shouldn't adversely impact them to "upgrade" to a new UIMA version based
on this implementation.

I feel that not being able to have more than one thread work on a CAS at the
same time is a major limitation, especially given modern systems typically
have many CPU cores.

- Drastic simplification of code - most proprietary data structure impls
removed, many other classes removed, index/index repo impls are about 25% of
the size of the heap versions (good for future enhancements/maintainability)
- Thread safety - multiple logically independent annotators can work on the
same CAS concurrently - reading, writing and iterating over feature
structures. Opens up a lot of parallelism possibilities
- No need for heap resizing or wasted space in fixed size CAS backing arrays,
no large up-front memory cost for CASes - pooling them should no longer be
necessary
- Unlike the current heap impl, when a FS is removed from CAS indices its
space is actually freed (can be GC'd)
- Unification of CAS and JCas - cover class instance (if it exists) "is" the
feature structure
- Significantly better performance (speed) for many use-cases, especially
where there is heavy access of CAS data
- Usage of standard Java data structure classes means it can benefit more "for
free" from ongoing improvements in the java SDK and from hardware
optimizations targeted at these classes


Cheers,

-- Richard
