On 06/23/2015 05:11 PM, Marshall Schor wrote:
I added a wiki page to develop the ideas here.
This is what I got from reading this:
One idea is having an annotator not have a type system specification, but rather
have it dynamically create types / features according to some configuration info
or some dynamically-obtained information (perhaps the results of some previous
analysis).
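
To make the first idea concrete, here is a minimal sketch of building types and
features at run time from configuration rather than from a fixed type system
specification. It uses plain Python dataclasses, not any UIMA API; the CONFIG
layout and the build_types helper are assumptions made up for illustration.

```python
from dataclasses import fields, make_dataclass

# Hypothetical configuration: type name -> {feature name: value type}.
# In practice this could come from a config file or a previous analysis step.
CONFIG = {
    "Token": {"begin": int, "end": int, "pos": str},
    "Sentence": {"begin": int, "end": int},
}


def build_types(config):
    """Create one class per configured type, with one field per feature."""
    return {
        name: make_dataclass(name, list(features.items()))
        for name, features in config.items()
    }


TYPES = build_types(CONFIG)

# The annotator can now create instances of types it never declared statically.
token = TYPES["Token"](begin=0, end=5, pos="NN")
sentence_features = [f.name for f in fields(TYPES["Sentence"])]
```

Nothing about "Token" or "Sentence" is known to the code ahead of time; changing
CONFIG changes the available types.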
Another idea is having an annotator be able to read Feature Structure data from
a wide variety of sources, and have the data include the type/feature metadata
(either externally - as we do now in UIMA with an external type system XML
specification, or embedded - as JSON would naturally do). Such an annotator
would have some notion of the type / feature information it was interested in
processing, but could ignore the rest.
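
A rough sketch of the second idea, assuming a JSON payload that carries its own
type/feature metadata next to the data. The payload layout ("types",
"featureStructures"), the INTERESTING table and the select function are all
hypothetical and are not a UIMA serialization format.

```python
import json

# Hypothetical payload: the type/feature metadata is embedded with the data.
PAYLOAD = """
{
  "types": {
    "Token": ["begin", "end", "pos"],
    "Paragraph": ["begin", "end"]
  },
  "featureStructures": [
    {"type": "Token", "begin": 0, "end": 5, "pos": "NN"},
    {"type": "Paragraph", "begin": 0, "end": 40},
    {"type": "Token", "begin": 6, "end": 9, "pos": "VB"}
  ]
}
"""

# The types and features this (hypothetical) annotator wants to process.
INTERESTING = {"Token": {"begin", "end"}}


def select(payload, interesting):
    """Return only the feature structures and features of interest."""
    doc = json.loads(payload)
    declared = doc["types"]
    selected = []
    for fs in doc["featureStructures"]:
        wanted = interesting.get(fs["type"])
        if wanted is None:
            continue  # a type this annotator does not care about
        # Keep features that are both declared in the metadata and wanted.
        known = wanted & set(declared.get(fs["type"], []))
        kept = {name: fs[name] for name in sorted(known) if name in fs}
        selected.append({"type": fs["type"], **kept})
    return selected


tokens = select(PAYLOAD, INTERESTING)
```

Here the "Paragraph" structures and the "pos" feature are simply skipped; the
annotator never needs a declaration for them.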
Finally, a third idea is to have the componentization be such that no "UIMA
Framework" is needed, or, if present, it is hidden. I'm thinking that this
means, for simpler analytics, the idea of a pipeline, and of combining things,
would not be present; it would be more like just a single annotator. For more
complex things, the idea of a pipeline would be encapsulated (like UIMA's
Aggregates), and the whole thing would look like something that could be
embedded in any of the other "big data" frameworks as an analysis piece. The
implication is that this would enable using other frameworks' scale-out
mechanisms.
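
One way to picture the third idea is a pipeline hidden behind a single
callable, so that a host framework only ever sees "a function over records".
This is a sketch under that assumption: tokenize, count_tokens and aggregate
are invented names, and Python's built-in map stands in for whatever map-style
operation a host framework would provide.

```python
from functools import reduce


def tokenize(doc):
    """Add whitespace-separated tokens as (begin, end) character offsets."""
    tokens, pos = [], 0
    for word in doc["text"].split():
        begin = doc["text"].index(word, pos)
        pos = begin + len(word)
        tokens.append((begin, pos))
    return {**doc, "tokens": tokens}


def count_tokens(doc):
    """Add a token count computed from the previous step's output."""
    return {**doc, "token_count": len(doc["tokens"])}


def aggregate(*steps):
    """Encapsulate several steps so callers see one analysis function."""
    return lambda doc: reduce(lambda current, step: step(current), steps, doc)


analyze = aggregate(tokenize, count_tokens)

# The host only needs to apply `analyze` to each record; it never sees the
# individual steps. The built-in map is a stand-in for the host's scale-out.
results = list(map(analyze, [{"text": "hello big data"}, {"text": "UIMA"}]))
```

A single annotator is just the degenerate case, aggregate(tokenize), and is
embedded in exactly the same way.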
Does this capture the ideas? Please fill in what I may have missed :-)
-Marshall
That captures it pretty well.
--Thilo