Marshall Schor commented on UIMA-5115:
Richard made many useful comments on the first cut of the select documentation
(pdf) using the Adobe commenting tool. I'm bringing some of those into the Jira
(others are good suggestions that I'll incorporate in the next revision).
#* change default for AnnotationIndex processing to non-overlapped
(unambiguous); I'm not sure about this. I agree that in most use cases, the
situation will be that there are no overlapping annotations (imagining Sentence
with non-overlapping Tokens). But if a pipeline did produce some overlapping
Tokens, this default would "silently" skip those. I think this action should
not be so "silent", to lessen the chance of mistakes in assumptions made by
downstream users of upstream annotators.
#* endWithinBounds - I agree with the comment, and in fact, the default (not
clearly expressed) was changed in the code to be as suggested. I'm thinking of
a rename like "includeEndBeyondBounds"; I suspect it will get very little (if
any) use so the long name won't be significant.
#* skipEquals - this is poorly documented. The implementation **never**
includes the "bound", because neither the Subiterator nor the uimaFIT
implementation ever included it; this switch was never intended to include
the bound in some cases. So, it needs to be renamed.
#** These two implementations differed in what they meant by the "bound",
however. In uimaFIT, the Feature Structure to be skipped was the one which
was exactly == (had the same "id") as the bound Feature Structure. In
Subiterator, the ones that were skipped were the ones which compared as "equal"
using the annotation index's comparator function (which used type priority).
This boolean switch was meant to let the caller specify which of
these two meanings of "equal" is used when doing the skipping. Note that this
is a detail that only applies when there are potentially multiple Annotations
which compare equal.
# General approach to handling ignored or not-applicable settings: I am
slightly favoring some kind of notification, if they are indicative of a likely
error or misunderstanding by the Annotator writer; this has to be balanced with
making this framework "annoying" to the user. Kinds of notification include
throwing exceptions or logging warnings (with decreasing frequency).
# re: renaming Processing Actions: I never liked the term much... I'm ok with
terminal actions, result forms, but my choice would be the combo: "terminal
# re: renaming the select framework to the CAS Query framework - I think this
ties too closely to the CAS as the data source, given that other collections
can be the source. We could call it the Feature Structure Query framework, but
that seems too verbose, compared to the "select" framework, so I'd prefer to
# re: ordering and sorted-ordering. I'll make a pass to clarify the subcases.
The general approach is that sort ordering for Annotation Indexes is usually
implied (but can be (partially) undone using the unordered() builder, if
desired for efficiency).
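
To make the skipEquals item above concrete, here is a minimal plain-Java sketch of the two meanings of "equal". It is an illustration only, not the UIMA or uimaFIT code: the class Ann, the simplified comparator (begin ascending, then end descending, with type priority left out) and both method names are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SkipBoundSketch {

    // Hypothetical stand-in for an annotation; not a UIMA class.
    static final class Ann {
        final int id, begin, end;
        Ann(int id, int begin, int end) { this.id = id; this.begin = begin; this.end = end; }
        @Override public String toString() { return "Ann#" + id + "[" + begin + "," + end + ")"; }
    }

    // Simplified index order: begin ascending, then end descending.
    // Assumption: the real index comparator also uses type priority, omitted here.
    static final Comparator<Ann> INDEX_ORDER =
        Comparator.<Ann>comparingInt(a -> a.begin)
                  .thenComparing(Comparator.<Ann>comparingInt(a -> a.end).reversed());

    // Identity-style skipping: drop only the bound object itself.
    static List<Ann> skipSameObject(List<Ann> candidates, Ann bound) {
        List<Ann> out = new ArrayList<>();
        for (Ann a : candidates) {
            if (a != bound) out.add(a);
        }
        return out;
    }

    // Comparator-style skipping: drop everything that compares equal to the bound.
    static List<Ann> skipComparatorEqual(List<Ann> candidates, Ann bound) {
        List<Ann> out = new ArrayList<>();
        for (Ann a : candidates) {
            if (INDEX_ORDER.compare(a, bound) != 0) out.add(a);
        }
        return out;
    }

    public static void main(String[] args) {
        Ann bound = new Ann(1, 0, 10);
        Ann twin  = new Ann(2, 0, 10);  // distinct object, same span as the bound
        Ann inner = new Ann(3, 2, 5);
        List<Ann> all = Arrays.asList(bound, twin, inner);

        System.out.println(skipSameObject(all, bound));       // keeps twin and inner
        System.out.println(skipComparatorEqual(all, bound));  // keeps only inner
    }
}
```

The two variants differ only when some other annotation compares equal to the bound (the "twin" here), which is the case noted above as the only one where the switch matters.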
> uv3 select() api for iterators and streams over CAS contents
> Key: UIMA-5115
> URL: https://issues.apache.org/jira/browse/UIMA-5115
> Project: UIMA
> Issue Type: New Feature
> Components: Core Java Framework
> Reporter: Marshall Schor
> Priority: Minor
> Fix For: 3.0.0SDKexp
> Design and implement a select() API based on uimaFIT's select, integrated
> well with Java 8 concepts. Initial discussions in UIMA-1524. Wiki with
> diagram: https://cwiki.apache.org/confluence/display/UIMA/UV3+Iterator+support