Marshall Schor commented on UIMA-5115:

Richard made many useful comments on the first cut of the select documentation 
(pdf) using the Adobe commenting tool.  I'm bringing some of those into the Jira 
(others are good suggestions that I'll incorporate in the next revision).

# Defaults:
#* change default for AnnotationIndex processing to non-overlapped 
(unambiguous); I'm not sure about this.  I agree that in most use cases, the 
situation will be that there are no overlapping annotations (imagining Sentence 
with non-overlapping Tokens).  But if a pipeline did produce some overlapping 
Tokens, this default would "silently" skip those.  I think this action should 
not be so "silent", to lessen the chance of mistakes in assumptions made by 
downstream users of upstream annotators.
#* endWithinBounds - I agree with the comment, and in fact, the default (not 
clearly expressed) was changed in the code to be as suggested.  I'm thinking of 
a rename like "includeEndBeyondBounds"; I suspect it will get very little (if 
any) use so the long name won't be significant.
#* skipEquals - this is poorly documented.  The implementation **never** 
includes the "bound", because both the Subiterator and uimaFIT implementations 
never included the "bound"; it was not the intent of this to sometimes include 
the bound.  So, it needs to be renamed.
#** These two implementations differed in what they meant by the "bound", 
however.  In uimaFIT,  the Feature Structure to be skipped was the one which 
was exactly == (had the same "id") as the bound Feature Structure.  In 
Subiterator, the ones that were skipped were the ones which compared as "equal" 
using the annotation index's comparator function (which used type priority).   
This boolean switch was meant to let the user specify which of these two 
meanings of "equal" is used when doing the skipping.  Note that this 
is a detail that only applies when there are potentially multiple Annotations 
which compare equal.  
# General approach to handling ignored or not-applicable settings: I am 
slightly favoring some kind of notification, if they are indicative of a likely 
error or misunderstanding by the Annotator writer; this has to be balanced with 
making this framework "annoying" to the user.  Kinds of notification include 
throwing exceptions, or (decreasing frequency) logging of warnings.
# re: renaming Processing Actions: I never liked the term much...  I'm ok with 
terminal actions, result forms, but my choice would be the combo: "terminal 
# re: renaming the select framework to the CAS Query framework - I think this 
ties too closely to the CAS as the data source, given that other collections 
can be the source.  We could call it the Feature Structure Query framework, but 
that seems too verbose, compared to the "select" framework, so I'd prefer to 
keep "select".
# re: ordering and sorted-ordering.  I'll make a pass to clarify the subcases.  
The general approach is that sort ordering for Annotation Indexes is usually 
implied (but can be (partially) undone using the unordered() builder, if 
desired for efficiency).
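
To make the two meanings of "equal" under skipEquals concrete, here is a minimal sketch in plain Java.  It does not use the UIMA or uimaFIT APIs: Span, INDEX_ORDER, and both method names are hypothetical stand-ins, and the comparator leaves out type priority, so this only illustrates the distinction and is not the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SkipBoundSketch {

    /** Hypothetical stand-in for an Annotation: a type name plus begin/end offsets. */
    public static final class Span {
        public final String type;
        public final int begin;
        public final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Simplified stand-in for the annotation index comparator: begin ascending,
     * then end descending. Type priority is deliberately left out (an assumption
     * of this sketch), so spans with the same offsets compare equal.
     */
    public static final Comparator<Span> INDEX_ORDER = (a, b) -> {
        if (a.begin != b.begin) {
            return Integer.compare(a.begin, b.begin);
        }
        return Integer.compare(b.end, a.end);
    };

    /** True if the candidate lies within the offsets of the bound. */
    static boolean isCovered(Span candidate, Span bound) {
        return candidate.begin >= bound.begin && candidate.end <= bound.end;
    }

    /** uimaFIT-style meaning: skip only the bound object itself (identity, ==). */
    public static List<Span> coveredSkippingSame(List<Span> index, Span bound) {
        List<Span> result = new ArrayList<>();
        for (Span s : index) {
            if (s != bound && isCovered(s, bound)) {
                result.add(s);
            }
        }
        return result;
    }

    /** Subiterator-style meaning: skip everything that compares equal to the bound. */
    public static List<Span> coveredSkippingEqual(List<Span> index, Span bound) {
        List<Span> result = new ArrayList<>();
        for (Span s : index) {
            if (INDEX_ORDER.compare(s, bound) != 0 && isCovered(s, bound)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Span bound = new Span("Sentence", 0, 10);
        Span twin = new Span("Sentence", 0, 10); // distinct object, compares equal to bound
        Span token = new Span("Token", 0, 4);
        List<Span> index = Arrays.asList(bound, twin, token);

        // Identity skip keeps the twin; comparator skip drops it.
        System.out.println(coveredSkippingSame(index, bound).size());  // prints 2
        System.out.println(coveredSkippingEqual(index, bound).size()); // prints 1
    }
}
```

The two variants only differ when the index holds more than one annotation that compares equal to the bound, which matches the note above that this detail rarely matters in practice.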

> uv3 select() api for iterators and streams over CAS contents
> ------------------------------------------------------------
>                 Key: UIMA-5115
>                 URL: https://issues.apache.org/jira/browse/UIMA-5115
>             Project: UIMA
>          Issue Type: New Feature
>          Components: Core Java Framework
>            Reporter: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDKexp
> Design and implement a select() API based on uimaFIT's select, integrated 
> well with Java 8 concepts.  Initial discussions in UIMA-1524.  Wiki with 
> diagram: https://cwiki.apache.org/confluence/display/UIMA/UV3+Iterator+support
