[jira] [Commented] (UIMA-1524) JFSIndexRepository should be enhanced with new generic methods

Richard Eckart de Castilho (JIRA) Sat, 17 Sep 2016 11:09:01 -0700

    [ 
https://issues.apache.org/jira/browse/UIMA-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15499438#comment-15499438
 ]


Richard Eckart de Castilho commented on UIMA-1524:
--------------------------------------------------

I think we are experimenting here a bit in the range between positional style, 
builder style (declarative), and functional style (imperative), and our 
preferences among them keep fluctuating.

Consider the below as me thinking out loud as I try to follow Marshall's 
thoughts as rendered in the wiki.

Maybe we need to pick the kinds of statements that we can build with this new 
API apart a bit. Marshall actually did that quite nicely in the Gliffy diagram 
in the wiki. So let's try an example...

{code}
cas.select()
    .type(Token.class)
    .within(sentence)
{code}

The above looks like builder code for an iterator or collection, but it lacks a 
terminal statement like: asList(), stream(), iterator(), etc. (all of which the 
Gliffy provides). My understanding of the Gliffy is that Marshall imagines that 
this builder-like API is not only a builder but at the same time implements the 
Java stream API. That means the builder does not have to be terminated 
explicitly but can be terminated at any time simply by calling any of the 
Stream API methods. But for the sake of clarity, I'll just add a terminal 
builder step now (fsIterator()).

{code}
cas.select()
    .type(Token.class) // result type of fsIterator
    .within(sentence) // location condition
    .fsIterator() // result 
{code}

Now I would argue that any such statement can only have one result type, so 
only one *type* call in the builder. So a statement like the following would 
*not* make too much sense:

{code}
cas.select()
    .type(Token.class) // result type of fsIterator
    .type(Lemma.class) // whoops?
    .within(sentence) // location condition
    .fsIterator() // result 
{code}

So it seems quite sensible and economical to drop the *type* builder call 
and fold it into the *select* call:

{code}
cas.select(Token.class) // result type of fsIterator
    .within(sentence) // location condition
    .fsIterator() // result 
{code}

*Decision requirement:* Should we entirely drop the *type* call? Should we 
throw an exception if it is called twice?
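
To make the "throw an exception" option concrete, here is a minimal sketch of 
a builder that accepts the result type either in select or in a single type 
call and rejects a second one. TypeOnceSelect and all of its methods are 
hypothetical names of mine, not existing UIMA API, and a plain list stands in 
for the CAS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not UIMA API: the result type may be set only once.
final class TypeOnceSelect<T> {
    private final List<?> source;   // stands in for the CAS indexes
    private final Class<T> type;    // result type (Object.class if unrestricted)
    private final boolean typeSet;  // true once a type was chosen explicitly

    private TypeOnceSelect(List<?> source, Class<T> type, boolean typeSet) {
        this.source = source;
        this.type = type;
        this.typeSet = typeSet;
    }

    // Counterpart of select(): no type restriction yet.
    static TypeOnceSelect<Object> select(List<?> source) {
        return new TypeOnceSelect<>(source, Object.class, false);
    }

    // Counterpart of select(Token.class): type fixed up front.
    static <U> TypeOnceSelect<U> select(List<?> source, Class<U> type) {
        return new TypeOnceSelect<>(source, type, true);
    }

    // Sets the result type; a second call is treated as a programming error.
    <U> TypeOnceSelect<U> type(Class<U> newType) {
        if (typeSet) {
            throw new IllegalStateException(
                "result type already set to " + type.getName());
        }
        return new TypeOnceSelect<>(source, newType, true);
    }

    // Terminal step: collect the elements that are instances of the result type.
    List<T> asList() {
        List<T> result = new ArrayList<>();
        for (Object o : source) {
            if (type.isInstance(o)) {
                result.add(type.cast(o));
            }
        }
        return result;
    }
}
```

With this shape, select(data).type(String.class) and select(data, String.class) 
behave the same, while select(data, String.class).type(Integer.class) fails 
fast instead of silently changing the result type.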

There are multiple ways that users want to specify types, e.g. as class, type, 
string, or even nothing (i.e. not making a type restriction):

{code}
select(Token.class)
select(Token.type)
select("my.type.Token")
select()
{code}

As for location conditions (covered, covering, following, preceding, relative, 
between, at, ...), there are some cases where multiple conditions *could* be 
sensible. Note that I include "at" in the location conditions here, whereas the 
Gliffy in the wiki currently seems to treat "at" as different in kind from, 
e.g., "covering" or "following".

{code}
cas.select(Token.class) // result type of fsIterator
    .within(sentence) // location condition 1
    .following(predicateVerb) // location condition 2
    .fsIterator() // result 
{code}

The ability to configure some additional behaviors for the builder is 
sensible, e.g.:

{code}
cas.select(Token.class) // result type of fsIterator
    .within(sentence) // location condition 1
    .following(predicateVerb) // location condition 2
    .typePriorities()
    .strict()
    .fsIterator() // result 
{code}

However, if we allow multiple conditions, then the question is whether the 
behaviors should apply to the whole builder or only locally to individual 
conditions.

*Decision requirement:* We need to decide whether we want to allow multiple 
location conditions (as above) or not. If not, should we throw an exception if 
it is called twice?

I tend towards liking the idea of multiple location conditions (although not 
all combinations are sensible) if that is not too hard to implement. The code 
for the different select methods in uimaFIT is very tightly tuned to particular 
location conditions and I am unsure how straightforward it would be to 
dynamically combine them.
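
As a rough illustration of what "dynamically combining" could mean in the 
simplest case, the sketch below treats each location condition as a predicate 
and ANDs them together. This is naive filtering over a list, not the tuned 
index-based code in uimaFIT. Span and LocationSelect are hypothetical names, 
and the offset rules I use for "within" and "following" are my assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for an annotation: just begin/end offsets.
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical sketch, not UIMA or uimaFIT API.
final class LocationSelect {
    private final List<Span> candidates;
    private Predicate<Span> condition = s -> true; // no restriction yet

    LocationSelect(List<Span> candidates) {
        this.candidates = candidates;
    }

    // Assumed meaning: the candidate lies completely inside the cover.
    LocationSelect within(Span cover) {
        condition = condition.and(s -> s.begin >= cover.begin && s.end <= cover.end);
        return this;
    }

    // Assumed meaning: the candidate starts at or after the end of the anchor.
    LocationSelect following(Span anchor) {
        condition = condition.and(s -> s.begin >= anchor.end);
        return this;
    }

    // Terminal step: keep the candidates that satisfy every condition.
    List<Span> asList() {
        List<Span> result = new ArrayList<>();
        for (Span s : candidates) {
            if (condition.test(s)) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Combining conditions this way is easy, but it scans every candidate; the open 
question above is whether the same flexibility can be had while still using 
the indexes efficiently.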

Normally, results are delivered in index order. It appears as if the reverse() 
behavior is simply changing that to go in reverse-index order. I.e. it is a 
declarative reverse for which there is also a signature that includes a boolean 
parameter:

{code}
cas.select(Token.class) // result type of fsIterator
    .following(predicateVerb)
    .reverse(true)
    .typePriorities()
    .strict()
    .fsIterator() // result 
{code}

Each location condition could be augmented by secondary conditions, e.g. a 
"displacement" (which Marshall calls offset). E.g. here we retrieve all Tokens 
following the Token 3 positions right of the predicateVerb token.

{code}
Token predicateVerb = ...
cas.select(Token.class) // result type of fsIterator
    .following(predicateVerb, 3)
    .fsIterator() // result 
{code}

The case above could also be simulated without the displacement, e.g.

{code}
Token predicateVerb = ...
cas.select(Token.class) // result type of fsIterator
    .following(predicateVerb) // no displacement
    .stream()
    .skip(3) // skip the first 3 following Tokens instead
{code}

... but that might not always work. E.g. here we retrieve all Tokens following 
the Verb 3 positions right of the predicateVerb Verb. So here the offset does 
not apply to the Token index but to the Verb index.

{code}
Verb predicateVerb = ...
cas.select(Token.class) // result type of fsIterator
    .following(predicateVerb, 3)
    .fsIterator() // result 
{code}

But I am actually unsure as to what the semantics of the displacement are.

*Decision requirement:* When a displacement is specified in a location 
condition, does it operate on the index of the selected type (here Token) or on 
the index of the condition type (here Verb)?
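
To show that the two readings really give different results, here is a small 
sketch in which annotations are reduced to their begin offsets. 
DisplacementSketch and its method names are hypothetical, and both methods 
assume sorted offset lists, an anchor that is present in the condition index, 
and a displacement that stays in range.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not UIMA API: annotations are reduced to begin offsets.
final class DisplacementSketch {

    // Reading 1: the displacement counts positions in the index of the
    // selected type, i.e. following(anchor) and then skip(displacement).
    static List<Integer> shiftInSelectedIndex(
            List<Integer> tokens, int anchor, int displacement) {
        return tokens.stream()
                .filter(t -> t > anchor)
                .skip(displacement)
                .collect(Collectors.toList());
    }

    // Reading 2: the displacement counts positions in the index of the
    // condition type; the anchor is moved first, then "following" is applied.
    static List<Integer> shiftInConditionIndex(
            List<Integer> tokens, List<Integer> verbs, int anchor, int displacement) {
        int shifted = verbs.get(verbs.indexOf(Integer.valueOf(anchor)) + displacement);
        return tokens.stream()
                .filter(t -> t > shifted)
                .collect(Collectors.toList());
    }
}
```

With Tokens at offsets 0 to 9, Verbs at 1, 3, 5 and 8, the anchor at 1 and a 
displacement of 2, the first reading starts at Token 4 while the second 
starts at Token 6, so the choice is visible to users.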

Another afterthought on the exercise: the Stream API does not work with the 
enhanced for loop. If the builder implements its builder API + the Stream API, 
then it would be nice if it could also implement the Iterable API:

{code}
for (Token t : cas.select(Token.class).following(predicateVerb)) {
  // do something...
}
{code}
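
A minimal sketch of that combination: a builder that is itself Iterable and 
also hands out a stream. IterableSelect and where(...) are hypothetical names, 
a plain list stands in for the CAS, and the sketch only exposes stream() 
rather than implementing the full Stream interface, which would mean 
delegating all of its methods.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical sketch, not UIMA API: a builder usable in an enhanced for loop.
final class IterableSelect<T> implements Iterable<T> {
    private final List<T> source;               // stands in for the CAS index
    private Predicate<T> condition = x -> true; // accumulated conditions

    IterableSelect(List<T> source) {
        this.source = source;
    }

    // Generic builder step; stands in for within(...), following(...), etc.
    IterableSelect<T> where(Predicate<T> extra) {
        condition = condition.and(extra);
        return this;
    }

    // Terminal step for the Stream API.
    Stream<T> stream() {
        return source.stream().filter(condition);
    }

    // Terminal step used implicitly by the enhanced for loop.
    @Override
    public Iterator<T> iterator() {
        return stream().iterator();
    }
}
```

Because each call to iterator() or stream() builds a fresh pipeline, the same 
builder instance can be iterated more than once.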

Omitted here are thoughts on index() and limit() which are included in the wiki 
description and seem to fit in nicely with the builder API. Some aspects like 
unordered, nonoverlapping, I did not consider yet.

> JFSIndexRepository should be enhanced with new generic methods
> --------------------------------------------------------------
>
>                 Key: UIMA-1524
>                 URL: https://issues.apache.org/jira/browse/UIMA-1524
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.3
>            Reporter: Joern Kottmann
>
> Existing methods should be overloaded with an additional Class argument to 
> specify the exact return type. This changes make down casting of returned 
> objects unnecessary. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
