[ 
https://issues.apache.org/jira/browse/LUCENE-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233159#comment-14233159
 ] 

Jason Gerlowski commented on LUCENE-6083:
-----------------------------------------

Hi Paul.

Looks like a neat addition.  I have a few comments/suggestions.  Please take 
with a grain of salt.

1) There's some logic in ContainSpans that assigns one of the provided Spans to 
a 'sourceSpans' instance variable.  Calls to ContainSpans.start(), .end(), 
.doc(), etc. then just call the associated method on 'sourceSpans'.  This looks 
like it might be a good use case for FilterSpans 
(https://issues.apache.org/jira/browse/LUCENE-5933), which *I think* is 
available in trunk and 4x.  Using FilterSpans might help get rid of some of the 
boilerplate here.

2) It would be nice if there was a way to customize/change what it means for 
one Span to be "contained" in another.  Personally, I agree with the definition 
of "containing" laid out in ContainSpans.toContained(), but I can definitely 
see how other people might have different ideas of what being "contained" means.

Different potential semantics include:
  a) the current behavior: (container.start <= contained.start && contained.end 
<= container.end)
  b) no sharing start/end points: (container.start < contained.start && 
contained.end < container.end)
  c) spans can be partially-contained/overlapping: ((contained.start > 
container.start && contained.start < container.end) || (contained.end > 
container.start && contained.end < container.end))

I think it'd be a good idea to allow support for any of the semantics above.  
Letting people choose (or even define) the "containing" semantics they want 
would make your patch even more flexible/powerful!

(a) and (b) above could probably be supported by just adding a boolean option 
to the constructor for *Query.  If being even more flexible makes sense, (a), 
(b), and (c) could probably all be supported by moving the 
ContainSpans.toContained() logic into a separate class that can get passed into 
*Query as a constructor arg.  This would allow arbitrary "contained" semantics, 
as people can subclass the ContainedSpanMatchFinder (or whatever it would be 
called.).

Thoughts?

> Span containing/contained queries
> ---------------------------------
>
>                 Key: LUCENE-6083
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6083
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: LUCENE-6083.patch
>
>
> SpanContainingQuery reducing a spans to where it is containing another spans.
> SpanContainedQuery reducing a spans to where it is contained in another spans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to