[
https://issues.apache.org/jira/browse/LUCENE-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233159#comment-14233159
]
Jason Gerlowski commented on LUCENE-6083:
-----------------------------------------
Hi Paul.
Looks like a neat addition. I have a few comments/suggestions. Please take
with a grain of salt.
1) There's some logic in ContainSpans that assigns one of the provided Spans to
a 'sourceSpans' instance variable. Calls to ContainSpans.start(), .end(),
.doc(), etc. then just call the associated method on 'sourceSpans'. This looks
like it might be a good use case for FilterSpans
(https://issues.apache.org/jira/browse/LUCENE-5933), which *I think* is
available in trunk and 4x. Using FilterSpans might help get rid of some of the
boilerplate here.
2) It would be nice if there was a way to customize/change what it means for
one Span to be "contained" in another. Personally, I agree with the definition
of "containing" laid out in ContainSpans.toContained(), but I can definitely
see how other people might have different ideas of what being "contained" means.
Different potential semantics include:
a) the current behavior: (container.start <= contained.start && contained.end
<= container.end)
b) no sharing start/end points: (container.start < contained.start &&
contained.end < container.end)
c) spans can be partially-contained/overlapping: ((contained.start >
container.start && contained.start < container.end) || (contained.end >
container.start && contained.end < container.end))
I think it'd be a good idea to allow support for any of the semantics above.
Letting people choose (or even define) the "containing" semantics they want
would make your patch even more flexible/powerful!
(a) and (b) above could probably be supported by just adding a boolean option
to the constructor for *Query. If being even more flexible makes sense, (a),
(b), and (c) could probably all be supported by moving the
ContainSpans.toContained() logic into a separate class that can get passed into
*Query as a constructor arg. This would allow arbitrary "contained" semantics,
as people can subclass the ContainedSpanMatchFinder (or whatever it would be
called.).
Thoughts?
> Span containing/contained queries
> ---------------------------------
>
> Key: LUCENE-6083
> URL: https://issues.apache.org/jira/browse/LUCENE-6083
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Paul Elschot
> Priority: Minor
> Attachments: LUCENE-6083.patch
>
>
> SpanContainingQuery reducing a spans to where it is containing another spans.
> SpanContainedQuery reducing a spans to where it is contained in another spans.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]