These automaton intersection APIs are frustrating with all the special
case handling... Ideas welcome!

We've had similar challenges with them in the past, when a user
invoked Terms.intersect directly instead of via CompiledAutomaton:
https://issues.apache.org/jira/browse/LUCENE-7576

The problem is CompiledAutomaton specializes certain cases (all
strings match, no strings match, single term) and sidesteps
Terms.intersect for those cases.

We should fix the public AutomatonTermsEnum ctor to apply the same
checks (insist on the NORMAL case) so you don't hit assert failures, or
worse ... I'll do that.

I think a new CompiledAutomaton.intersect taking TermsEnum would be
tricky in general because CompiledAutomaton relies on the (efficient)
Terms.intersect to handle the NORMAL case well, and we can't invoke
that from a TermsEnum.

In the SINGLE case, could you use SingleTermsEnum, passing the
TermsEnum from your doc values, and the term from the
CompiledAutomaton?  Would that suffice as a workaround?
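
To make the dispatch concrete, here is a rough, library-free model of it.
This is a sketch, not Lucene code: Type, Compiled and filter are
hypothetical stand-ins (a sorted list of strings plays the role of the doc
values terms dictionary), and the comments mark where the real
SingleTermsEnum and AutomatonTermsEnum would slot in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Library-free model of the four-way dispatch; NOT Lucene code.
final class AutomatonDispatchSketch {

    // The four cases named in this thread.
    enum Type { NONE, ALL, SINGLE, NORMAL }

    // Hypothetical stand-in for CompiledAutomaton.
    static final class Compiled {
        final Type type;
        final String term;               // non-null only for SINGLE
        final Predicate<String> matcher; // non-null only for NORMAL

        private Compiled(Type type, String term, Predicate<String> matcher) {
            this.type = type;
            this.term = term;
            this.matcher = matcher;
        }

        static Compiled none() { return new Compiled(Type.NONE, null, null); }
        static Compiled all() { return new Compiled(Type.ALL, null, null); }
        static Compiled single(String term) { return new Compiled(Type.SINGLE, term, null); }
        static Compiled normal(Predicate<String> matcher) {
            return new Compiled(Type.NORMAL, null, matcher);
        }
    }

    // sortedTerms stands in for the (sorted) terms dictionary.
    static List<String> filter(List<String> sortedTerms, Compiled c) {
        switch (c.type) {
            case NONE:
                return new ArrayList<>();
            case ALL:
                return new ArrayList<>(sortedTerms);
            case SINGLE: {
                // Real code would wrap the TermsEnum in SingleTermsEnum here.
                List<String> out = new ArrayList<>();
                if (Collections.binarySearch(sortedTerms, c.term) >= 0) {
                    out.add(c.term);
                }
                return out;
            }
            case NORMAL: {
                // Only this case has a matcher, so only this case may use it;
                // real code would hand off to AutomatonTermsEnum here.
                List<String> out = new ArrayList<>();
                for (String t : sortedTerms) {
                    if (c.matcher.test(t)) {
                        out.add(t);
                    }
                }
                return out;
            }
            default:
                throw new IllegalStateException("unknown type: " + c.type);
        }
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "banana", "cherry");
        System.out.println(filter(terms, Compiled.single("banana")));
        System.out.println(filter(terms, Compiled.normal(t -> t.startsWith("c"))));
    }
}
```

The point of the switch is that the matcher is only ever touched in the
NORMAL branch, which is the same guard the AutomatonTermsEnum ctor
should enforce.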

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jan 6, 2017 at 11:17 AM, Alan Woodward <a...@flax.co.uk> wrote:
> We’ve hit an issue while developing marple, where we want to have the
> ability to filter the values from a SortedDocValues terms dictionary.
> Normally you’d create a CompiledAutomaton from the filter string, and then
> call #getTermsEnum(Terms) on it; but for docvalues, we don’t have a Terms
> instance, we instead have a TermsEnum.
>
> Using AutomatonTermsEnum to wrap the TermsEnum works in most cases here, but
> if the CompiledAutomaton in question is a fixed string, then we get
> assertion failures, because ATE uses the compiled automaton’s internal
> ByteRunAutomaton for filtering, and fixed-string automata don’t have one.
>
> Is there a work-around that I’m missing here?  Or should I maybe open a JIRA
> to add a #getTermsEnum(TermsEnum) method to CompiledAutomaton?
>
> Alan Woodward
> www.flax.co.uk
>
>

