Unfortunately I think that's somewhat dangerous because it creates an ambiguous API with a nasty performance trap?
I.e. this new method won't invoke the fast Terms.intersect in the default terms dict?

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jan 6, 2017 at 3:20 PM, Alan Woodward <a...@flax.co.uk> wrote:

> Hm, how about something like this, on CompiledAutomaton:
>
>   public TermsEnum getTermsEnum(TermsEnum te) throws IOException {
>     switch (type) {
>       case NONE:
>         return TermsEnum.EMPTY;
>       case ALL:
>         return te;
>       case SINGLE:
>         return new SingleTermsEnum(te, term);
>       case NORMAL:
>         return new AutomatonTermsEnum(te, this);
>       default:
>         // unreachable
>         throw new RuntimeException("unhandled case");
>     }
>   }
>
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 6 Jan 2017, at 19:16, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> These automaton intersection APIs are frustrating with all the special
> case handling... Ideas welcome!
>
> We've had similar challenges with them in the past, when a user
> invoked Terms.intersect directly instead of via CompiledAutomaton:
> https://issues.apache.org/jira/browse/LUCENE-7576
>
> The problem is CompiledAutomaton specializes certain cases (all
> strings match, no strings match, single term) and sidesteps
> Terms.intersect for those cases.
>
> We should fix AutomatonTermsEnum public ctor w/ the same checks
> (insist on a NORMAL case) so you don't hit assert failures, or, worse
> ... I'll do that.
>
> I think a new CompiledAutomaton.intersect taking TermsEnum would be
> tricky in general because it relies on the (efficient) Terms.intersect
> to handle the NORMAL case well, but we can't invoke that from a
> TermsEnum.
>
> In the SINGLE case, could you use SingleTermsEnum, passing the
> TermsEnum from your doc values, and the term from the
> CompiledAutomaton? Would that suffice as a workaround?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Jan 6, 2017 at 11:17 AM, Alan Woodward <a...@flax.co.uk> wrote:
>
> We’ve hit an issue while developing marple, where we want to have the
> ability to filter the values from a SortedDocValues terms dictionary.
> Normally you’d create a CompiledAutomaton from the filter string, and then
> call #getTermsEnum(Terms) on it; but for docvalues, we don’t have a Terms
> instance, we instead have a TermsEnum.
>
> Using AutomatonTermsEnum to wrap the TermsEnum works in most cases here, but
> if the CompiledAutomaton in question is a fixed string, then we get
> assertion failures, because ATE uses the compiled automaton’s internal
> ByteRunAutomaton for filtering, and fixed-string automata don’t have one.
>
> Is there a work-around that I’m missing here? Or should I maybe open a JIRA
> to add a #getTermsEnum(TermsEnum) method to CompiledAutomaton?
>
> Alan Woodward
> www.flax.co.uk
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
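For anyone following the special-casing discussion, here is a minimal, self-contained Java model of it. It deliberately does not use Lucene: `Type`, `Compiled`, `filterNormal` and `getTerms` are hypothetical stand-ins for CompiledAutomaton's NONE/ALL/SINGLE/NORMAL cases, for AutomatonTermsEnum with the NORMAL-only check Mike proposes, and for the getTermsEnum(TermsEnum) method Alan suggests. Terms are modeled as sorted Strings rather than BytesRefs, and only the NORMAL case carries a matcher, mirroring the missing ByteRunAutomaton for fixed-string automata.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class AutomatonDispatchSketch {

  // Hypothetical stand-in for CompiledAutomaton's special cases.
  enum Type { NONE, ALL, SINGLE, NORMAL }

  // Hypothetical, simplified "compiled automaton": only NORMAL carries a
  // matcher; SINGLE carries just the fixed term.
  static final class Compiled {
    final Type type;
    final String term;               // non-null only for SINGLE
    final Predicate<String> matcher; // non-null only for NORMAL

    private Compiled(Type type, String term, Predicate<String> matcher) {
      this.type = type;
      this.term = term;
      this.matcher = matcher;
    }

    static Compiled none() { return new Compiled(Type.NONE, null, null); }
    static Compiled all() { return new Compiled(Type.ALL, null, null); }
    static Compiled single(String term) { return new Compiled(Type.SINGLE, term, null); }
    static Compiled normal(Predicate<String> m) { return new Compiled(Type.NORMAL, null, m); }
  }

  // Stand-in for AutomatonTermsEnum: tests every term with the matcher (the
  // real class seeks ahead; this does not). It rejects non-NORMAL input up
  // front instead of failing later on a missing matcher.
  static List<String> filterNormal(Iterator<String> terms, Compiled c) {
    if (c.type != Type.NORMAL) {
      throw new IllegalArgumentException("expected NORMAL, got " + c.type);
    }
    List<String> out = new ArrayList<>();
    while (terms.hasNext()) {
      String t = terms.next();
      if (c.matcher.test(t)) {
        out.add(t);
      }
    }
    return out;
  }

  // Stand-in for the proposed getTermsEnum(TermsEnum): dispatch on the type so
  // SINGLE never reaches the matcher-based filter. Input must be sorted.
  static List<String> getTerms(Iterator<String> sortedTerms, Compiled c) {
    List<String> out = new ArrayList<>();
    switch (c.type) {
      case NONE:
        return out;                             // matches nothing
      case ALL:
        sortedTerms.forEachRemaining(out::add); // matches everything
        return out;
      case SINGLE:
        // Linear scan with early exit on sorted input; a real terms
        // dictionary would seek straight to the term instead.
        while (sortedTerms.hasNext()) {
          int cmp = sortedTerms.next().compareTo(c.term);
          if (cmp == 0) {
            out.add(c.term);
          }
          if (cmp >= 0) {
            break;
          }
        }
        return out;
      case NORMAL:
        return filterNormal(sortedTerms, c);
      default:
        throw new IllegalStateException("unhandled case: " + c.type);
    }
  }

  public static void main(String[] args) {
    List<String> terms = List.of("apple", "banana", "blueberry", "cherry");
    // prints [banana]
    System.out.println(getTerms(terms.iterator(), Compiled.single("banana")));
    // prints [banana, blueberry]
    System.out.println(getTerms(terms.iterator(), Compiled.normal(t -> t.startsWith("b"))));
    try {
      filterNormal(terms.iterator(), Compiled.single("banana"));
    } catch (IllegalArgumentException e) {
      // prints "rejected: expected NORMAL, got SINGLE"
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The model only captures the dispatch and the NORMAL-only guard. It says nothing about the performance point raised above: a method that takes a TermsEnum cannot route the NORMAL case through the efficient Terms.intersect of the default terms dictionary.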