The problem here is mainly with the Sorted*DocValues APIs, which return a
TermsEnum but don’t have a Terms instance to call intersect on.  So maybe the
thing to do is to add a termsEnum(CompiledAutomaton) method to SortedDocValues
and SortedSetDocValues?  That should avoid the trap of bypassing
Terms.intersect() when it’s available.  I’ll try to work up a patch.
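
For illustration only, here is a minimal self-contained sketch of the per-case dispatch that such a method would need. It models the terms dictionary as a sorted list of strings. DocValuesFilterSketch, Compiled and filter are hypothetical stand-ins, not Lucene classes, and this is not the patch itself:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Illustrative model only: these are stand-in types, not Lucene classes.
public class DocValuesFilterSketch {

    // The four cases that CompiledAutomaton distinguishes, per the thread.
    public enum Type { NONE, ALL, SINGLE, NORMAL }

    // Hypothetical stand-in for a compiled automaton.
    public static final class Compiled {
        final Type type;
        final String term;               // used only when type == SINGLE
        final Predicate<String> accepts; // used only when type == NORMAL

        public Compiled(Type type, String term, Predicate<String> accepts) {
            this.type = type;
            this.term = term;
            this.accepts = accepts;
        }
    }

    // Hypothetical analogue of the proposed termsEnum(CompiledAutomaton):
    // the owner of the sorted terms dictionary picks the strategy per case.
    public static List<String> filter(List<String> sortedTerms, Compiled automaton) {
        switch (automaton.type) {
            case NONE:
                return new ArrayList<>();
            case ALL:
                return new ArrayList<>(sortedTerms);
            case SINGLE: {
                // A fixed string needs only an exact lookup, never the automaton.
                List<String> hit = new ArrayList<>();
                if (Collections.binarySearch(sortedTerms, automaton.term) >= 0) {
                    hit.add(automaton.term);
                }
                return hit;
            }
            case NORMAL: {
                // A real implementation could delegate to a faster intersect here.
                List<String> matches = new ArrayList<>();
                for (String t : sortedTerms) {
                    if (automaton.accepts.test(t)) {
                        matches.add(t);
                    }
                }
                return matches;
            }
            default:
                throw new IllegalStateException("unhandled case: " + automaton.type);
        }
    }
}
```

The point of putting the dispatch on the doc values side is that the caller never has to choose a wrapper for the fixed-string case, and the NORMAL branch remains free to use whatever fast path the implementation has.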

Alan Woodward
www.flax.co.uk


> On 6 Jan 2017, at 21:44, Michael McCandless <luc...@mikemccandless.com> wrote:
> 
> Unfortunately I think that's somewhat dangerous because it creates an
> ambiguous API with a nasty performance trap?
> 
> I.e. this new method won't invoke the fast Terms.intersect in the
> default terms dict?
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Fri, Jan 6, 2017 at 3:20 PM, Alan Woodward <a...@flax.co.uk> wrote:
>> Hm, how about something like this, on CompiledAutomaton:
>> 
>> public TermsEnum getTermsEnum(TermsEnum te) throws IOException {
>>  switch (type) {
>>    case NONE:
>>      return TermsEnum.EMPTY;
>>    case ALL:
>>      return te;
>>    case SINGLE:
>>      return new SingleTermsEnum(te, term);
>>    case NORMAL:
>>      return new AutomatonTermsEnum(te, this);
>>    default:
>>      // unreachable
>>      throw new RuntimeException("unhandled case");
>>  }
>> }
>> 
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 6 Jan 2017, at 19:16, Michael McCandless <luc...@mikemccandless.com> wrote:
>> 
>> These automaton intersection APIs are frustrating with all the special
>> case handling... Ideas welcome!
>> 
>> We've had similar challenges with them in the past, when a user
>> invoked Terms.intersect directly instead of via CompiledAutomaton:
>> https://issues.apache.org/jira/browse/LUCENE-7576
>> 
>> The problem is CompiledAutomaton specializes certain cases (all
>> strings match, no strings match, single term) and sidesteps
>> Terms.intersect for those cases.
>> 
>> We should fix AutomatonTermsEnum public ctor w/ the same checks
>> (insist on a NORMAL case) so you don't hit assert failures, or, worse
>> ... I'll do that.
>> 
>> I think a new CompiledAutomaton.intersect taking TermsEnum would be
>> tricky in general because it relies on the (efficient) Terms.intersect
>> to handle the NORMAL case well, but we can't invoke that from a
>> TermsEnum.
>> 
>> In the SINGLE case, could you use SingleTermsEnum, passing the
>> TermsEnum from your doc values, and the term from the
>> CompiledAutomaton?  Would that suffice as a workaround?
>> 
>> Mike McCandless
>> 
>> http://blog.mikemccandless.com
>> 
>> On Fri, Jan 6, 2017 at 11:17 AM, Alan Woodward <a...@flax.co.uk> wrote:
>> 
>> We’ve hit an issue while developing marple, where we want to have the
>> ability to filter the values from a SortedDocValues terms dictionary.
>> Normally you’d create a CompiledAutomaton from the filter string, and then
>> call #getTermsEnum(Terms) on it; but for docvalues, we don’t have a Terms
>> instance, we instead have a TermsEnum.
>> 
>> Using AutomatonTermsEnum to wrap the TermsEnum works in most cases here, but
>> if the CompiledAutomaton in question is a fixed string, then we get
>> assertion failures, because ATE uses the compiled automaton’s internal
>> ByteRunAutomaton for filtering, and fixed-string automata don’t have one.
>> 
>> Is there a work-around that I’m missing here?  Or should I maybe open a JIRA
>> to add a #getTermsEnum(TermsEnum) method to CompiledAutomaton?
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
>> 