gsmiller commented on PR #841:
URL: https://github.com/apache/lucene/pull/841#issuecomment-1150439971

   Let me try to summarize my understanding of the "future optimization" debate 
as it pertains to this proposed API/implementation and see if we're on the same 
page or if I'm overlooking something.
   
   The current proposal encapsulates/delegates matching logic behind 
`FacetSetMatcher`, which is responsible for determining whether or not that 
`FSM` instance matches a provided point. There's `RangeFSM`, which knows how to 
match based on whether or not the point is contained in the n-dim range, and 
there's `ExactFSM`, which just does exact point equivalence. The "protocol" 
for doing facet aggregation/counting is implemented inside 
`MatchingFacetSetsCounts`, which delegates to the `FSM#matches` method. The 
inefficiency is that, because `MatchingFacetSetsCounts` can make no assumptions 
about the `FSM` instances provided to it, it must iterate _all_ provided `FSM` 
instances for every indexed point of every provided hit.
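   
   To make that concrete, here's roughly the shape I have in mind (class and 
method names are simplified for illustration and may not match the PR exactly):
   
```java
// Rough mental model of the current counting protocol (simplified sketch,
// not the actual classes in the PR).
abstract class FacetSetMatcher {
  final String label;
  final int dims;

  FacetSetMatcher(String label, int dims) {
    this.label = label;
    this.dims = dims;
  }

  /** Returns true if this matcher accepts the given indexed point. */
  abstract boolean matches(long[] dimValues);
}

class CountingSketch {
  // Somewhere inside the counting code (a la MatchingFacetSetsCounts):
  // every indexed point of every hit is tested against every matcher,
  // i.e. a full linear scan per point.
  static void countPoint(long[] dimValues, FacetSetMatcher[] matchers, int[] counts) {
    for (int i = 0; i < matchers.length; i++) {
      if (matchers[i].matches(dimValues)) {
        counts[i]++;
      }
    }
  }
}
```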
   
   For the single-point case (`ExactFSM`), this is crazy, right? Even if there 
are only a small number of provided `ExactFSM` instances we're matching against, 
doing a linear scan of all of them for every point is pretty dang inefficient, 
especially when there are many provided hits with many indexed points each. I 
think the same logic generally holds true for the range case as well, but maybe 
that's more debatable.
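   
   Just to illustrate the contrast (purely hypothetical, not something in this 
PR): if the counting code knew it was only dealing with exact sets, it could 
hash them up front and pay a single lookup per indexed point instead of a scan 
over all matchers:
   
```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical specialization for the exact-match case: hash every exact
// point once, then each indexed point costs a single lookup instead of a
// scan over all matchers.
class ExactSetLookup {
  private final Map<LongArrayKey, Integer> matcherIndexByPoint = new HashMap<>();

  void add(long[] exactPoint, int matcherIndex) {
    matcherIndexByPoint.put(new LongArrayKey(exactPoint), matcherIndex);
  }

  void countPoint(long[] dimValues, int[] counts) {
    Integer idx = matcherIndexByPoint.get(new LongArrayKey(dimValues));
    if (idx != null) {
      counts[idx]++;
    }
  }

  // long[] doesn't implement value-based equals/hashCode, so wrap it.
  private static final class LongArrayKey {
    final long[] values;

    LongArrayKey(long[] values) {
      this.values = values;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof LongArrayKey && Arrays.equals(values, ((LongArrayKey) o).values);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(values);
    }
  }
}
```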
   
   But the problem is that `MatchingFacetSetsCounts` doesn't know anything about 
the `FSM` instances it gets and can't do anything other than ask them all, "hey, 
do you match this point?" And so the debate seems to be how to set up the API to 
allow for future optimizations, or whether we should even worry about it at all.
   
   I personally think we should design with this optimization in mind, but I 
believe we're close, and I don't think the current proposal really needs to 
change to allow for future optimizations.
   
   This is where I get a little fuzzy, as I haven't totally caught up on the 
various proposals and conversations that have taken place. But if we keep the 
implementation as it currently exists, then in the future, if we want to put 
these optimizations in place, could we not just add a method to `FSM` that 
exposes the min/max values for each dimension? `MatchingFacetSetsCounts` could 
then inspect the underlying "hyperrectangles" represented by each `FSM` by 
looking at these min/max values before it counts, and decide how it wants to 
proceed. The `matches` method is potentially still useful/needed depending on 
how flexible we want this thing to be: if a user creates an `FSM` implementation 
that's more complicated than just a "hyperrectangle" (e.g., some complicated 
geometry), the contract could be that the provided min/max is a bounding box, 
but `matches` still needs to be called to actually confirm the match.
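   
   In code, the kind of extension I'm imagining might look something like this 
(again, purely hypothetical and building on the simplified sketch above; not a 
concrete proposal for this PR):
   
```java
// Hypothetical future addition: expose per-dimension bounds so the counting
// code can inspect each matcher's "hyperrectangle" (or bounding box) up front.
abstract class BoundedFacetSetMatcher extends FacetSetMatcher {
  BoundedFacetSetMatcher(String label, int dims) {
    super(label, dims);
  }

  /** Inclusive lower bound of this matcher's bounding box, per dimension. */
  abstract long[] minValues();

  /** Inclusive upper bound of this matcher's bounding box, per dimension. */
  abstract long[] maxValues();
}

class BoundingBoxSketch {
  // The counting side could cheaply reject points outside the bounding box
  // (or build a smarter structure from the bounds), and only fall back to
  // matches() to confirm hits for more complex geometries.
  static boolean withinBoundingBox(long[] point, long[] min, long[] max) {
    for (int d = 0; d < point.length; d++) {
      if (point[d] < min[d] || point[d] > max[d]) {
        return false;
      }
    }
    return true;
  }
}
```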
   
   I point this out not to propose implementing it now, but to say that I think 
we have options to extend what is currently proposed here if/when we want to 
optimize. Does this make sense, or am I talking about a completely different 
problem or missing key parts of the conversation that's happened? Apologies if 
so.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
