kotman12 commented on PR #2382:
URL: https://github.com/apache/solr/pull/2382#issuecomment-2078142747
> So I was trying to learn how the main configuration bits fit together here
> and high-level the reverse search idea and my _solr-monitor-naive-dinner-demo_
> branch (or #2421 diff) off this pull request's branch is a side effect of that
> and my understanding so far based on it is that:
>
> * the in-memory state is in the `Presearcher` object in the
> `ReverseQueryParserPlugin` class object (and in the
> _solr-monitor-naive-dinner-demo_ i just used a simple `Monitor` object instead
> of the `Presearcher` object)
> * the state is updated via the `MonitorUpdateRequestProcessor` i.e. saved
> searches are added as `MonitorQuery` objects to the `Monitor` object (and
> updating of the `Presearcher` object is a bit different)
> * the state is accessed via the `ReverseSearchComponent` component
> (currently non-distributed but conceptually distributed would work too?)
>
> Is that basic understanding correct? As a next step I might go learn more
> about the `Presearcher` itself.

I'll give the PR a look, but when I first looked at this, my main concerns
about wiring a Monitor straight into Solr were:
1. Handling commit/rollback: what do you update the tlog with if you are also
writing to a "sidecar" Monitor object?
2. Handling persistence. Currently the Monitor has its own tightly sealed
index. It can be configured for persistence, but if you want to peek at the
segments a Monitor is writing to disk it might not be easy, especially for
handling configurations like tlog+pull. The alternative is to use only the
in-memory Monitor configurations, but that has limitations and takes away
precious resources from the {cacheId -> deserialized query} cache.
3. That brings me to my final point: the cache a Monitor object wraps is a
simple ConcurrentHashMap that is updated under a very coarse-grained lock,
which can block reads for a long time. It just doesn't feel like it jibes with
the Solr approach to concurrency, which is much more sophisticated (it is a
fully fledged db after all). We could make the Monitor cache more configurable
in the upstream Lucene monitor repo, but in my opinion Lucene monitor tries to
do too much state management that it's not that good at. The most valuable
thing to take advantage of is its sophisticated reverse search methods (query
decomposition for faster matching, query tokenization for pre-search, term
weighting, optimized document-to-query conversion with term-acceptor, and
probably something else I am forgetting).
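
To make the concern in point 3 a bit more concrete, here is a minimal,
stdlib-only sketch of the general pattern. It is not Lucene Monitor's or Solr's
actual code: `QueryCacheSketch`, `CoarseLockedCache`, and `SnapshotCache` are
hypothetical names, and plain strings stand in for deserialized queries. The
first variant rebuilds the cache while holding one lock, so readers wait for
the whole rebuild. The second builds a new map off to the side and publishes it
with a single reference swap, so readers never wait.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration only; these are not Lucene Monitor or Solr classes. */
class QueryCacheSketch {

    /** Coarse-grained variant: a refresh holds the write lock for its whole duration. */
    public static final class CoarseLockedCache {
        private final Map<String, String> parsed = new HashMap<>();
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        public String get(String queryId) {
            lock.readLock().lock(); // waits while any refresh is in progress
            try {
                return parsed.get(queryId);
            } finally {
                lock.readLock().unlock();
            }
        }

        public void refresh(Map<String, String> latest) {
            lock.writeLock().lock();
            try {
                parsed.clear();
                parsed.putAll(latest); // stands in for slow query deserialization
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    /** Snapshot variant: readers always see a complete, immutable map. */
    public static final class SnapshotCache {
        private final AtomicReference<Map<String, String>> current =
                new AtomicReference<>(Map.of());

        public String get(String queryId) {
            return current.get().get(queryId); // lock-free read of the current snapshot
        }

        public void refresh(Map<String, String> latest) {
            Map<String, String> next = Map.copyOf(latest); // built outside any lock
            current.set(next); // single atomic publish
        }
    }

    public static void main(String[] args) {
        Map<String, String> latest = Map.of("q1", "title:dinner");
        CoarseLockedCache coarse = new CoarseLockedCache();
        SnapshotCache snapshot = new SnapshotCache();
        coarse.refresh(latest);
        snapshot.refresh(latest);
        System.out.println(coarse.get("q1"));
        System.out.println(snapshot.get("q1"));
    }
}
```

The snapshot variant trades extra memory during a refresh (two maps alive at
once) for reads that never block, which is the kind of trade-off I would
expect a Solr-side cache to make.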
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]