Re: Inventory updates via join query and caches

Mikhail Khludnev Tue, 07 Jun 2022 13:54:40 -0700

ok. It took a while to scratch together some code with a test:
https://issues.apache.org/jira/browse/SOLR-16242
https://github.com/apache/solr/pull/623 Please chime in!


On Tue, Feb 15, 2022 at 11:01 AM Mikhail Khludnev <[email protected]> wrote:

> Things looked a little more promising after I moved the cache
> check into QueryWrapper.createWeight(Searcher, ..., ...). TBC.
>
> Joel,
> Regarding moving inventory into the main index, I'm afraid it would require
> frequent commits to the main index and hurt search latency.
>
> On Mon, Feb 14, 2022 at 12:45 AM Mikhail Khludnev <[email protected]> wrote:
>
>> Hi, David and Joel.
>> It took a while, but I kicked the tires a little:
>> https://github.com/apache/solr/pull/623
>> I introduced a {!join cacheEventually=true} param. It yields JoinQueries
>> whose equality gives false positives (it ignores the fromCore timestamp),
>> backed by docsets residing in the user cache.
>> A cache listener doesn't suit this purpose: a fresh "from" searcher
>> isn't available for refreshing queries. So I made it work with a special
>> update processor, registered on the inventory ("from") core, that refreshes
>> the user cache of the "to" searcher with regenerator.warm.
>> It even works, passing a simple test.
>> Here's the bummer: with q=*:*&fq={!join cacheEventually=true fromCore=inventory
>> ..}.., if the result is cached in the query result cache and a commit into
>> the main index starts warming the query result cache with a new "to"
>> searcher, it picks up the old searcher's doc set. Boom. Presumably this can
>> be worked around with q={!cache=false}... or by disabling the query result
>> cache, but it's not as elegant as I thought.
>>
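A minimal plain-Java sketch of the relaxed equality described above, using no Solr or Lucene classes. The hypothetical key's equals/hashCode ignore the "from" searcher's open time, so an entry cached before an inventory commit is still found afterwards. RelaxedJoinKey and its fields are assumptions for illustration, not the code in the PR.

```java
import java.util.Objects;

/**
 * Hypothetical key for a cached cross-core join filter. The "from" searcher's
 * open time is carried along but deliberately excluded from equals/hashCode,
 * so a lookup after an inventory commit still hits the older cached docset.
 */
final class RelaxedJoinKey {
    final String fromIndex;
    final String fromField;
    final String toField;
    final String fromQuery;
    final long fromSearcherOpenTime; // informational only; NOT part of equality

    RelaxedJoinKey(String fromIndex, String fromField, String toField,
                   String fromQuery, long fromSearcherOpenTime) {
        this.fromIndex = fromIndex;
        this.fromField = fromField;
        this.toField = toField;
        this.fromQuery = fromQuery;
        this.fromSearcherOpenTime = fromSearcherOpenTime;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RelaxedJoinKey)) return false;
        RelaxedJoinKey other = (RelaxedJoinKey) o;
        // Compare only the logical query definition, never the timestamp.
        return fromIndex.equals(other.fromIndex)
                && fromField.equals(other.fromField)
                && toField.equals(other.toField)
                && fromQuery.equals(other.fromQuery);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fromIndex, fromField, toField, fromQuery);
    }
}
```

The price is staleness. Whatever docset is cached under such a key stays in use until something explicitly regenerates it, which is the refresh problem this thread is about.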
>> On Mon, Dec 20, 2021 at 4:09 AM Joel Bernstein <[email protected]>
>> wrote:
>>
>>> The second approach (newSearcher listener) is a nice one if the
>>> filter cache is too full to rely on auto-warming.
>>> Static warming queries fail on cross-core joins, but I believe they succeed
>>> on a same-core join. So you could move the inventory into the same core and
>>> use a static warming query. The downside is that the main index gets
>>> polluted with ever-changing inventory segments.
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>>
>>> On Sun, Dec 19, 2021 at 6:10 PM David Smiley <[email protected]> wrote:
>>>
>>>> I'm not sure there is a clean/simple solution to this specific
>>>> problem.  But I could imagine a more general & simple feature that could
>>>> solve this scenario, with just a bit more work by the user.
>>>>
>>>> Imagine an optional cache key on ExtendedQuery, auto-parsed, perhaps
>>>> from a local param "cacheKey".  It would wrap any Query with one whose
>>>> equals & hashCode are based on this key.  Solr wouldn't parse the string for
>>>> this query as long as it can look it up in a special cache of these.  That
>>>> special cache would be a Map<String,Query> with weak values, so that if an
>>>> entry is no longer used (e.g. not in the filter cache), it would be GC'ed.  This
>>>> would be useful for expensive queries that might resolve from some
>>>> network location (e.g. access control filters that refer to data in
>>>> who-knows-where).  So that's useful on its own but doesn't solve your
>>>> conundrum.  Then, imagine some new request handler that allows you to
>>>> provide this key & query and have it perform a filter cache save,
>>>> overwriting whatever entry may have been there.  You could even do
>>>> this in a newSearcher event on the inventory core, calling into the primary
>>>> product core.
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
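A rough plain-Java sketch of the cacheKey idea above, assuming a generic query type Q in place of Lucene's Query. KeyedQuery bases equals/hashCode solely on the key. The registry holds its values through weak references, so an entry can be collected once nothing else (e.g. the filter cache) holds the wrapper strongly. KeyedQuery, KeyedQueryRegistry, and lookupOrParse are hypothetical names, not existing Solr APIs.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical wrapper whose identity is the user-supplied cache key only. */
final class KeyedQuery<Q> {
    final String cacheKey;
    final Q delegate;

    KeyedQuery(String cacheKey, Q delegate) {
        this.cacheKey = cacheKey;
        this.delegate = delegate;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof KeyedQuery<?> && cacheKey.equals(((KeyedQuery<?>) o).cacheKey);
    }

    @Override
    public int hashCode() {
        return cacheKey.hashCode();
    }
}

/** Hypothetical registry mapping cacheKey to a weakly held KeyedQuery. */
final class KeyedQueryRegistry<Q> {
    private final Map<String, WeakReference<KeyedQuery<Q>>> byKey = new HashMap<>();

    /** Returns the registered query for key, parsing only if absent or already GC'ed. */
    synchronized KeyedQuery<Q> lookupOrParse(String key, String queryString,
                                             Function<String, Q> parser) {
        WeakReference<KeyedQuery<Q>> ref = byKey.get(key);
        KeyedQuery<Q> existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing; // still strongly reachable elsewhere: skip parsing
        }
        KeyedQuery<Q> fresh = new KeyedQuery<>(key, parser.apply(queryString));
        byKey.put(key, new WeakReference<>(fresh));
        return fresh;
    }
}
```

For brevity, this sketch never purges map entries whose references have been cleared. A fuller version might drain a ReferenceQueue. The proposed "filter cache save" handler would presumably overwrite the filter cache entry keyed by the same KeyedQuery.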
>>>> On Tue, Dec 14, 2021 at 4:24 PM Mikhail Khludnev <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello, Colleagues.
>>>>> I want to discuss one frequent use case: inventory updates.
>>>>> Let's say we can't reindex docs when inventory numbers are updated. We can
>>>>> put inventory in a separate index and apply fq={!join ..
>>>>> fromIndex=inventory}left:(0 TO *]. Once it's cached in the main index's
>>>>> filter cache, it gets a good response time. We can even shard the main
>>>>> collection but keep inventory in a single shard. OK.
>>>>> The sad moment comes when a commit goes into the inventory core: after the
>>>>> searcher is refreshed, those inventory queries miss the cache, and many of
>>>>> them hit the new inventory searcher. That's not good.
>>>>> I can think of two workarounds:
>>>>>  - relax {!join} equality with regard to the fromIndex timestamp, so
>>>>> inventory will be outdated for some time, but that's OK. Then we need to
>>>>> somehow evict, invalidate, or regenerate the inventory filter.
>>>>>  - a newSearcher listener in the inventory core can introspect the main
>>>>> core's cache entries, find {!join .. fromIndex=inventory}..., regenerate
>>>>> them, and insert the results.
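A minimal sketch of the second workaround, with plain Java collections standing in for Solr's filter cache and docsets. A hypothetical hook runs when the inventory core opens a new searcher. It walks the main core's cache, re-executes only the join filters that read from the inventory core, and overwrites their entries in place. FilterKey, InventoryJoinRefresher, and the executeAgainstNewSearcher callback are assumptions, not Solr APIs.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical filter-cache key: fromIndex is null for non-join filters. */
final class FilterKey {
    final String fromIndex;
    final String query;

    FilterKey(String fromIndex, String query) {
        this.fromIndex = fromIndex;
        this.query = query;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FilterKey)) return false;
        FilterKey other = (FilterKey) o;
        return Objects.equals(fromIndex, other.fromIndex) && query.equals(other.query);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fromIndex, query);
    }
}

/** Hypothetical newSearcher hook registered on the inventory core. */
final class InventoryJoinRefresher {
    private final String inventoryCore;

    InventoryJoinRefresher(String inventoryCore) {
        this.inventoryCore = inventoryCore;
    }

    /**
     * Re-executes every cached join filter that reads from the inventory core
     * and overwrites its docset in place; other entries are left untouched.
     * Returns the number of refreshed entries.
     */
    int onNewInventorySearcher(Map<FilterKey, Set<Integer>> mainFilterCache,
                               Function<FilterKey, Set<Integer>> executeAgainstNewSearcher) {
        int refreshed = 0;
        for (Map.Entry<FilterKey, Set<Integer>> entry : mainFilterCache.entrySet()) {
            if (inventoryCore.equals(entry.getKey().fromIndex)) {
                // setValue is not a structural modification, so it is safe while iterating.
                entry.setValue(executeAgainstNewSearcher.apply(entry.getKey()));
                refreshed++;
            }
        }
        return refreshed;
    }
}
```

Because the keys never change, the main core keeps hitting the cache throughout. Only the cached docsets are swapped for ones computed against the new inventory searcher.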
>>>>> I'm afraid to even think about the queryResult cache.
>>>>>
>>>>> Is it worth having something like this in the Solr distro?
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
>>>>>
>>>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev
