[jira] [Commented] (SOLR-9658) Caches should have an optional way to clean if idle for 'x' mins

Hoss Man (JIRA) Fri, 09 Aug 2019 14:26:12 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904200#comment-16904200
 ]


Hoss Man commented on SOLR-9658:
--------------------------------

* i should have noticed/mentioned this in the last patch but: any method 
(including your new {{markAndSweepByIdleTime()}} that that expects to be called 
only when markAndSweepLock is already held should really start with {{assert 
markAndSweepLock.isHeldByCurrentThread();}}

 * this patch still seems to modify TestJavaBinCodec unneccessarily? (now that 
you re-added the backcompat constructor)

 * i don't really think it's a good idea to add these {{CacheListener}} / 
{{EvictionListener}} APIs at this point w/o a lot more consideration of their 
lifecycle / usage
 ** I know you introduced them in response to my suggestion to add hooks for 
monitoring in tests, but they don't _currently_ seem more useful in the tests 
then some of the specific suggestions i made before (more comments on this 
below) and the APIs don't seem to be thought through enough to be generally 
useful later w/o a lot of re-working...
 *** Examples: if the point of creating {{CacheListener}} now is to be able to 
add more methods/hooks to it later, then is why is only {{EvictionListener}} 
passed down to the {{ConcurrentXXXCache}} impls instead of the entire 
{{CacheListener}} ?
 *** And why are there 2 distinct {{EvictionListener}} interfaces, instead of 
just a common one?
 ** ... so it would probably be safer/cleaner to avoid adding these APIs now 
since there are simpler alternatives available for the tests?

 * Re: "...plus adding support for artificially "advancing" the time" ... this 
seems overly complex?
 ** None of the suggestions i made for improving the reliability/coverage of 
the test require needing to fake the "now" clock: just being able to insert 
synthetic entries into the cache with artifically old timestamps – which could 
be done by refactoring out the middle of {{put(...)}} into a new 
{{putCacheEntry(CacheEntry ... )}} method that would let the (test) caller set 
an arbitrary {{lastAccessed}} value...
{code:java}
/**
 * Useable by tests to create synthetic cache entries, also called by {@link 
#put}
 * @lucene.internal
 */
 public CacheEntry<K,V> putCacheEntry(CacheEntry<K,V> e) {
    CacheEntry<K,V> oldCacheEntry = map.put(key, e);
    int currentSize;
    if (oldCacheEntry == null) {
      currentSize = stats.size.incrementAndGet();
      ramBytes.addAndGet(e.ramBytesUsed() + HASHTABLE_RAM_BYTES_PER_ENTRY); // 
added key + value + entry
    } else {
      currentSize = stats.size.get();
      ramBytes.addAndGet(-oldCacheEntry.ramBytesUsed());
      ramBytes.addAndGet(e.ramBytesUsed());
    }
    if (islive) {
      stats.putCounter.increment();
    } else {
      stats.nonLivePutCounter.increment();
    }
    return oldCacheEntry;
 }
{code}

 ** ...that way tests could "setup" a cache containing arbitrary entries (of 
arbitrary size, with arbitrary create/access times that could be from weeks in 
the past) and then very precisely inspect the results of the cache after 
calling {{markAndSweep()}}
 *** or some other new {{triggerCleanupIfNeeded()}} method that can encapsualte 
all of the existing {{// Check if we need to clear out old entries from the 
cache ...}} logic currently at the end of {{put()}}

 * In general, i really think testing of functionality like this should really 
focus on testing "what exactly happens when markAndSeep() is called on a cache 
containing a very specific set of values?" indepdnent from "does markAndSweep() 
get called eventually & automatically if i configure maxIdleTime?"
 ** the former can be tested w/o the need of any cleanup threads or faking the 
TimeSource
 ** the later can be tested w/o the need of a {{CacheListener}} or 
{{EvictionListener}} API (or a fake TimeSource) – just create an anonymous 
subclass of {{ConcurrentXXXCache}} who'se markAndSweep() method decrements a 
CountDownLatch tha the test tread is waiting on
 ** isolating the testing of these different concepts not only makes it easier 
to test more complex aspects of how {{markAndSweep()}} is expected to work (ie: 
"assert exactly which entries are removed if the sum of the sizes == X == 
(ramUpperWatermark + Y) but the two smallest entries (whose total size = Y + 1) 
are the only one with an accessTime older then the idleTime") but also makes it 
easier to understand & debug failures down the road -if- _when_ they happen.
 *** as things stand in your patch, -if- _when_ the "did not evict entries in 
time" assert (eventually) trips in a future jenkins build, we won't immediately 
be able to tell (w/o added logging) if that's because of a bug in the 
{{CleanupThread}} that prevented if from calling {{markAndSweep();}} or a bug 
in {{SimTimeSource.advanceMs()}} ; or a bug somewhere in the cache that 
prevented {{markAndSweep()}} from recognizing those entries were old; or just a 
heavily loaded VM CPU that never got around to scheduling the {{CleanupThread}} 
after it's {{await(maxIdle)}} returned. (that last one is hard to definitively 
rule out either way, but in a test where we set maxIdleTime="1 second", and use 
latch.await("2 minutes"), we could rule it out with a high degree of confidence 
– but a 1 second max idle time makes it hard to reliably test anything _else_ 
interesting on a machine that slow about what items are evicted, which is where 
isolating the concepts being tested is really helpful)

> Caches should have an optional way to clean if idle for 'x' mins
> ----------------------------------------------------------------
>
>                 Key: SOLR-9658
>                 URL: https://issues.apache.org/jira/browse/SOLR-9658
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Noble Paul
>            Assignee: Andrzej Bialecki 
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SOLR-9658.patch, SOLR-9658.patch, SOLR-9658.patch, 
> SOLR-9658.patch
>
>
> If a cache is idle for long, it consumes precious memory. It should be 
> configurable to clear the cache if it was not accessed for 'x' secs. The 
> cache configuration can have an extra config {{maxIdleTime}} . if we wish it 
> to the cleaned after 10 mins of inactivity set it to {{maxIdleTime=600}}. 
> [~dragonsinth] would it be a solution for the memory leak you mentioned?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-9658) Caches should have an optional way to clean if idle for 'x' mins

Reply via email to