Re: Using PCIe SSDs instead of RAM

Jakub Łopuszański Thu, 22 Jul 2010 23:43:32 -0700

While I agree with most of your thesis, I can't see how GC works against the
LRU.


I agree that frequently accessed keys with a short TTL seem strange, and so do
rarely accessed keys with a long TTL. But there are lots of perfectly good
reasons to have such a situation, and we do.
GC does not work against the LRU (at least not that I can see); it cooperates
with it.
Granted, with GC the LRU is practically never used, because you are much less
likely to run out of memory, but I'd like to address Brian Moon's doubts: if
the whole memory is occupied, you will not get a "sudden lack of memory", just
the usual thing: the LRU will start to evict the oldest items.
I agree that monitoring hit rates and evictions makes sense, but you can
forecast problems much sooner if you also monitor the number of unexpired
items.
The point is: GC does not prevent you from using your regular monitoring
tools, skills and procedures. It just gives you another tool: live
monitoring of unexpired items.
I see nothing bad about it :)
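To make the interaction between expiry, eviction and a GC pass concrete, here is a toy model in Python. It is a hypothetical sketch, not memcached's actual slab-based C code: expired items are freed lazily when fetched, a full cache frees room from the LRU tail, and an optional `sweep()` plays the role of the GC patch by freeing all expired items and reporting how many unexpired ones remain. The class name, the item-count capacity (real memcached limits bytes per slab class) and inspecting only the single tail item are simplifying assumptions.

```python
from collections import OrderedDict


class ToyCache:
    """Hypothetical model of an LRU cache with lazy expiry (not memcached's code)."""

    def __init__(self, capacity):
        self.capacity = capacity      # max number of items (real memcached limits bytes)
        self.items = OrderedDict()    # key -> (value, expires_at); first entry = LRU tail
        self.evictions = 0            # still-valid items thrown out to make room

    def get(self, key, now):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at <= now:
            del self.items[key]       # lazy expiry: freed only when fetched
            return None
        self.items.move_to_end(key)   # bump to the head of the LRU
        return value

    def set(self, key, value, ttl, now):
        if key in self.items:
            del self.items[key]
        elif len(self.items) >= self.capacity:
            # Full: only the LRU tail is examined; expired items elsewhere are not seen.
            tail_key, (_, tail_expires) = next(iter(self.items.items()))
            del self.items[tail_key]
            if tail_expires > now:
                self.evictions += 1   # a still-valid item was evicted
        self.items[key] = (value, now + ttl)

    def sweep(self, now):
        """GC-style pass: free every expired item and return the unexpired count."""
        expired = [k for k, (_, exp) in self.items.items() if exp <= now]
        for k in expired:
            del self.items[k]
        return len(self.items)
```

With three items where the two newest have already expired, a fourth `set()` evicts the still-valid oldest item at the tail; running `sweep()` first avoids that eviction, and the LRU remains in place as the fallback once memory really is full.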

Scenario 1. You are releasing a new feature, and you want to scale the number
of servers according to the load. You can monitor memory usage as users
join, extrapolate, and order new machines much sooner than you could by
monitoring evictions, since evictions indicate that you already have a problem.
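Scenario 1 amounts to fitting a trend to memory usage and projecting when it crosses capacity. A minimal least-squares sketch in Python, assuming you already collect samples of the bytes held by unexpired items (the metric the GC patch exposes; as far as I know, stock memcached's `bytes` stat also counts expired items that have not been reclaimed yet). The function name and sample format are hypothetical, and a straight line is only a reasonable model while growth is roughly linear.

```python
def projected_full_time(samples, capacity_bytes):
    """Fit used = intercept + slope * t by least squares and solve for used == capacity.

    samples: list of (t, used_bytes). Returns the projected t at which
    capacity is reached, or None if usage is not growing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    slope = cov / var_t
    if slope <= 0:
        return None  # flat or shrinking usage never reaches capacity
    intercept = mean_u - slope * mean_t
    return (capacity_bytes - intercept) / slope
```

For example, usage of 10, 20 and 30 GB at hours 0, 1 and 2 against a 100 GB cluster projects the cluster filling up at hour 9, well before any eviction would show up.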
Scenario 2. You need to steal machines from one cluster to help build
another one, and you have to decide whether you can do so safely without
risking that the old cluster will "run out of memory". Again, monitoring
evictions cannot reliably tell you how many machines you can remove from the
cluster, while monitoring memory gives you perfectly accurate info.
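Scenario 2 reduces to simple arithmetic once the peak amount of live data is known. A sketch under stated assumptions: keys are spread roughly evenly across servers, every server has the same memory limit (memcached's `-m` setting), and you want to keep a `headroom` fraction free. The function name and the 25% headroom in the example are hypothetical choices, not recommendations.

```python
import math


def removable_servers(current, peak_live_bytes, per_server_bytes, headroom=0.2):
    """How many servers could be taken away while peak live data still fits.

    Assumes an even key distribution and identical servers; `headroom` is the
    fraction of each server's memory deliberately left unused.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_server = per_server_bytes * (1 - headroom)
    needed = max(1, math.ceil(peak_live_bytes / usable_per_server))
    return max(0, current - needed)
```

With 10 servers of 8 GiB each, a peak of 30 GiB of unexpired data and 25% headroom, `removable_servers(10, 30 * 2**30, 8 * 2**30, headroom=0.25)` gives 5. Remember that removing servers also remaps keys, so expect a temporary drop in hitrate.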


On Fri, Jul 23, 2010 at 12:12 AM, dormando <[email protected]> wrote:

>
> http://code.google.com/p/memcached/wiki/NewServerMaint#Looks_Can_be_Deceiving
>
> Think I'll write a separate page about managing memory, based off of the
> slides from my mysqlconf presentation about monitoring memcached...
>
> We're not ignoring you; the patch goes against what the LRU is designed for.
> Several people have argued to put garbage collection back into memcached,
> but it just doesn't mix.
>
> In the interest of being constructive, you should look back through the
> mailing list for details on the storage engine branch, and if you really
> want it to work, it'd be a good exercise to implement this as a custom
> storage engine.
>
> In the interest of being thorough: you proved your own patch unnecessary
> by noting that the hitrate did not change. It just confirmed you weren't
> having a problem.
>
> The short notes of my slides are just:
>
> - Note evictions over time
> - Note hitrate over time
> - Investigate changes to either via a traffic snapshot from maatkit,
> taken either on your memcached server or on an app server. Or set up one app
> server to log its memcached traffic. Whatever you need to do.
> - Note your DB load as well, and correlate *all* of these numbers.
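Noting hitrate and evictions over time means comparing counters between snapshots. A minimal Python sketch: `get_hits`, `get_misses` and `evictions` are cumulative counters in memcached's `stats` output, whose text-protocol reply consists of `STAT <name> <value>` lines terminated by `END`. How you fetch the snapshots (netcat, a client library, munin) is left out, the function names are hypothetical, and correlating with DB load is up to you.

```python
def parse_stats(raw):
    """Parse a memcached text-protocol `stats` reply into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            name, value = parts[1], parts[2]
            try:
                stats[name] = int(value)
            except ValueError:
                stats[name] = value  # non-integer stats such as "version"
    return stats


def interval_report(before, after, seconds):
    """Hitrate and eviction rate between two snapshots taken `seconds` apart."""
    hits = after["get_hits"] - before["get_hits"]
    misses = after["get_misses"] - before["get_misses"]
    evictions = after["evictions"] - before["evictions"]
    gets = hits + misses
    return {
        "hitrate": hits / gets if gets else None,  # None if there were no gets
        "evictions_per_sec": evictions / seconds,
    }
```

Graphing these per-interval values, rather than the raw cumulative counters, is what makes a sudden change in the *flow* visible.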
>
> You'll get way more useful information out of the *flow* through memcached
> than from *what's inside it*. What's inside it doesn't matter, at all!
>
> Keep your hitrate stable, investigate what your app is doing when it
> changes. If there's nothing for you to fix and the hitrate is dropping while db
> load is increasing, add more memcached servers. It's really really simple.
> Honestly! Looking at just one stat and making that decision is pretty
> weird.
>
> In your case, you were seeing evictions despite 50% of your memory being
> loaded with expired items. Neither of these things are a problem or even
> matter, because:
>
> - expired items are freed when they're fetched
> - evicted items are picked off of the tail of the LRU
>
> which means that *neither* the expired items nor the evicted items are
> being accessed at all. You have unexpired items which are being accessed
> less frequently than stuff that's being expired!
>
> It *could* indicate a problem, but simply garbage collecting will actually
> *hide* it from you! You'll find it by analyzing your misses and sets. You
> might then see that your app is uselessly setting hundreds of keys every
> time a user loads their profile, or frontpage, or whatever. Those keys
> then expire without ever being used again.
>
> That should lead you into a *real* benefit of not wasting time setting
> extraneous keys, or fetching keys that never exist, or finding places to
> combine data or issue multigets more correctly.
>
> With respect to your multiget note, I went over this in quite a bit of
> detail: http://dormando.livejournal.com/521163.html
>
> If you're multiget'ing related data, there's zero reason for it to hit
> more than one memcached instance. Except maybe you're fetching mass
> numbers of huge keys and it makes more sense for the TCP sessions to be
> split up in parallel. I dunno.
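One way to keep related keys on a single instance, sketched in Python under assumptions: the client hashes a shared group id (here a hypothetical `user:<id>` prefix) instead of the full key, and uses the same routing for sets and gets. Plain modulo hashing over the server list is used for brevity; real clients usually use consistent hashing so that adding a server does not remap most keys. The function names and key layout are illustrative only and not necessarily what the linked post describes.

```python
import hashlib


def server_for(group, servers):
    """Pick a server by hashing the group id (simple modulo, not consistent hashing)."""
    digest = hashlib.md5(group.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def plan_multiget(keys, servers, group_of=lambda key: key):
    """Return {server: [keys]}; each entry would become one multiget request."""
    plan = {}
    for key in keys:
        plan.setdefault(server_for(group_of(key), servers), []).append(key)
    return plan


def user_group(key):
    """Hypothetical key layout 'user:<id>:<field>' -> group 'user:<id>'."""
    return ":".join(key.split(":")[:2])
```

With `servers = ["mc1:11211", "mc2:11211", "mc3:11211", "mc4:11211"]`, `plan_multiget(["user:42:profile", "user:42:friends", "user:42:settings"], servers, user_group)` sends all three keys to one server no matter how many servers there are, whereas hashing the full keys may spread them over several.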
>
> In one final note, I'd really really appreciate it if you could stop
> hijacking threads to promote your patch. It's pretty rude, as your garbage
> collector issue has been discussed on the list several times.
>
> On Thu, 22 Jul 2010, Jakub Łopuszański wrote:
>
> > Well, I beg to differ.
> > We used to have evictions > 0, actually around 200 (per whatever interval
> > munin uses), so we thought that we had too few machines, and kept adding
> > them.
> > After applying the patch, memory usage dropped by 80%, and we have had no
> > evictions for a long time, which means that the evictions were misleading
> > and happened just because the LRU sometimes kills fresh items, even though
> > there are lots of outdated keys.
> >
> > Moreover, it's not as if RAM usage "fluctuates wildly". It's fairly
> > constant, or at least periodic, so you can tell very accurately whether
> > something bad happened, as it would be instantly visible as a deviation
> > from yesterday's charts. Before applying the patch, you might as well not
> > have looked at the chart at all, as it was certain to show 100% usage,
> > which in my opinion gives no clue about what is actually going on.
> >
> > Even if you are afraid of "wildly fluctuating" charts, you will not solve
> > the problem by hiding it, and that is what actually happens if you don't
> > have GC: the traffic and the number of outdated keys all fluctuate, but
> > you just don't see it if the chart always shows 100% usage...
> >
> > 2010/7/22 Brian Moon <[email protected]>
> >       On 7/22/10 5:46 AM, Jakub Łopuszański wrote:
> >             I see that my patch for garbage collection is still being
> >             ignored, and your post gives me some idea of why that is.
> >             I think that RAM is a real problem, because currently
> >             (without GC) you have no clue how much RAM you really need.
> >             So you can end up blindly buying more and more machines,
> >             which effectively means that multiget works worse and worse
> >             (the client issues one big multiget, but it gets split into
> >             many packets to many servers).
> >             Currently we are trying to reduce the number of servers in
> >             the cluster based on the real consumption, to get more out of
> >             the multiget feature.
> >
> >
> > I would never, never, never want my memcached daemon's RAM usage to
> > fluctuate wildly. Eviction rate is a much better indicator of how well
> > your cache is being used.
> >
> > --
> >
> > Brian.
> > --------
> > http://brian.moonspot.net/
> >
> >
> >
> >
>
