To put a little more internal detail on this:

- As a SET is being processed, item chunks must be made available.
- If the item is chunked, memcached fetches those data chunks from
different slab classes (i.e., 512k + 512k + one sized for whatever's left
over; the sketch below walks through that arithmetic).
- That full chunked item then gets put in the largest slab class.
- If another SET comes along and needs 512k + 512k + an 8k, it has to look
in the 8k slab class for an item to evict.
- Except there's no memory in the 8k class: it's all actually sitting in
the largest class.
- So there's nothing to evict to free up memory.
- So you get an error.
- The slab page mover can make this worse by not leaving enough reserved
memory in the lower slab classes.
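
To make the arithmetic concrete, here's a rough sketch of the chunk split
(plain Python; the sizes are illustrative and ignore item header overhead
and slab class rounding, so don't treat it as the exact internal math):

# Rough sketch of how a large value gets split up, assuming the default
# slab_chunk_max of 512KB. Illustrative only: real chunks get rounded up
# to slab class sizes and carry item header overhead.
SLAB_CHUNK_MAX = 512 * 1024  # bytes

def chunk_sizes(value_len):
    """Approximate chunk sizes needed to store a value of value_len bytes."""
    chunks = []
    remaining = value_len
    while remaining > SLAB_CHUNK_MAX:
        chunks.append(SLAB_CHUNK_MAX)
        remaining -= SLAB_CHUNK_MAX
    if remaining > 0:
        chunks.append(remaining)  # the small "end cap" chunk
    return chunks

# A ~1MB frame needs two 512KB chunks plus a small leftover chunk, and that
# leftover has to come out of a much smaller slab class:
print(chunk_sizes(1_050_000))  # -> [524288, 524288, 1424]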

I wasn't sure how often this would happen in practice, and I fixed a few
edge cases in the past. I always figured I'd have revisited it by now, so
sorry about the trouble.

There are a few tuning options:
1) more memory, lol.
2) you can override slab_chunk_max to be much lower (like 8k or 16k). That
means a lot more chunks per item, but you won't realistically notice a
performance difference. It can also reduce the total number of slab
classes, making it easier for that "end cap" memory to be found (see the
sketch below for checking what the server actually ended up with).
3) delete items once you've consumed them so memcached doesn't have to
evict. Not the best option.
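
If you go with option 2, it's worth confirming what the server actually
ended up with after the restart; the easiest place to read the effective
values is "stats settings". A minimal sketch with pymemcache (which you're
already using); the host and port are placeholders:

# Minimal sketch: print the effective size limits from "stats settings"
# after restarting memcached with a lower slab_chunk_max.
# Host/port below are placeholders.
from pymemcache.client.base import Client

client = Client(("memcached.example.local", 11211))

for key, value in client.stats("settings").items():
    name = key.decode() if isinstance(key, bytes) else key
    if name in ("slab_chunk_max", "item_size_max", "maxbytes"):
        print(name, value)

After that, the largest class shown in "stats slabs" should top out around
the new chunk max.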

There are code fixes I can try, but I need to see what the exact symptom is
first, which is why I'm asking for the stats output.
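
Something along these lines is enough to capture the snapshots I mean; a
rough pymemcache sketch, with the host/port and output path as placeholders:

# Rough sketch: dump "stats", "stats items" and "stats slabs" to a
# timestamped JSON file so snapshots from a healthy period and from an
# erroring period can be compared. Host/port and path are placeholders.
import json
import time
from pymemcache.client.base import Client

client = Client(("memcached.example.local", 11211))

def snapshot(label):
    stamp = time.strftime("%Y%m%dT%H%M%S")
    out = {}
    for section in (None, "items", "slabs"):
        raw = client.stats(section) if section else client.stats()
        out[section or "general"] = {
            (k.decode() if isinstance(k, bytes) else k): v
            for k, v in raw.items()
        }
    path = "/tmp/memcached-stats-%s-%s.json" % (label, stamp)
    with open(path, "w") as f:
        json.dump(out, f, indent=2, default=str)
    return path

# Call snapshot("ok") while evictions look normal, and snapshot("oom") while
# the "Out of memory during read" errors are happening.
print(snapshot("ok"))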

On Wed, 24 Aug 2022, dormando wrote:

> Hey,
>
> You're probably hitting an edge case in the "large item support".
>
> Basically to store values > 512k memcached internally splits them up into
> chunks. When storing items memcached first allocates the item storage,
> then reads data from the client socket directly into the data storage.
>
> For chunked items it will be allocating chunks of memory as it reads from
> the socket, which can lead to that (thankfully very specific) "during
> read" error. I've long suspected some edge cases but haven't revisited
> that code in ... a very long time.
>
> If you can grab snapshots of "stats items" and "stats slabs" when it's
> both evicting normally and when it's giving you errors, I might be able to
> figure out what's causing it to bottom out and see if there's some tuning
> to do. Normal "stats" output is also helpful.
>
> It kind of smells like some slab classes are running low on memory
> sometimes, and the items in them are being read for a long time... but we
> have to see the data to be sure.
>
> If you're feeling brave you can try building the current "next" branch
> from github and try it out, as some fixes to the page mover went in there.
> Those fixes may have caused too much memory to be moved away from a slab
> class sometimes.
>
> Feel free to open an issue on github to track this if you'd like.
>
> have fun,
> -Dormando
>
> On Wed, 24 Aug 2022, Hayden wrote:
>
> > Hello,
> > I'm trying to use memcached for a use case I don't think is outlandish,
> > but it's not behaving the way I expect. I wanted to sanity check whether
> > what I'm doing should work and I've simply got something wrong in my
> > configuration, whether my idea of how it's supposed to work is wrong, or
> > whether there's a problem with memcached itself.
> >
> > I'm using memcached as a temporary shared image store in a distributed 
> > video processing application. At the front
> > of the pipeline is a process (actually all these processes are pods in a 
> > kubernetes cluster, if it matters, and
> > memcached is running in the cluster as well) that consumes a video stream 
> > over RTSP, saves each frame to
> > memcached, and outputs events to a message bus (kafka) with metadata about 
> > each frame. At the end of the pipeline
> > is another process that consumes these metadata events, and when it sees 
> > events it thinks are interesting it
> > retrieves the corresponding frame from memcached and adds the frame to a 
> > web UI. The video is typically 30fps, so
> > there are about 30 set() operations each second, and since each value is 
> > effectively an image the values are a
> > bit big (around 1MB... I upped the maximum value size in memcached to 2MB 
> > to make sure they'd fit, and I haven't
> > had any problems with my writes being rejected because of size).
> >
> > The video stream is processed in real-time, and effectively infinite, but 
> > the memory available to memcached
> > obviously isn't (I've configured it to use 5GB, FWIW). That's OK, because 
> > the cache is only supposed to be
> > temporary storage. My expectation is that once the available memory is 
> > filled up (which takes a few minutes),
> > then roughly speaking for every new frame added to memcached another entry 
> > (ostensibly the oldest one) will be
> > evicted. If the consuming process at the end of the pipeline doesn't get to 
> > a frame it wants before it gets
> > evicted that's OK.
> >
> > That's not what I'm seeing, though, or at least that's not all that I'm 
> > seeing. There are lots of evictions
> > happening, but the process that's writing to memcached also goes through 
> > periods where every set() operation is
> > rejected with an "Out of memory during read" error. It seems to happen in 
> > bursts where for several seconds every
> > write encounters the error, then for several seconds the set() calls work 
> > just fine (and presumably other keys
> > are being evicted), then the cycle repeats. It goes on this way for as long 
> > as I let the process run.
> >
> > I'm using memcached v1.6.14, installed into my k8s cluster using the 
> > bitnami helm chart v6.0.5. My reading and
> > writing applications are both using pymemcache v3.5.2 for their access.
> >
> > Can anyone tell me if it seems like what I'm doing should work the way I 
> > described, and where I should try
> > investigating to see what's going wrong? Or alternatively, why what I'm 
> > trying to do shouldn't work the way I
> > expected it to, so I can figure out how to make my applications behave 
> > differently?
> >
> > Thanks,
> > Hayden
> >
>
