Took another quick look...

Think there's an easy patch that might work:
https://github.com/memcached/memcached/pull/924

Would you mind helping validate it? Having an external validator would help me
get it landed in time for the next release :)
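
If it helps, this is roughly the repro I'd run against a test instance (a
sketch in Python/pymemcache to match the workload described below; the host,
port, key names, timing, and sizes are all assumptions on my part):

    # Hammer a small memcached (e.g. -m 5120 -I 2m as in the report below)
    # with ~1MB values and count SERVER_ERROR responses. Illustrative only.
    import os
    import time

    from pymemcache.client.base import Client
    from pymemcache.exceptions import MemcacheServerError

    client = Client(("127.0.0.1", 11211), timeout=5)
    value = os.urandom(1024 * 1024)  # >512KB, so it needs chunked storage

    sets = errors = 0
    deadline = time.time() + 600  # long enough for the cache to fill and churn
    while time.time() < deadline:
        try:
            client.set(f"frame:{sets}", value, noreply=False)
        except MemcacheServerError:
            errors += 1  # e.g. "out of memory during read"
        sets += 1
        time.sleep(1 / 30)  # roughly the 30fps write rate described below

    print(f"{sets} sets, {errors} server errors")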

Thanks,
-Dormando

On Wed, 24 Aug 2022, dormando wrote:

> Hey,
>
> Thanks for the info. Yes; this generally confirms the issue. I see some of
> your higher slab classes with "free_chunks 0", so if you're setting data
> that requires these chunks it could error out. The "stats items" confirms
> this since there are no actual items in those lower slab classes.
>
> You're certainly right that a workaround of keeping your items < 512k would
> also work; but in general, if I ship a feature it'd be nice if it worked well
> :) Please open an issue so we can improve things!
>
> I've been intending to lower the slab_chunk_max default from 512k to
> something much smaller, as that actually improves memory efficiency a bit
> (less gap at the higher classes). That may help here. The system should also
> try ejecting items from the highest LRU... I need to double-check that it
> isn't already trying to do that and failing.
>
> The page mover could probably also be adjusted to keep one page in reserve,
> but I think its algorithm isn't expecting slab classes with no items in them,
> so I'd have to audit that too.
>
> If you're up for experiments, it'd be interesting to know whether setting
> "-o slab_chunk_max=32768", or 16k (probably not more than 64k), makes things
> better or worse.
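>
> If it's useful, here's a quick way to double-check what value the server
> actually ended up with (a rough pymemcache sketch; I'm assuming the setting
> shows up under "stats settings" with "chunk" in its name, so verify against
> your build, and adjust the address for your pod):
>
>     # Print any "stats settings" entries related to chunk sizing.
>     from pymemcache.client.base import Client
>
>     client = Client(("127.0.0.1", 11211))
>     for key, value in client.stats("settings").items():
>         name = key.decode() if isinstance(key, bytes) else key
>         if "chunk" in name:
>             print(name, value)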
>
> Also, crud... it's documented as taking kilobytes, but that doesn't seem to
> be working somehow? aaahahah. I guess the big EXPERIMENTAL tag scared people
> off, since that never got reported.
>
> I'm guessing most people have a mix of small and large items, but you only
> have large items and a relatively low memory limit, which is why you're
> seeing it so easily. I think most people storing large items have like
> 30G+ of memory, so you end up with memory more spread around.
>
> Thanks,
> -Dormando
>
> On Wed, 24 Aug 2022, Hayden wrote:
>
> > What you're saying makes sense, and I'm pretty sure it won't be too hard to 
> > add some functionality to my writing code to break my large items up into
> > smaller parts that can each fit into a single chunk. That has the added 
> > benefit that I won't have to bother increasing the max item size.
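> >
> > Roughly what I have in mind is something like this sketch (the key scheme
> > and part size are made up, and it just uses pymemcache's plain set/get):
> >
> >     # Split a big value into parts that each fit in one chunk, and join
> >     # them back on read. A missing part means the frame was evicted.
> >     from pymemcache.client.base import Client
> >
> >     client = Client(("memcached", 11211))  # placeholder address
> >     PART = 400 * 1024  # comfortably under the 512KB chunk size
> >
> >     def set_split(key, blob):
> >         parts = [blob[i:i + PART] for i in range(0, len(blob), PART)]
> >         ok = client.set(f"{key}:count", len(parts), noreply=False)
> >         for i, part in enumerate(parts):
> >             ok = client.set(f"{key}:{i}", part, noreply=False) and ok
> >         return ok
> >
> >     def get_split(key):
> >         count = client.get(f"{key}:count")
> >         if count is None:
> >             return None
> >         parts = []
> >         for i in range(int(count)):
> >             part = client.get(f"{key}:{i}")
> >             if part is None:
> >                 return None  # a part was already evicted
> >             parts.append(part)
> >         return b"".join(parts)
> >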
> > In the meantime, though, I reran my pipeline and captured the output of
> > stats, stats slabs, and stats items, both when evicting normally and when
> > getting spammed with the error.
> >
> > First, the output when I'm in the error state:
> > **** Output of stats
> > STAT pid 1
> > STAT uptime 11727
> > STAT time 1661406229
> > STAT version b'1.6.14'
> > STAT libevent b'2.1.8-stable'
> > STAT pointer_size 64
> > STAT rusage_user 2.93837
> > STAT rusage_system 6.339015
> > STAT max_connections 1024
> > STAT curr_connections 2
> > STAT total_connections 8230
> > STAT rejected_connections 0
> > STAT connection_structures 6
> > STAT response_obj_oom 0
> > STAT response_obj_count 1
> > STAT response_obj_bytes 65536
> > STAT read_buf_count 8
> > STAT read_buf_bytes 131072
> > STAT read_buf_bytes_free 49152
> > STAT read_buf_oom 0
> > STAT reserved_fds 20
> > STAT cmd_get 0
> > STAT cmd_set 12640
> > STAT cmd_flush 0
> > STAT cmd_touch 0
> > STAT cmd_meta 0
> > STAT get_hits 0
> > STAT get_misses 0
> > STAT get_expired 0
> > STAT get_flushed 0
> > STAT delete_misses 0
> > STAT delete_hits 0
> > STAT incr_misses 0
> > STAT incr_hits 0
> > STAT decr_misses 0
> > STAT decr_hits 0
> > STAT cas_misses 0
> > STAT cas_hits 0
> > STAT cas_badval 0
> > STAT touch_hits 0
> > STAT touch_misses 0
> > STAT store_too_large 0
> > STAT store_no_memory 0
> > STAT auth_cmds 0
> > STAT auth_errors 0
> > STAT bytes_read 21755739959
> > STAT bytes_written 330909
> > STAT limit_maxbytes 5368709120
> > STAT accepting_conns 1
> > STAT listen_disabled_num 0
> > STAT time_in_listen_disabled_us 0
> > STAT threads 4
> > STAT conn_yields 0
> > STAT hash_power_level 16
> > STAT hash_bytes 524288
> > STAT hash_is_expanding False
> > STAT slab_reassign_rescues 0
> > STAT slab_reassign_chunk_rescues 0
> > STAT slab_reassign_evictions_nomem 0
> > STAT slab_reassign_inline_reclaim 0
> > STAT slab_reassign_busy_items 0
> > STAT slab_reassign_busy_deletes 0
> > STAT slab_reassign_running False
> > STAT slabs_moved 0
> > STAT lru_crawler_running 0
> > STAT lru_crawler_starts 20
> > STAT lru_maintainer_juggles 71777
> > STAT malloc_fails 0
> > STAT log_worker_dropped 0
> > STAT log_worker_written 0
> > STAT log_watcher_skipped 0
> > STAT log_watcher_sent 0
> > STAT log_watchers 0
> > STAT unexpected_napi_ids 0
> > STAT round_robin_fallback 0
> > STAT bytes 5241499325
> > STAT curr_items 4211
> > STAT total_items 12640
> > STAT slab_global_page_pool 0
> > STAT expired_unfetched 0
> > STAT evicted_unfetched 8429
> > STAT evicted_active 0
> > STAT evictions 8429
> > STAT reclaimed 0
> > STAT crawler_reclaimed 0
> > STAT crawler_items_checked 4212
> > STAT lrutail_reflocked 0
> > STAT moves_to_cold 11872
> > STAT moves_to_warm 0
> > STAT moves_within_lru 0
> > STAT direct_reclaims 55559
> > STAT lru_bumps_dropped 0
> > END
> > **** Output of stats slabs
> > STAT 2:chunk_size 120
> > STAT 2:chunks_per_page 8738
> > STAT 2:total_pages 1
> > STAT 2:total_chunks 8738
> > STAT 2:used_chunks 4211
> > STAT 2:free_chunks 4527
> > STAT 2:free_chunks_end 0
> > STAT 2:get_hits 0
> > STAT 2:cmd_set 0
> > STAT 2:delete_hits 0
> > STAT 2:incr_hits 0
> > STAT 2:decr_hits 0
> > STAT 2:cas_hits 0
> > STAT 2:cas_badval 0
> > STAT 2:touch_hits 0
> > STAT 30:chunk_size 66232
> > STAT 30:chunks_per_page 15
> > STAT 30:total_pages 1
> > STAT 30:total_chunks 15
> > STAT 30:used_chunks 3
> > STAT 30:free_chunks 12
> > STAT 30:free_chunks_end 0
> > STAT 30:get_hits 0
> > STAT 30:cmd_set 0
> > STAT 30:delete_hits 0
> > STAT 30:incr_hits 0
> > STAT 30:decr_hits 0
> > STAT 30:cas_hits 0
> > STAT 30:cas_badval 0
> > STAT 30:touch_hits 0
> > STAT 31:chunk_size 82792
> > STAT 31:chunks_per_page 12
> > STAT 31:total_pages 1
> > STAT 31:total_chunks 12
> > STAT 31:used_chunks 6
> > STAT 31:free_chunks 6
> > STAT 31:free_chunks_end 0
> > STAT 31:get_hits 0
> > STAT 31:cmd_set 0
> > STAT 31:delete_hits 0
> > STAT 31:incr_hits 0
> > STAT 31:decr_hits 0
> > STAT 31:cas_hits 0
> > STAT 31:cas_badval 0
> > STAT 31:touch_hits 0
> > STAT 32:chunk_size 103496
> > STAT 32:chunks_per_page 10
> > STAT 32:total_pages 19
> > STAT 32:total_chunks 190
> > STAT 32:used_chunks 183
> > STAT 32:free_chunks 7
> > STAT 32:free_chunks_end 0
> > STAT 32:get_hits 0
> > STAT 32:cmd_set 0
> > STAT 32:delete_hits 0
> > STAT 32:incr_hits 0
> > STAT 32:decr_hits 0
> > STAT 32:cas_hits 0
> > STAT 32:cas_badval 0
> > STAT 32:touch_hits 0
> > STAT 33:chunk_size 129376
> > STAT 33:chunks_per_page 8
> > STAT 33:total_pages 50
> > STAT 33:total_chunks 400
> > STAT 33:used_chunks 393
> > STAT 33:free_chunks 7
> > STAT 33:free_chunks_end 0
> > STAT 33:get_hits 0
> > STAT 33:cmd_set 0
> > STAT 33:delete_hits 0
> > STAT 33:incr_hits 0
> > STAT 33:decr_hits 0
> > STAT 33:cas_hits 0
> > STAT 33:cas_badval 0
> > STAT 33:touch_hits 0
> > STAT 34:chunk_size 161720
> > STAT 34:chunks_per_page 6
> > STAT 34:total_pages 41
> > STAT 34:total_chunks 246
> > STAT 34:used_chunks 245
> > STAT 34:free_chunks 1
> > STAT 34:free_chunks_end 0
> > STAT 34:get_hits 0
> > STAT 34:cmd_set 0
> > STAT 34:delete_hits 0
> > STAT 34:incr_hits 0
> > STAT 34:decr_hits 0
> > STAT 34:cas_hits 0
> > STAT 34:cas_badval 0
> > STAT 34:touch_hits 0
> > STAT 35:chunk_size 202152
> > STAT 35:chunks_per_page 5
> > STAT 35:total_pages 231
> > STAT 35:total_chunks 1155
> > STAT 35:used_chunks 1155
> > STAT 35:free_chunks 0
> > STAT 35:free_chunks_end 0
> > STAT 35:get_hits 0
> > STAT 35:cmd_set 0
> > STAT 35:delete_hits 0
> > STAT 35:incr_hits 0
> > STAT 35:decr_hits 0
> > STAT 35:cas_hits 0
> > STAT 35:cas_badval 0
> > STAT 35:touch_hits 0
> > STAT 36:chunk_size 252696
> > STAT 36:chunks_per_page 4
> > STAT 36:total_pages 536
> > STAT 36:total_chunks 2144
> > STAT 36:used_chunks 2144
> > STAT 36:free_chunks 0
> > STAT 36:free_chunks_end 0
> > STAT 36:get_hits 0
> > STAT 36:cmd_set 0
> > STAT 36:delete_hits 0
> > STAT 36:incr_hits 0
> > STAT 36:decr_hits 0
> > STAT 36:cas_hits 0
> > STAT 36:cas_badval 0
> > STAT 36:touch_hits 0
> > STAT 37:chunk_size 315872
> > STAT 37:chunks_per_page 3
> > STAT 37:total_pages 28
> > STAT 37:total_chunks 84
> > STAT 37:used_chunks 82
> > STAT 37:free_chunks 2
> > STAT 37:free_chunks_end 0
> > STAT 37:get_hits 0
> > STAT 37:cmd_set 0
> > STAT 37:delete_hits 0
> > STAT 37:incr_hits 0
> > STAT 37:decr_hits 0
> > STAT 37:cas_hits 0
> > STAT 37:cas_badval 0
> > STAT 37:touch_hits 0
> > STAT 39:chunk_size 524288
> > STAT 39:chunks_per_page 2
> > STAT 39:total_pages 4212
> > STAT 39:total_chunks 8424
> > STAT 39:used_chunks 8422
> > STAT 39:free_chunks 2
> > STAT 39:free_chunks_end 0
> > STAT 39:get_hits 0
> > STAT 39:cmd_set 12640
> > STAT 39:delete_hits 0
> > STAT 39:incr_hits 0
> > STAT 39:decr_hits 0
> > STAT 39:cas_hits 0
> > STAT 39:cas_badval 0
> > STAT 39:touch_hits 0
> > STAT active_slabs 10
> > STAT total_malloced 5368709120
> > END
> > **** Output of stats items
> > STAT items:39:number 4211
> > STAT items:39:number_hot 768
> > STAT items:39:number_warm 0
> > STAT items:39:number_cold 3443
> > STAT items:39:age_hot 28
> > STAT items:39:age_warm 0
> > STAT items:39:age 143
> > STAT items:39:mem_requested 5241499325
> > STAT items:39:evicted 8429
> > STAT items:39:evicted_nonzero 0
> > STAT items:39:evicted_time 140
> > STAT items:39:outofmemory 0
> > STAT items:39:tailrepairs 0
> > STAT items:39:reclaimed 0
> > STAT items:39:expired_unfetched 0
> > STAT items:39:evicted_unfetched 8429
> > STAT items:39:evicted_active 0
> > STAT items:39:crawler_reclaimed 0
> > STAT items:39:crawler_items_checked 4212
> > STAT items:39:lrutail_reflocked 0
> > STAT items:39:moves_to_cold 11872
> > STAT items:39:moves_to_warm 0
> > STAT items:39:moves_within_lru 0
> > STAT items:39:direct_reclaims 8429
> > STAT items:39:hits_to_hot 0
> > STAT items:39:hits_to_warm 0
> > STAT items:39:hits_to_cold 0
> > STAT items:39:hits_to_temp 0
> > END
> >
> > Then, the output when it's humming along happily again:
> > **** Output of stats
> > STAT pid 1
> > STAT uptime 11754
> > STAT time 1661406256
> > STAT version b'1.6.14'
> > STAT libevent b'2.1.8-stable'
> > STAT pointer_size 64
> > STAT rusage_user 3.056135
> > STAT rusage_system 7.074541
> > STAT max_connections 1024
> > STAT curr_connections 3
> > STAT total_connections 10150
> > STAT rejected_connections 0
> > STAT connection_structures 6
> > STAT response_obj_oom 0
> > STAT response_obj_count 1
> > STAT response_obj_bytes 65536
> > STAT read_buf_count 8
> > STAT read_buf_bytes 131072
> > STAT read_buf_bytes_free 49152
> > STAT read_buf_oom 0
> > STAT reserved_fds 20
> > STAT cmd_get 0
> > STAT cmd_set 12794
> > STAT cmd_flush 0
> > STAT cmd_touch 0
> > STAT cmd_meta 0
> > STAT get_hits 0
> > STAT get_misses 0
> > STAT get_expired 0
> > STAT get_flushed 0
> > STAT delete_misses 0
> > STAT delete_hits 0
> > STAT incr_misses 0
> > STAT incr_hits 0
> > STAT decr_misses 0
> > STAT decr_hits 0
> > STAT cas_misses 0
> > STAT cas_hits 0
> > STAT cas_badval 0
> > STAT touch_hits 0
> > STAT touch_misses 0
> > STAT store_too_large 0
> > STAT store_no_memory 0
> > STAT auth_cmds 0
> > STAT auth_errors 0
> > STAT bytes_read 24375641173
> > STAT bytes_written 415262
> > STAT limit_maxbytes 5368709120
> > STAT accepting_conns 1
> > STAT listen_disabled_num 0
> > STAT time_in_listen_disabled_us 0
> > STAT threads 4
> > STAT conn_yields 0
> > STAT hash_power_level 16
> > STAT hash_bytes 524288
> > STAT hash_is_expanding False
> > STAT slab_reassign_rescues 0
> > STAT slab_reassign_chunk_rescues 0
> > STAT slab_reassign_evictions_nomem 0
> > STAT slab_reassign_inline_reclaim 0
> > STAT slab_reassign_busy_items 0
> > STAT slab_reassign_busy_deletes 0
> > STAT slab_reassign_running False
> > STAT slabs_moved 0
> > STAT lru_crawler_running 0
> > STAT lru_crawler_starts 20
> > STAT lru_maintainer_juggles 71952
> > STAT malloc_fails 0
> > STAT log_worker_dropped 0
> > STAT log_worker_written 0
> > STAT log_watcher_skipped 0
> > STAT log_watcher_sent 0
> > STAT log_watchers 0
> > STAT unexpected_napi_ids 0
> > STAT round_robin_fallback 0
> > STAT bytes 5242957328
> > STAT curr_items 4212
> > STAT total_items 12794
> > STAT slab_global_page_pool 0
> > STAT expired_unfetched 0
> > STAT evicted_unfetched 8582
> > STAT evicted_active 0
> > STAT evictions 8582
> > STAT reclaimed 0
> > STAT crawler_reclaimed 0
> > STAT crawler_items_checked 4212
> > STAT lrutail_reflocked 0
> > STAT moves_to_cold 12533
> > STAT moves_to_warm 0
> > STAT moves_within_lru 0
> > STAT direct_reclaims 74822
> > STAT lru_bumps_dropped 0
> > END
> > **** Output of stats slabs
> > STAT 2:chunk_size 120
> > STAT 2:chunks_per_page 8738
> > STAT 2:total_pages 1
> > STAT 2:total_chunks 8738
> > STAT 2:used_chunks 4212
> > STAT 2:free_chunks 4526
> > STAT 2:free_chunks_end 0
> > STAT 2:get_hits 0
> > STAT 2:cmd_set 0
> > STAT 2:delete_hits 0
> > STAT 2:incr_hits 0
> > STAT 2:decr_hits 0
> > STAT 2:cas_hits 0
> > STAT 2:cas_badval 0
> > STAT 2:touch_hits 0
> > STAT 30:chunk_size 66232
> > STAT 30:chunks_per_page 15
> > STAT 30:total_pages 1
> > STAT 30:total_chunks 15
> > STAT 30:used_chunks 3
> > STAT 30:free_chunks 12
> > STAT 30:free_chunks_end 0
> > STAT 30:get_hits 0
> > STAT 30:cmd_set 0
> > STAT 30:delete_hits 0
> > STAT 30:incr_hits 0
> > STAT 30:decr_hits 0
> > STAT 30:cas_hits 0
> > STAT 30:cas_badval 0
> > STAT 30:touch_hits 0
> > STAT 31:chunk_size 82792
> > STAT 31:chunks_per_page 12
> > STAT 31:total_pages 1
> > STAT 31:total_chunks 12
> > STAT 31:used_chunks 6
> > STAT 31:free_chunks 6
> > STAT 31:free_chunks_end 0
> > STAT 31:get_hits 0
> > STAT 31:cmd_set 0
> > STAT 31:delete_hits 0
> > STAT 31:incr_hits 0
> > STAT 31:decr_hits 0
> > STAT 31:cas_hits 0
> > STAT 31:cas_badval 0
> > STAT 31:touch_hits 0
> > STAT 32:chunk_size 103496
> > STAT 32:chunks_per_page 10
> > STAT 32:total_pages 19
> > STAT 32:total_chunks 190
> > STAT 32:used_chunks 183
> > STAT 32:free_chunks 7
> > STAT 32:free_chunks_end 0
> > STAT 32:get_hits 0
> > STAT 32:cmd_set 0
> > STAT 32:delete_hits 0
> > STAT 32:incr_hits 0
> > STAT 32:decr_hits 0
> > STAT 32:cas_hits 0
> > STAT 32:cas_badval 0
> > STAT 32:touch_hits 0
> > STAT 33:chunk_size 129376
> > STAT 33:chunks_per_page 8
> > STAT 33:total_pages 50
> > STAT 33:total_chunks 400
> > STAT 33:used_chunks 391
> > STAT 33:free_chunks 9
> > STAT 33:free_chunks_end 0
> > STAT 33:get_hits 0
> > STAT 33:cmd_set 0
> > STAT 33:delete_hits 0
> > STAT 33:incr_hits 0
> > STAT 33:decr_hits 0
> > STAT 33:cas_hits 0
> > STAT 33:cas_badval 0
> > STAT 33:touch_hits 0
> > STAT 34:chunk_size 161720
> > STAT 34:chunks_per_page 6
> > STAT 34:total_pages 41
> > STAT 34:total_chunks 246
> > STAT 34:used_chunks 246
> > STAT 34:free_chunks 0
> > STAT 34:free_chunks_end 0
> > STAT 34:get_hits 0
> > STAT 34:cmd_set 0
> > STAT 34:delete_hits 0
> > STAT 34:incr_hits 0
> > STAT 34:decr_hits 0
> > STAT 34:cas_hits 0
> > STAT 34:cas_badval 0
> > STAT 34:touch_hits 0
> > STAT 35:chunk_size 202152
> > STAT 35:chunks_per_page 5
> > STAT 35:total_pages 231
> > STAT 35:total_chunks 1155
> > STAT 35:used_chunks 1155
> > STAT 35:free_chunks 0
> > STAT 35:free_chunks_end 0
> > STAT 35:get_hits 0
> > STAT 35:cmd_set 0
> > STAT 35:delete_hits 0
> > STAT 35:incr_hits 0
> > STAT 35:decr_hits 0
> > STAT 35:cas_hits 0
> > STAT 35:cas_badval 0
> > STAT 35:touch_hits 0
> > STAT 36:chunk_size 252696
> > STAT 36:chunks_per_page 4
> > STAT 36:total_pages 536
> > STAT 36:total_chunks 2144
> > STAT 36:used_chunks 2144
> > STAT 36:free_chunks 0
> > STAT 36:free_chunks_end 0
> > STAT 36:get_hits 0
> > STAT 36:cmd_set 0
> > STAT 36:delete_hits 0
> > STAT 36:incr_hits 0
> > STAT 36:decr_hits 0
> > STAT 36:cas_hits 0
> > STAT 36:cas_badval 0
> > STAT 36:touch_hits 0
> > STAT 37:chunk_size 315872
> > STAT 37:chunks_per_page 3
> > STAT 37:total_pages 28
> > STAT 37:total_chunks 84
> > STAT 37:used_chunks 84
> > STAT 37:free_chunks 0
> > STAT 37:free_chunks_end 0
> > STAT 37:get_hits 0
> > STAT 37:cmd_set 0
> > STAT 37:delete_hits 0
> > STAT 37:incr_hits 0
> > STAT 37:decr_hits 0
> > STAT 37:cas_hits 0
> > STAT 37:cas_badval 0
> > STAT 37:touch_hits 0
> > STAT 39:chunk_size 524288
> > STAT 39:chunks_per_page 2
> > STAT 39:total_pages 4212
> > STAT 39:total_chunks 8424
> > STAT 39:used_chunks 8424
> > STAT 39:free_chunks 0
> > STAT 39:free_chunks_end 0
> > STAT 39:get_hits 0
> > STAT 39:cmd_set 12794
> > STAT 39:delete_hits 0
> > STAT 39:incr_hits 0
> > STAT 39:decr_hits 0
> > STAT 39:cas_hits 0
> > STAT 39:cas_badval 0
> > STAT 39:touch_hits 0
> > STAT active_slabs 10
> > STAT total_malloced 5368709120
> > END
> > **** Output of stats items
> > STAT items:39:number 4212
> > STAT items:39:number_hot 261
> > STAT items:39:number_warm 0
> > STAT items:39:number_cold 3951
> > STAT items:39:age_hot 33
> > STAT items:39:age_warm 0
> > STAT items:39:age 165
> > STAT items:39:mem_requested 5242957328
> > STAT items:39:evicted 8582
> > STAT items:39:evicted_nonzero 0
> > STAT items:39:evicted_time 165
> > STAT items:39:outofmemory 0
> > STAT items:39:tailrepairs 0
> > STAT items:39:reclaimed 0
> > STAT items:39:expired_unfetched 0
> > STAT items:39:evicted_unfetched 8582
> > STAT items:39:evicted_active 0
> > STAT items:39:crawler_reclaimed 0
> > STAT items:39:crawler_items_checked 4212
> > STAT items:39:lrutail_reflocked 0
> > STAT items:39:moves_to_cold 12533
> > STAT items:39:moves_to_warm 0
> > STAT items:39:moves_within_lru 0
> > STAT items:39:direct_reclaims 8582
> > STAT items:39:hits_to_hot 0
> > STAT items:39:hits_to_warm 0
> > STAT items:39:hits_to_cold 0
> > STAT items:39:hits_to_temp 0
> > END
> >
> > I'm happy to open an issue on GitHub if the stats confirm there actually is 
> > something in the code that could be fixed. You can decide then how much
> > effort it's worth to fix it. If my workaround idea works, though, I'll just 
> > put it in place and move on to the next thing. ;-)
> > On Wednesday, August 24, 2022 at 7:01:33 PM UTC-7 Dormando wrote:
> >       To put a little more internal detail on this:
> >
> >       - As a SET is being processed, item chunks must be made available.
> >       - If it is chunked memory, it will be fetching these data chunks from
> >       across different slab classes (ie: 512k + 512k + sized enough for
> >       whatever's left over).
> >       - That full chunked item gets put in the largest slab class.
> >       - If another SET comes along and it needs 512k + 512k + an 8k, it has
> >       to look into the 8k slab class for an item to evict.
> >       - Except there's no memory in the 8k class: it's all actually in the
> >       largest class.
> >       - So there's nothing to evict to free up memory.
> >       - So you get an error.
> >       - The slab page mover can make this worse by not leaving enough
> >       reserved memory in the lower slab classes.
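> >
> >       As a rough illustration of the arithmetic only (not the real
> >       allocator code; it just assumes the default 512k slab_chunk_max):
> >
> >           # How a single large value gets carved into chunks.
> >           SLAB_CHUNK_MAX = 512 * 1024
> >
> >           def chunk_sizes(value_len):
> >               full, rest = divmod(value_len, SLAB_CHUNK_MAX)
> >               sizes = [SLAB_CHUNK_MAX] * full
> >               if rest:
> >                   # the tail chunk comes from a smaller slab class, which
> >                   # is where the eviction search can come up empty
> >                   sizes.append(rest)
> >               return sizes
> >
> >           # e.g. a ~1.2MB frame -> [524288, 524288, 151424]; that ~148k
> >           # tail needs a free chunk in a mid-sized class even though
> >           # nearly all memory lives in the largest class.
> >           print(chunk_sizes(1_200_000))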
> >
> >       I wasn't sure how often this would happen in practice and fixed a few
> >       edge cases in the past. Though I always figured I would've revisited
> >       it years ago, so sorry about the trouble.
> >
> >       There are a few tuning options:
> >       1) more memory, lol.
> >       2) you can override slab_chunk_max to be much lower (like 8k or 16k),
> >       which will make a lot more chunks but you won't realistically notice
> >       a performance difference. This can reduce the number of total slab
> >       classes, making it easier for more "end cap" memory to be found.
> >       3) delete items as you use them so it doesn't have to evict. Not the
> >       best option.
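> >
> >       For 3), the reader side would look something like this sketch
> >       (pymemcache, with a made-up key name and address):
> >
> >           from pymemcache.client.base import Client
> >
> >           client = Client(("127.0.0.1", 11211))  # adjust for your setup
> >
> >           def take(key):
> >               # read-then-delete so the value's chunks are freed right away
> >               value = client.get(key)
> >               if value is not None:
> >                   client.delete(key, noreply=True)
> >               return value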
> >
> >       There are code fixes I can try, but I need to see what the exact
> >       symptom is first, which is why I asked for the stats stuff.
> >
> >       On Wed, 24 Aug 2022, dormando wrote:
> >
> >       > Hey,
> >       >
> >       > You're probably hitting an edge case in the "large item support".
> >       >
> >       > Basically, to store values > 512k, memcached internally splits them
> >       > up into chunks. When storing items, memcached first allocates the
> >       > item storage, then reads data from the client socket directly into
> >       > the data storage.
> >       >
> >       > For chunked items it will be allocating chunks of memory as it
> >       > reads from the socket, which can lead to that (thankfully very
> >       > specific) "during read" error. I've long suspected some edge cases
> >       > but haven't revisited that code in ... a very long time.
> >       >
> >       > If you can grab snapshots of "stats items" and "stats slabs" both
> >       > when it's evicting normally and when it's giving you errors, I
> >       > might be able to figure out what's causing it to bottom out and see
> >       > if there's some tuning to do. Normal "stats" output is also helpful.
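> >       >
> >       > Something like this would dump all three at once (a rough
> >       > pymemcache sketch, since that's what you're already using; the
> >       > address is a placeholder):
> >       >
> >       >     from pymemcache.client.base import Client
> >       >
> >       >     client = Client(("127.0.0.1", 11211))
> >       >     for section in (None, "items", "slabs"):
> >       >         stats = client.stats(section) if section else client.stats()
> >       >         print("**** stats", section or "")
> >       >         for key, value in stats.items():
> >       >             name = key.decode() if isinstance(key, bytes) else key
> >       >             print("STAT", name, value)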
> >       >
> >       > It kind of smells like some slab classes are running low on memory
> >       > sometimes, and the items in them are being read for a long time...
> >       > but we have to see the data to be sure.
> >       >
> >       > If you're feeling brave you can try building the current "next"
> >       > branch from github and try it out, as some fixes to the page mover
> >       > went in there. Those fixes may have caused too much memory to be
> >       > moved away from a slab class sometimes.
> >       >
> >       > Feel free to open an issue on github to track this if you'd like.
> >       >
> >       > have fun,
> >       > -Dormando
> >       >
> >       > On Wed, 24 Aug 2022, Hayden wrote:
> >       >
> >       > > Hello,
> >       > > I'm trying to use memcached for a use case I don't think is
> >       > > outlandish, but it's not behaving the way I expect. I wanted to
> >       > > sanity check what I'm doing, to figure out whether it should work
> >       > > and I've just done something wrong in my configuration, whether
> >       > > my idea of how it's supposed to work is wrong, or whether there's
> >       > > a problem with memcached itself.
> >       > >
> >       > > I'm using memcached as a temporary shared image store in a
> >       > > distributed video processing application. At the front of the
> >       > > pipeline is a process (actually all of these processes are pods
> >       > > in a kubernetes cluster, if it matters, and memcached is running
> >       > > in the cluster as well) that consumes a video stream over RTSP,
> >       > > saves each frame to memcached, and outputs events to a message
> >       > > bus (kafka) with metadata about each frame. At the end of the
> >       > > pipeline is another process that consumes these metadata events,
> >       > > and when it sees events it thinks are interesting it retrieves
> >       > > the corresponding frame from memcached and adds it to a web UI.
> >       > > The video is typically 30fps, so there are about 30 set()
> >       > > operations each second, and since each value is effectively an
> >       > > image the values are a bit big (around 1MB... I upped the maximum
> >       > > value size in memcached to 2MB to make sure they'd fit, and I
> >       > > haven't had any problems with my writes being rejected because of
> >       > > size).
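> >       > >
> >       > > (For reference, the reader side is essentially just this, with a
> >       > > miss treated as "frame already evicted"; the names and address
> >       > > here are illustrative, and both ends use pymemcache:)
> >       > >
> >       > >     from pymemcache.client.base import Client
> >       > >
> >       > >     cache = Client(("memcached", 11211))  # chart's service name
> >       > >
> >       > >     def fetch_frame(frame_key):
> >       > >         data = cache.get(frame_key)  # None once evicted
> >       > >         if data is None:
> >       > >             return None  # just skip this frame in the UI
> >       > >         return data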
> >       > >
> >       > > The video stream is processed in real time and is effectively
> >       > > infinite, but the memory available to memcached obviously isn't
> >       > > (I've configured it to use 5GB, FWIW). That's OK, because the
> >       > > cache is only supposed to be temporary storage. My expectation is
> >       > > that once the available memory fills up (which takes a few
> >       > > minutes), then roughly speaking, for every new frame added to
> >       > > memcached another entry (ostensibly the oldest one) will be
> >       > > evicted. If the consuming process at the end of the pipeline
> >       > > doesn't get to a frame it wants before it gets evicted, that's OK.
> >       > >
> >       > > That's not what I'm seeing, though, or at least that's not all
> >       > > I'm seeing. There are lots of evictions happening, but the
> >       > > process writing to memcached also goes through periods where
> >       > > every set() operation is rejected with an "Out of memory during
> >       > > read" error. It seems to happen in bursts: for several seconds
> >       > > every write hits the error, then for several seconds the set()
> >       > > calls work just fine (and presumably other keys are being
> >       > > evicted), then the cycle repeats. It goes on this way for as long
> >       > > as I let the process run.
> >       > >
> >       > > I'm using memcached v1.6.14, installed into my k8s cluster using
> >       > > the bitnami helm chart v6.0.5. My reading and writing
> >       > > applications both use pymemcache v3.5.2 for their access.
> >       > >
> >       > > Can anyone tell me whether what I'm doing should work the way I
> >       > > described, and where I should investigate to see what's going
> >       > > wrong? Or alternatively, why what I'm trying to do shouldn't work
> >       > > the way I expected, so I can figure out how to make my
> >       > > applications behave differently?
> >       > >
> >       > > Thanks,
> >       > > Hayden
> >       > >
> >       >
> >
>
