bryancall commented on issue #12583:
URL: https://github.com/apache/trafficserver/issues/12583#issuecomment-4012511144

   Thanks for the detailed write-up. We've been investigating this and can 
confirm the code path is buggy -- `scanRemoveDone()` ignores the 
`CACHE_EVENT_REMOVE_FAILED` event and loops back to `scanObject()`, which will 
find the same undeleted object and loop forever.
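   Here is a toy model of that loop. It assumes the scan resumes from a cursor and that a failed remove leaves the object in place. `RemoveEvent`, `Object`, `Scanner`, and the step cap in `run()` are hypothetical and exist only for illustration. Only the names `scanObject()` and `scanRemoveDone()` come from the real code path, and this is not the ATS `CacheVC` implementation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for CACHE_EVENT_REMOVE / CACHE_EVENT_REMOVE_FAILED.
// Toy model only, not the real ATS scan code.
enum class RemoveEvent { Removed, RemoveFailed };

struct Object {
  int key;
  bool removable;        // false models an object whose remove always fails
  bool removed = false;
};

struct Scanner {
  std::vector<Object> objects;
  std::size_t cursor = 0;   // where scanObject() resumes searching
  bool skip_failed;         // true: handle the failure; false: the reported bug
  int remove_attempts = 0;

  Scanner(std::vector<Object> objs, bool skip)
      : objects(std::move(objs)), skip_failed(skip) {}

  // Index of the first live object at or after `cursor`, or -1 when done.
  int scanObject() const {
    for (std::size_t i = cursor; i < objects.size(); ++i)
      if (!objects[i].removed) return static_cast<int>(i);
    return -1;
  }

  RemoveEvent remove(int idx) {
    ++remove_attempts;
    if (!objects[idx].removable) return RemoveEvent::RemoveFailed;
    objects[idx].removed = true;
    return RemoveEvent::Removed;
  }

  // With skip_failed == false the event is ignored and the cursor stays put,
  // so the next scanObject() call finds the same undeleted object again.
  void scanRemoveDone(RemoveEvent ev, int idx) {
    if (ev == RemoveEvent::RemoveFailed && skip_failed)
      cursor = static_cast<std::size_t>(idx) + 1;  // move past the failed object
  }

  // True if the scan finishes within max_steps; false if it keeps cycling
  // (the cap stands in for "loops forever").
  bool run(int max_steps) {
    for (int step = 0; step < max_steps; ++step) {
      int idx = scanObject();
      if (idx < 0) return true;
      scanRemoveDone(remove(idx), idx);
    }
    return false;
  }
};
```

   When the objects are `{1, removable}`, `{2, not removable}`, and `{3, removable}`, the buggy variant keeps retrying object 2 until it hits the cap. The variant that steps past the failure finishes after three remove attempts. Skipping the object is just one way to break the cycle. Aborting the scan or reporting the failure would also work, and we don't intend the model to settle which is right for ATS.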
   
   However, we're having trouble reproducing the stripe mismatch that triggers 
it. A few questions:
   
   1. **Has the `volume.config` ever changed on this system?** For example, was 
the cache originally created with a different number of volumes or different 
size percentages, and later changed to the current 9-volume config? The cache 
data on `/dev/sdb` would persist across restarts, but the stripe hash table 
would be rebuilt from the new config, which could cause `key_to_stripe()` to 
map keys to different stripes than where the data was originally written.
   
   2. **Was `/dev/sdb` ever reformatted, or was its cache cleared**
(`traffic_server -Cclear`), between the time the objects were cached and when
you ran the scan? Or has it been running continuously with the same cache data?
   
   3. **Is this reproducible on demand**, or did it happen once? If 
reproducible, could you share the exact steps -- specifically whether ATS was 
restarted between populating the cache and running the scan?
   
   We're asking because with a stable config (same `volume.config` across 
restarts), `key_to_stripe()` should return the same stripe the object was 
originally written to. We'd like to understand what caused the mismatch in your 
environment so we can write a proper regression test.
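   Here is a toy model of the concern in question 1. It assumes a key-to-stripe lookup table that is rebuilt from the stripe layout at startup. `build_stripe_table()` and `toy_key_to_stripe()` are hypothetical names. The contiguous, size-proportional fill is a simplification and does not match how ATS actually populates its stripe hash table. The only property we want to show is that the table is derived entirely from the config.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stripe table: each stripe gets a contiguous run of
// slots proportional to its size. Not the real ATS algorithm.
// Assumes stripe_sizes is non-empty with a non-zero total.
std::vector<int> build_stripe_table(const std::vector<std::uint64_t>& stripe_sizes,
                                    std::size_t table_size) {
  std::uint64_t total = 0;
  for (std::uint64_t s : stripe_sizes) total += s;

  std::vector<int> table;
  table.reserve(table_size);
  for (std::size_t i = 0; i < stripe_sizes.size(); ++i) {
    std::uint64_t share = stripe_sizes[i] * table_size / total;
    for (std::uint64_t j = 0; j < share && table.size() < table_size; ++j)
      table.push_back(static_cast<int>(i));
  }
  // Rounding can leave a few slots unassigned; give them to the last stripe.
  while (table.size() < table_size)
    table.push_back(static_cast<int>(stripe_sizes.size() - 1));
  return table;
}

// Hypothetical stand-in for key_to_stripe(): the key picks a slot and the
// slot names a stripe, so the answer depends only on the key and the table.
int toy_key_to_stripe(const std::vector<int>& table, std::uint64_t key) {
  return table[key % table.size()];
}
```

   If the layout is rebuilt from the same config (two equal stripes, 16 slots), key 5 maps to stripe 0 on every start. If the layout changes to four equal stripes, the same key maps to stripe 1. An object written under the old layout would then be looked up in a stripe that never held it, which is the kind of mismatch we're trying to confirm or rule out.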
