bryancall commented on issue #12583: URL: https://github.com/apache/trafficserver/issues/12583#issuecomment-4012511144
Thanks for the detailed write-up. We've been investigating this and can confirm the code path is buggy -- `scanRemoveDone()` ignores the `CACHE_EVENT_REMOVE_FAILED` event and loops back to `scanObject()`, which will find the same undeleted object and loop forever.

However, we're having trouble reproducing the stripe mismatch that triggers it. A few questions:

1. **Has the `volume.config` ever changed on this system?** For example, was the cache originally created with a different number of volumes or different size percentages, and later changed to the current 9-volume config? The cache data on `/dev/sdb` would persist across restarts, but the stripe hash table would be rebuilt from the new config, which could cause `key_to_stripe()` to map keys to different stripes than where the data was originally written.

2. **Was `/dev/sdb` ever reformatted or had its cache cleared** (`traffic_server -Cclear`) between the time the objects were cached and when you ran the scan? Or has it been running continuously with the same cache data?

3. **Is this reproducible on demand**, or did it happen once? If reproducible, could you share the exact steps -- specifically whether ATS was restarted between populating the cache and running the scan?

We're asking because with a stable config (same `volume.config` across restarts), `key_to_stripe()` should return the same stripe the object was originally written to. We'd like to understand what caused the mismatch in your environment so we can write a proper regression test.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
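
To make the failure mode described in the comment above concrete, here is a minimal toy model of it. This is not the ATS code: `ToyCache`, `toy_key_to_stripe()`, `scan_and_remove()`, and the modulo mapping are all hypothetical stand-ins, assumed only for illustration. The sketch shows an object that sits in one stripe while the current layout maps its key to another, so every remove attempt fails, and a scan loop that rescans on failure never makes progress.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Toy stand-ins for the remove result events; not the real ATS event codes.
enum class Event { Remove, RemoveFailed };

// Hypothetical mapping from key to stripe. The real key_to_stripe() uses a
// hash table built from the volume config; modulo is used here only so the
// result visibly depends on the number of stripes.
std::size_t toy_key_to_stripe(std::uint64_t key, std::size_t stripe_count) {
  return key % stripe_count;
}

struct ToyCache {
  std::vector<std::set<std::uint64_t>> stripes;

  explicit ToyCache(std::size_t stripe_count) : stripes(stripe_count) {}

  // Removal only looks in the stripe that the *current* layout maps the key to.
  Event remove(std::uint64_t key) {
    auto &stripe = stripes[toy_key_to_stripe(key, stripes.size())];
    return stripe.erase(key) ? Event::Remove : Event::RemoveFailed;
  }
};

// Scans one stripe and tries to remove whatever object it finds first.
// Returns the number of scan passes made. max_passes caps the loop so the
// buggy variant terminates in this demo instead of spinning forever.
int scan_and_remove(ToyCache &cache, std::size_t stripe, bool handle_failure, int max_passes) {
  assert(stripe < cache.stripes.size());
  int passes = 0;
  while (passes < max_passes && !cache.stripes[stripe].empty()) {
    ++passes;
    std::uint64_t key = *cache.stripes[stripe].begin();
    Event ev = cache.remove(key);
    if (ev == Event::RemoveFailed && handle_failure) {
      break; // fixed variant: stop rather than rescan the same object
    }
    // buggy variant: the failure is ignored and the same object is rescanned
  }
  return passes;
}
```

As a usage example under these assumptions: create `ToyCache cache(3)` and insert key `8` into stripe `toy_key_to_stripe(8, 2)`, which is stripe 0, as if it had been written under an older 2-stripe layout. Under the 3-stripe layout the key maps to stripe 2, so `remove()` reports `RemoveFailed`. Calling `scan_and_remove(cache, 0, false, 1000)` then uses all 1000 passes, while `scan_and_remove(cache, 0, true, 1000)` stops after 1.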
