bneradt opened a new pull request, #12873: URL: https://github.com/apache/trafficserver/pull/12873
CacheSync::mainEvent is a multi-step state machine: syncing a single stripe's directory requires multiple AIO writes (header, body chunks, footer), each returning EVENT_CONT and re-entering mainEvent on completion.

The refactor in #12639 moved current_index++ to the Lrestart label, which is reached on every entry to mainEvent. As a result, the stripe index advanced on each AIO callback, even in the middle of writing a stripe's directory. One stripe's directory data was then written to another stripe's disk location, corrupting the on-disk directory. After running with this bug, ATS could not initialize the cache on restart because the recovery code found heavily corrupted directory state.

Move current_index++ to Ldone, which is reached only after a stripe's full sync completes, matching the original behavior of ++stripe_index in the pre-#12639 code.
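
To make the failure mode concrete, here is a minimal, self-contained sketch of a re-entrant sync loop. It is not the ATS code: run_sync, Write, count_misdirected, and the fixed count of three writes per stripe are hypothetical simplifications, and the loop body stands in for one entry into the handler. It only illustrates why advancing the index on every entry misdirects writes, while advancing it at the "done" point does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One record per simulated AIO write: whose directory data it carried,
// and which stripe's disk location it was written to.
struct Write {
  int source_stripe;
  int target_stripe;
};

// Hypothetical model of the sync state machine (assumption: 3 writes per
// stripe, standing in for header, body, footer). Each loop iteration models
// one entry into the handler after an AIO completion.
std::vector<Write> run_sync(int stripes, bool advance_on_every_entry) {
  assert(stripes >= 0);
  constexpr int kWritesPerStripe = 3;
  std::vector<Write> log;
  int current_index = 0; // stripe whose disk location is used for the write
  int source = 0;        // stripe whose directory data is in flight
  int step = 0;          // writes already issued for the in-flight stripe
  bool first_entry = true;

  while (true) {
    // Analogue of the restart label: reached on every entry.
    if (advance_on_every_entry && !first_entry) {
      ++current_index; // models the regression
    }
    first_entry = false;
    if (current_index >= stripes) {
      break; // no stripes left to sync
    }
    if (step == 0) {
      source = current_index; // start a new stripe: capture its directory
    }
    log.push_back({source, current_index});
    if (++step == kWritesPerStripe) {
      // Analogue of the done label: the stripe's full sync has completed.
      step = 0;
      if (!advance_on_every_entry) {
        ++current_index; // models the fix
      }
    }
  }
  return log;
}

// Counts writes whose data landed at a different stripe's location.
std::size_t count_misdirected(const std::vector<Write> &log) {
  std::size_t n = 0;
  for (const Write &w : log) {
    if (w.source_stripe != w.target_stripe) {
      ++n;
    }
  }
  return n;
}
```

In this model, calling run_sync(3, true) issues only three writes, two of which carry stripe 0's data to the locations of stripes 1 and 2, whereas run_sync(3, false) issues nine writes that all land on their own stripe.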
