bneradt opened a new pull request, #12873:
URL: https://github.com/apache/trafficserver/pull/12873

   CacheSync::mainEvent is a multi-step state machine: syncing a single 
stripe's directory requires multiple AIO writes (header, body chunks, footer). 
After issuing each write, mainEvent returns EVENT_CONT and is re-entered when 
that write completes. The refactor in #12639 moved current_index++ to the 
Lrestart label, which is reached on every entry to mainEvent. The stripe index 
therefore advanced on each AIO callback, even in the middle of writing a 
stripe's directory. As a result, one stripe's directory data was written to 
another stripe's disk location, corrupting the on-disk directory. After running 
with this bug, ATS could not initialize the cache on restart because the 
recovery code found heavily corrupted directory state.
   
   Move current_index++ to Ldone, where it is reached only after a stripe's 
full sync completes. This matches the original behavior of ++stripe_index in 
the pre-#12639 code.
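
   The effect of the increment's placement can be illustrated with a small 
toy model. This is only a sketch and not the ATS source: ToySync, Write, 
IncrementAt, main_event, and count_misplaced are hypothetical names, and the 
model assumes a fixed number of writes per stripe and a handler that is 
re-entered once per completed write.

```cpp
#include <vector>

// Hypothetical toy model (not the ATS code): where the index increment lives.
enum class IncrementAt { Restart, Done };

struct Write {
  int source_stripe; // stripe whose directory data is being written
  int target_stripe; // stripe whose disk location receives the write
};

struct ToySync {
  IncrementAt placement;
  int stripe_count;
  int writes_per_stripe;
  int current_index;
  int step   = 0;  // which write of the current stripe comes next
  int source = -1; // stripe captured when the current sync began
  std::vector<Write> log;

  ToySync(IncrementAt p, int stripes, int writes)
    : placement(p), stripe_count(stripes), writes_per_stripe(writes),
      // Restart placement increments before first use, so it starts at -1.
      current_index(p == IncrementAt::Restart ? -1 : 0)
  {
  }

  // Returns true while another write completion is pending
  // (the analogue of returning EVENT_CONT).
  bool
  main_event()
  {
    for (;;) {
      // Analogue of Lrestart: reached on every entry to the handler.
      if (placement == IncrementAt::Restart) {
        ++current_index;
      }
      if (current_index >= stripe_count) {
        return false; // all stripes processed
      }
      if (step == 0) {
        source = current_index; // begin syncing this stripe's directory
      }
      log.push_back({source, current_index}); // issue one write
      if (++step < writes_per_stripe) {
        return true; // wait for the write to complete, then re-enter
      }
      // Analogue of Ldone: the stripe's full sync has completed.
      step = 0;
      if (placement == IncrementAt::Done) {
        ++current_index;
      }
    }
  }
};

// Drives the model to completion and counts writes that landed on a
// stripe other than the one whose data they carried.
int
count_misplaced(IncrementAt placement, int stripes, int writes)
{
  ToySync sync(placement, stripes, writes);
  while (sync.main_event()) {
  }
  int misplaced = 0;
  for (const Write &w : sync.log) {
    if (w.source_stripe != w.target_stripe) {
      ++misplaced;
    }
  }
  return misplaced;
}
```

   In this model, with 4 stripes and 3 writes per stripe, 
count_misplaced(IncrementAt::Restart, 4, 3) reports 2 misplaced writes 
(stripe 0's data lands on stripes 1 and 2), while 
count_misplaced(IncrementAt::Done, 4, 3) reports 0.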


