Alexey Goncharuk created IGNITE-5772:
----------------------------------------
             Summary: Race between WAL segment rollover and concurrent log
                 Key: IGNITE-5772
                 URL: https://issues.apache.org/jira/browse/IGNITE-5772
             Project: Ignite
          Issue Type: Bug
          Components: cache
    Affects Versions: 2.1
            Reporter: Alexey Goncharuk
            Assignee: Alexey Goncharuk
             Fix For: 2.2


The WAL log() and close() are synchronized as follows:
log: read head, check stop flag, CAS head
close: set stop flag, CAS head to a fake record.

This guarantees that after close() is called, no further records are appended to the closed segment.

Now consider three threads doing the following operations:
T1: flush();
T2: rollOver();
T3: log();

The sequence of events:
1) T1 does a CAS of head to FakeRecord
2) T3 reads head as FakeRecord, reads stop flag as false
3) T2 attempts to rollOver: CAS stop to true; call flushOrWait(null); call flush(null). Since the head is already an instance of FakeRecord, flush(null) immediately returns false without attempting a CAS. T2 waits for the written bytes and proceeds
4) T3 successfully does a CAS of head to a non-fake record
5) T2 proceeds with rollOver, signals next available and asserts on head.

The invariant above is broken when T2 does not CAS a fake record during rollover: the head that T3 read in step 2 is still the current head in step 4, so T3's CAS succeeds and an entry is appended to the closed segment.

The solution is to change the code so that the CAS is always attempted on close, even if the current head is already a FakeRecord. Alternatively, we can introduce another type of fake record that seals the WAL segment queue.
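The interleaving above can be replayed deterministically in a single thread. The following is only a minimal sketch under stated assumptions, not the actual Ignite code: the class WalSealSketch, its Record and FakeRecord types, the alwaysCasOnClose switch and the lateAppendSucceeds() helper are all hypothetical names introduced for illustration. It models head as an AtomicReference and shows why skipping the CAS on close lets the late log() succeed, while always installing a fresh FakeRecord makes it fail.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified model of a WAL segment; not Ignite's actual classes. */
final class WalSealSketch {
    /** A record in the segment's chain. */
    static class Record {
        final Record prev;

        Record(Record prev) { this.prev = prev; }
    }

    /** Marker record placed at the head by flush() and close(). */
    static final class FakeRecord extends Record {
        FakeRecord(Record prev) { super(prev); }
    }

    final AtomicReference<Record> head = new AtomicReference<>(new Record(null));
    final AtomicBoolean stop = new AtomicBoolean();

    /** Assumption: a switch between the described bug and the proposed fix. */
    final boolean alwaysCasOnClose;

    WalSealSketch(boolean alwaysCasOnClose) { this.alwaysCasOnClose = alwaysCasOnClose; }

    /** flush(): CAS head to a fake record, unless head is already fake. */
    boolean flush() {
        Record h = head.get();

        return !(h instanceof FakeRecord) && head.compareAndSet(h, new FakeRecord(h));
    }

    /** close(): set the stop flag, then seal the head. */
    void close() {
        stop.set(true);

        if (alwaysCasOnClose) {
            // Proposed fix: always install a new FakeRecord instance,
            // so any head reference read before close() becomes stale.
            Record h;

            do {
                h = head.get();
            } while (!head.compareAndSet(h, new FakeRecord(h)));
        }
        else {
            // Described bug: no CAS happens if head is already a FakeRecord.
            flush();
        }
    }

    /** Replays steps 1-4; returns true if T3 appends to the closed segment. */
    static boolean lateAppendSucceeds(boolean alwaysCasOnClose) {
        WalSealSketch seg = new WalSealSketch(alwaysCasOnClose);

        seg.flush();                      // 1) T1: CAS head to FakeRecord.
        Record seen = seg.head.get();     // 2) T3: read head ...
        boolean stopped = seg.stop.get(); //    ... and read stop flag (false).
        seg.close();                      // 3) T2: rollover closes the segment.

        // 4) T3: CAS head from the previously read value to a real record.
        return !stopped && seg.head.compareAndSet(seen, new Record(seen));
    }

    public static void main(String[] args) {
        System.out.println("skip CAS when head is fake: late append = " + lateAppendSucceeds(false));
        System.out.println("always CAS on close:        late append = " + lateAppendSucceeds(true));
    }
}
```

The sketch relies on compareAndSet comparing references: a new FakeRecord instance differs from the one T3 read earlier, so T3's CAS fails and log() would have to re-read the head and observe the stop flag. The alternative mentioned above, a dedicated sealing record type, would achieve the same effect by a different means.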