Ivan Andika created RATIS-2186:
----------------------------------

             Summary: Raft log purge preservation might purge log index that 
does not exist
                 Key: RATIS-2186
                 URL: https://issues.apache.org/jira/browse/RATIS-2186
             Project: Ratis
          Issue Type: Bug
            Reporter: Ivan Andika
            Assignee: Ivan Andika


We encountered the following "Unexpected gap in segments" error when manually 
synchronizing the OM DB on a very slow follower.
{code:java}
2024-11-07 21:49:32,940 [om4@group-13A745F1EB59-StateMachineUpdater] ERROR 
org.apache.ratis.server.impl.StateMachineUpdater: 
om4@group-13A745F1EB59-StateMachineUpdater caught a Throwable.
java.lang.IllegalStateException: Unexpected gap in segments: 
binarySearch(88354999707) returns -1, segments=[log-88363996241_88364000257, 
log-88364000258_88364004199, log-88364004200_88364008231, 
log-88364008232_88364012246, log-88364012247_88364016452, 
log-88364016453_88364020483, log-88364020484_88364024600, 
log-88364024601_88364028704, log-88364028705_88364032801, 
log-88364032802_88364036811, log-88364036812_88364040811, 
log-88364040812_88364044806, log-88364044807_88364048845, 
log-88364048846_88364053013, log-88364053014_88364057206, 
log-88364057207_88364061416, log-88364061417_88364065583, 
log-88364065584_88364069652, log-88364069653_88364073908, 
log-88364073909_88364078037, log-88364078038_88364082338, 
log-88364082339_88364086503, log-88364086504_88364090669, 
log-88364090670_88364094827, log-88364094828_88364099047, 
log-88364099048_88364103228, log-88364103229_88364107373, 
log-88364107374_88364111564, log-88364111565_88364115651, 
log-88364115652_88364119684, log-88364119685_88364123867, 
log-88364123868_88364124644, log-88364124645_88364128703, 
log-88364128704_88364132765, log-88364132766_88364136825, 
log-88364136826_88364140811, log-88364140812_88364144887, 
log-88364144888_88364149042, log-88364149043_88364153379, 
log-88364153380_88364157732, log-88364157733_88364161937, 
log-88364161938_88364166039, log-88364166040_88364170087, 
log-88364170088_88364174135, log-88364174136_88364178144, 
log-88364178145_88364182260, log-88364182261_88364186208, 
log-88364186209_88364190136, log-88364190137_88364194445, 
log-88364194446_88364198500, log-88364198501_88364202507, 
log-88364202508_88364206398, log-88364206399_88364210433, 
log-88364210434_88364214441, log-88364214442_88364218538, 
log-88364218539_88364222548, log-88364222549_88364226618, 
log-88364226619_88364230699, log-88364230700_88364234762, 
log-88364234763_88364238784, log-88364238785_88364242687, 
log-88364242688_88364246625, log-88364246626_88364250581, 
log-88364250582_88364254520, log-88364254521_88364258544, 
log-88364258545_88364262687, log-88364262688_88364266687, 
log-88364266688_88364270677, log-88364270678_88364274675, 
log-88364274676_88364278687, log-88364278688_88364282796, 
log-88364282797_88364287134, log-88364287135_88364291229, 
log-88364291230_88364295199, log-88364295200_88364299138, 
log-88364299139_88364303033, log-88364303034_88364307192, 
log-88364307193_88364311099, log-88364311100_88364315135, 
log-88364315136_88364319072, log-88364319073_88364322884, 
log-88364322885_88364326897, log-88364326898_88364330876, 
log-88364330877_88364334809, log-88364334810_88364338728, 
log-88364338729_88364342864, log-88364342865_88364346842, 
log-88364346843_88364350811, log-88364350812_88364354727, 
log-88364354728_88364358758, log-88364358759_88364359500, 
log-88364359501_88364363662, log-88364363663_88364367743, 
log-88364367744_88364371709, log-88364371710_88364375763, 
log-88364375764_88364379715, log-88364379716_88364383734, 
log-88364383735_88364387563, log-88364387564_88364391573, 
log-88364391574_88364395627, log-88364395628_88364399634, 
log-88364399635_88364403770, log-88364403771_88364408068, 
log-88364408069_88364412129, log-88364412130_88364416145, 
log-88364416146_88364420177, log-88364420178_88364424190, 
log-88364424191_88364428162, log-88364428163_88364432284, 
log-88364432285_88364436218, log-88364436219_88364440288, 
log-88364440289_88364444352, log-88364444353_88364448196, 
log-88364448197_88364452189, log-88364452190_88364456120, 
log-88364456121_88364460132, log-88364460133_88364463990, 
log-88364463991_88364468111, log-88364468112_88364472158, 
log-88364472159_88364476323, log-88364476324_88364480303, 
log-88364480304_88364484414, log-88364484415_88364488460, 
log-88364488461_88364492577, log-88364492578_88364496658, 
log-88364496659_88364500681, log-88364500682_88364504681, 
log-88364504682_88364508692, log-88364508693_88364512735, 
log-88364512736_88364516709, log-88364516710_88364520628, 
log-88364520629_88364524444, log-88364524445_88364528459, 
log-88364528460_88364532564, log-88364532565_88364536546, 
log-88364536547_88364540655, log-88364540656_88364544713, 
log-88364544714_88364548738, log-88364548739_88364552734, 
log-88364552735_88364556745, log-88364556746_88364560570, 
log-88364560571_88364564711, log-88364564712_88364568778, 
log-88364568779_88364572855, log-88364572856_88364577025, 
log-88364577026_88364580991, log-88364580992_88364585005, 
log-88364585006_88364589177, log-88364589178_88364593117, 
log-88364593118_88364596544, log-88364596545_88364600628, 
log-88364600629_88364604666, log-88364604667_88364608788, 
log-88364608789_88364612623, log-88364612624_88364616469, 
log-88364616470_88364620418, log-88364620419_88364624447, 
log-88364624448_88364628364, log-88364628365_88364632583, 
log-88364632584_88364636690, log-88364636691_88364640840, 
log-88364640841_88364645154, log-88364645155_88364649391, 
log-88364649392_88364653616, log-88364653617_88364657719, 
log-88364657720_88364662007, log-88364662008_88364666323, 
log-88364666324_88364670449, log-88364670450_88364674849, 
log-88364674850_88364679290, log-88364679291_88364683748, 
log-88364683749_88364688166, log-88364688167_88364692147, 
log-88364692148_88364696480, log-88364696481_88364700948, 
log-88364700949_88364705067, log-88364705068_88364709420, 
log-88364709421_88364713675, log-88364713676_88364718120, 
log-88364718121_88364722375, log-88364722376_88364726870, 
log-88364726871_88364731208, log-88364731209_88364735403, 
log-88364735404_88364739660, log-88364739661_88364744079, 
log-88364744080_88364748313, log-88364748314_88364752767, 
log-88364752768_88364756923, log-88364756924_88364761130, 
log-88364761131_88364765458, log-88364765459_88364769659, 
log-88364769660_88364773864, log-88364773865_88364778029, 
log-88364778030_88364782373, log-88364782374_88364786843, 
log-88364786844_88364791187, log-88364791188_88364795576, 
log-88364795577_88364799757, log-88364799758_88364804091, 
log-88364804092_88364808438, log-88364808439_88364812735, 
log-88364812736_88364817053, log-88364817054_88364821337, 
log-88364821338_88364825482, log-88364825483_88364829678, 
log-88364829679_88364833850, log-88364833851_88364838114, 
log-88364838115_88364842299, log-88364842300_88364846583, 
log-88364846584_88364849925, log-88364849926_88364854127, 
log-88364854128_88364858268, log-88364858269_88364862345, 
log-88364862346_88364866641, log-88364866642_88364870877, 
log-88364870878_88364875147, log-88364875148_88364879433, 
log-88364879434_88364883886, log-88364883887_88364888223, 
log-88364888224_88364892556, log-88364892557_88364896921, 
log-88364896922_88364901295, log-88364901296_88364905640, 
log-88364905641_88364909861, log-88364909862_88364914097, 
log-88364914098_88364918297, log-88364918298_88364922609, 
log-88364922610_88364926902, log-88364926903_88364931383, 
log-88364931384_88364935609, log-88364935610_88364940046, 
log-88364940047_88364944407, log-88364944408_88364948542, 
log-88364948543_88364952764, log-88364952765_88364956959, 
log-88364956960_88364961303, log-88364961304_88364965492, 
log-88364965493_88364969682, log-88364969683_88364973850, 
log-88364973851_88364978007, log-88364978008_88364982280, 
log-88364982281_88364986516, log-88364986517_88364990776, 
log-88364990777_88364995029, log-88364995030_88364999288] {code}
When synchronizing the OM follower with the OM leader, we cleaned the OM ratis 
and ratis-snapshot directories and used rsync to sync the OM DB (which contains 
the last applied index). Afterwards, we restarted the slow OM follower, which 
then received AppendEntries from the leader instead of notifyInstallSnapshot 
because of the leader's purge preservation configuration. However, since the 
follower does not have some of the earlier log segments, the first purge 
triggers the "Unexpected gap in segments" error, because the purge index is 
earlier than the first Raft log index in the Ratis log directory.
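
To illustrate the failure mode, here is a minimal, self-contained sketch. It 
is not Ratis code: the class PurgeIndexSketch, the Segment type, and the 
shouldPurge guard are hypothetical names introduced only for this example, and 
the search convention (a negative result encoding the insertion point) is 
assumed from the "returns -1" in the log above. It shows that looking up a 
purge index that precedes the first local segment yields -1, and one possible 
guard that skips the purge in that case.

```java
import java.util.List;

/** Hypothetical sketch of the lookup that fails; not the Ratis implementation. */
public class PurgeIndexSketch {
  /** A closed log segment covering [start, end], named log-START_END on disk. */
  public static final class Segment {
    final long start;
    final long end;

    public Segment(long start, long end) {
      this.start = start;
      this.end = end;
    }
  }

  /**
   * Returns the position of the segment containing index, or
   * -(insertionPoint + 1) if no segment contains it. An index that is
   * earlier than every segment therefore yields -1.
   */
  public static int binarySearch(List<Segment> segments, long index) {
    int lo = 0;
    int hi = segments.size() - 1;
    while (lo <= hi) {
      final int mid = (lo + hi) >>> 1;
      final Segment s = segments.get(mid);
      if (index < s.start) {
        hi = mid - 1;
      } else if (index > s.end) {
        lo = mid + 1;
      } else {
        return mid;
      }
    }
    return -(lo + 1);
  }

  /**
   * Assumed guard: purge only if the purge index is not earlier than the
   * first index that actually exists in the local log directory.
   */
  public static boolean shouldPurge(List<Segment> segments, long purgeIndex) {
    return !segments.isEmpty() && purgeIndex >= segments.get(0).start;
  }
}
```

With the first two segments from the log above, binarySearch for 88354999707 
returns -1 and shouldPurge returns false, so a guard of this kind would skip 
the purge instead of throwing. Whether skipping is the right fix for Ratis is 
a separate question.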

I suspect that this might also happen in the general case for a Raft server 
with a too-low raft.server.snapshot.auto.trigger.threshold and 
raft.server.log.purge.gap and a very high 
raft.server.log.purge.preservation.log.num, provided 
raft.server.log.purge.upto.snapshot.index is true.
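
For concreteness, the suspected combination could look like the sketch below. 
The four property keys are the ones named above; the numeric values are 
illustrative assumptions only (not taken from the affected cluster), and 
java.util.Properties with the hypothetical class PurgeConfigSketch is used 
here just to keep the example self-contained.

```java
import java.util.Properties;

/** Hypothetical illustration of the suspected settings; values are made up. */
public class PurgeConfigSketch {
  public static Properties suspectedCombination() {
    final Properties p = new Properties();
    // Low snapshot threshold: snapshots are taken very frequently (assumed value).
    p.setProperty("raft.server.snapshot.auto.trigger.threshold", "1000");
    // Low purge gap: purges are attempted very frequently (assumed value).
    p.setProperty("raft.server.log.purge.gap", "16");
    // Very high preservation: many entries before the snapshot index are kept,
    // so the computed purge index lies far behind the snapshot (assumed value).
    p.setProperty("raft.server.log.purge.preservation.log.num", "10000000");
    // Purge up to the snapshot index rather than the commit index of all peers.
    p.setProperty("raft.server.log.purge.upto.snapshot.index", "true");
    return p;
  }
}
```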



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
