Ivan Andika created RATIS-2186:
----------------------------------
Summary: Raft log purge preservation might purge log index that
does not exist
Key: RATIS-2186
URL: https://issues.apache.org/jira/browse/RATIS-2186
Project: Ratis
Issue Type: Bug
Reporter: Ivan Andika
Assignee: Ivan Andika
We encountered the following "Unexpected gap in segments" error when manually
synchronizing the OM DB on a very slow follower.
{code:java}
2024-11-07 21:49:32,940 [om4@group-13A745F1EB59-StateMachineUpdater] ERROR
org.apache.ratis.server.impl.StateMachineUpdater:
om4@group-13A745F1EB59-StateMachineUpdater caught a Throwable.
java.lang.IllegalStateException: Unexpected gap in segments:
binarySearch(88354999707) returns -1, segments=[log-88363996241_88364000257,
log-88364000258_88364004199, log-88364004200_88364008231,
log-88364008232_88364012246, log-88364012247_88364016452,
log-88364016453_88364020483, log-88364020484_88364024600,
log-88364024601_88364028704, log-88364028705_88364032801,
log-88364032802_88364036811, log-88364036812_88364040811,
log-88364040812_88364044806, log-88364044807_88364048845,
log-88364048846_88364053013, log-88364053014_88364057206,
log-88364057207_88364061416, log-88364061417_88364065583,
log-88364065584_88364069652, log-88364069653_88364073908,
log-88364073909_88364078037, log-88364078038_88364082338,
log-88364082339_88364086503, log-88364086504_88364090669,
log-88364090670_88364094827, log-88364094828_88364099047,
log-88364099048_88364103228, log-88364103229_88364107373,
log-88364107374_88364111564, log-88364111565_88364115651,
log-88364115652_88364119684, log-88364119685_88364123867,
log-88364123868_88364124644, log-88364124645_88364128703,
log-88364128704_88364132765, log-88364132766_88364136825,
log-88364136826_88364140811, log-88364140812_88364144887,
log-88364144888_88364149042, log-88364149043_88364153379,
log-88364153380_88364157732, log-88364157733_88364161937,
log-88364161938_88364166039, log-88364166040_88364170087,
log-88364170088_88364174135, log-88364174136_88364178144,
log-88364178145_88364182260, log-88364182261_88364186208,
log-88364186209_88364190136, log-88364190137_88364194445,
log-88364194446_88364198500, log-88364198501_88364202507,
log-88364202508_88364206398, log-88364206399_88364210433,
log-88364210434_88364214441, log-88364214442_88364218538,
log-88364218539_88364222548, log-88364222549_88364226618,
log-88364226619_88364230699, log-88364230700_88364234762,
log-88364234763_88364238784, log-88364238785_88364242687,
log-88364242688_88364246625, log-88364246626_88364250581,
log-88364250582_88364254520, log-88364254521_88364258544,
log-88364258545_88364262687, log-88364262688_88364266687,
log-88364266688_88364270677, log-88364270678_88364274675,
log-88364274676_88364278687, log-88364278688_88364282796,
log-88364282797_88364287134, log-88364287135_88364291229,
log-88364291230_88364295199, log-88364295200_88364299138,
log-88364299139_88364303033, log-88364303034_88364307192,
log-88364307193_88364311099, log-88364311100_88364315135,
log-88364315136_88364319072, log-88364319073_88364322884,
log-88364322885_88364326897, log-88364326898_88364330876,
log-88364330877_88364334809, log-88364334810_88364338728,
log-88364338729_88364342864, log-88364342865_88364346842,
log-88364346843_88364350811, log-88364350812_88364354727,
log-88364354728_88364358758, log-88364358759_88364359500,
log-88364359501_88364363662, log-88364363663_88364367743,
log-88364367744_88364371709, log-88364371710_88364375763,
log-88364375764_88364379715, log-88364379716_88364383734,
log-88364383735_88364387563, log-88364387564_88364391573,
log-88364391574_88364395627, log-88364395628_88364399634,
log-88364399635_88364403770, log-88364403771_88364408068,
log-88364408069_88364412129, log-88364412130_88364416145,
log-88364416146_88364420177, log-88364420178_88364424190,
log-88364424191_88364428162, log-88364428163_88364432284,
log-88364432285_88364436218, log-88364436219_88364440288,
log-88364440289_88364444352, log-88364444353_88364448196,
log-88364448197_88364452189, log-88364452190_88364456120,
log-88364456121_88364460132, log-88364460133_88364463990,
log-88364463991_88364468111, log-88364468112_88364472158,
log-88364472159_88364476323, log-88364476324_88364480303,
log-88364480304_88364484414, log-88364484415_88364488460,
log-88364488461_88364492577, log-88364492578_88364496658,
log-88364496659_88364500681, log-88364500682_88364504681,
log-88364504682_88364508692, log-88364508693_88364512735,
log-88364512736_88364516709, log-88364516710_88364520628,
log-88364520629_88364524444, log-88364524445_88364528459,
log-88364528460_88364532564, log-88364532565_88364536546,
log-88364536547_88364540655, log-88364540656_88364544713,
log-88364544714_88364548738, log-88364548739_88364552734,
log-88364552735_88364556745, log-88364556746_88364560570,
log-88364560571_88364564711, log-88364564712_88364568778,
log-88364568779_88364572855, log-88364572856_88364577025,
log-88364577026_88364580991, log-88364580992_88364585005,
log-88364585006_88364589177, log-88364589178_88364593117,
log-88364593118_88364596544, log-88364596545_88364600628,
log-88364600629_88364604666, log-88364604667_88364608788,
log-88364608789_88364612623, log-88364612624_88364616469,
log-88364616470_88364620418, log-88364620419_88364624447,
log-88364624448_88364628364, log-88364628365_88364632583,
log-88364632584_88364636690, log-88364636691_88364640840,
log-88364640841_88364645154, log-88364645155_88364649391,
log-88364649392_88364653616, log-88364653617_88364657719,
log-88364657720_88364662007, log-88364662008_88364666323,
log-88364666324_88364670449, log-88364670450_88364674849,
log-88364674850_88364679290, log-88364679291_88364683748,
log-88364683749_88364688166, log-88364688167_88364692147,
log-88364692148_88364696480, log-88364696481_88364700948,
log-88364700949_88364705067, log-88364705068_88364709420,
log-88364709421_88364713675, log-88364713676_88364718120,
log-88364718121_88364722375, log-88364722376_88364726870,
log-88364726871_88364731208, log-88364731209_88364735403,
log-88364735404_88364739660, log-88364739661_88364744079,
log-88364744080_88364748313, log-88364748314_88364752767,
log-88364752768_88364756923, log-88364756924_88364761130,
log-88364761131_88364765458, log-88364765459_88364769659,
log-88364769660_88364773864, log-88364773865_88364778029,
log-88364778030_88364782373, log-88364782374_88364786843,
log-88364786844_88364791187, log-88364791188_88364795576,
log-88364795577_88364799757, log-88364799758_88364804091,
log-88364804092_88364808438, log-88364808439_88364812735,
log-88364812736_88364817053, log-88364817054_88364821337,
log-88364821338_88364825482, log-88364825483_88364829678,
log-88364829679_88364833850, log-88364833851_88364838114,
log-88364838115_88364842299, log-88364842300_88364846583,
log-88364846584_88364849925, log-88364849926_88364854127,
log-88364854128_88364858268, log-88364858269_88364862345,
log-88364862346_88364866641, log-88364866642_88364870877,
log-88364870878_88364875147, log-88364875148_88364879433,
log-88364879434_88364883886, log-88364883887_88364888223,
log-88364888224_88364892556, log-88364892557_88364896921,
log-88364896922_88364901295, log-88364901296_88364905640,
log-88364905641_88364909861, log-88364909862_88364914097,
log-88364914098_88364918297, log-88364918298_88364922609,
log-88364922610_88364926902, log-88364926903_88364931383,
log-88364931384_88364935609, log-88364935610_88364940046,
log-88364940047_88364944407, log-88364944408_88364948542,
log-88364948543_88364952764, log-88364952765_88364956959,
log-88364956960_88364961303, log-88364961304_88364965492,
log-88364965493_88364969682, log-88364969683_88364973850,
log-88364973851_88364978007, log-88364978008_88364982280,
log-88364982281_88364986516, log-88364986517_88364990776,
log-88364990777_88364995029, log-88364995030_88364999288] {code}
When synchronizing the OM follower with the OM leader, we cleaned the OM ratis
and ratis-snapshot directories and used rsync to sync the OM DB (which contains
the last applied index). Afterwards, we restarted the slow OM follower, which
then received AppendEntries from the leader instead of notifyInstallSnapshot
because of the leader's purge preservation configuration. However, since the
follower does not have some of the earlier log segments, the first purge
triggers the "Unexpected gap in segments" error, because the purge index is
earlier than the first Raft log index in the Ratis log directory.
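The failure mode described above can be illustrated with a minimal,
self-contained sketch. It is not the Ratis implementation: SegmentRange,
PurgeGapSketch and canPurge are hypothetical names, and the guard is only one
possible way to handle the case. The two segment ranges and the purge index
88354999707 are taken from the log above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a closed log segment; not the Ratis class. */
final class SegmentRange {
  final long start;
  final long end;

  SegmentRange(long start, long end) {
    this.start = start;
    this.end = end;
  }
}

final class PurgeGapSketch {
  /**
   * Returns the position of the segment containing index, or
   * -(insertionPoint + 1) when no segment contains it
   * (same convention as java.util.Collections.binarySearch).
   */
  static int binarySearch(List<SegmentRange> segments, long index) {
    int low = 0;
    int high = segments.size() - 1;
    while (low <= high) {
      final int mid = (low + high) >>> 1;
      final SegmentRange s = segments.get(mid);
      if (index < s.start) {
        high = mid - 1;
      } else if (index > s.end) {
        low = mid + 1;
      } else {
        return mid;
      }
    }
    return -(low + 1);
  }

  /** Assumed guard: a purge index before the first segment has nothing to purge. */
  static boolean canPurge(List<SegmentRange> segments, long purgeIndex) {
    return !segments.isEmpty() && purgeIndex >= segments.get(0).start;
  }

  /** First two segment ranges from the log in this report. */
  static List<SegmentRange> firstTwoSegmentsFromReport() {
    final List<SegmentRange> segments = new ArrayList<>();
    segments.add(new SegmentRange(88363996241L, 88364000257L));
    segments.add(new SegmentRange(88364000258L, 88364004199L));
    return segments;
  }

  public static void main(String[] args) {
    final List<SegmentRange> segments = firstTwoSegmentsFromReport();
    final long purgeIndex = 88354999707L;
    System.out.println("binarySearch = " + binarySearch(segments, purgeIndex));
    System.out.println("canPurge = " + canPurge(segments, purgeIndex));
  }
}
```

In this sketch the purge index sorts before the first segment, so the search
yields insertion point 0 and therefore -1. A check like canPurge would let the
caller treat that as "nothing to purge" rather than as a gap.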
I suspect that this might also happen in the general case for a Raft server
with too low raft.server.snapshot.auto.trigger.threshold and
raft.server.log.purge.gap, and a very high
raft.server.log.purge.preservation.log.num, provided
raft.server.log.purge.upto.snapshot.index is true.
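To make the suspected interaction of these settings concrete, here is a toy
model in Java. It is an assumption about how the settings combine, not code
from Ratis: PurgeIndexModel, purgeTarget and hitsMissingRange are hypothetical
names, and the rule "purge up to the snapshot index, capped by last index
minus preservation, skipped when the advance is smaller than the gap" is a
simplification that should be checked against the actual Ratis source.

```java
import java.util.OptionalLong;

/** Toy model of the purge decision; NOT the Ratis implementation. */
final class PurgeIndexModel {
  /**
   * Assumed rule, with raft.server.log.purge.upto.snapshot.index = true:
   * start from the snapshot index, keep at least preservationLogNum entries
   * behind the last log index, and skip the purge if it would advance the
   * previous purge index by less than purgeGap.
   */
  static OptionalLong purgeTarget(long snapshotIndex, long lastLogIndex,
      long lastPurgeIndex, long preservationLogNum, long purgeGap) {
    long suggested = snapshotIndex;
    if (preservationLogNum > 0) {
      suggested = Math.min(suggested, lastLogIndex - preservationLogNum);
    }
    if (suggested - lastPurgeIndex < purgeGap) {
      return OptionalLong.empty(); // advance too small, nothing is purged
    }
    return OptionalLong.of(suggested);
  }

  /** True when the purge target lies before the first index present on disk. */
  static boolean hitsMissingRange(OptionalLong target, long firstLogIndex) {
    return target.isPresent() && target.getAsLong() < firstLogIndex;
  }

  public static void main(String[] args) {
    // Illustrative numbers only: the local log holds indices 1000..1500.
    final OptionalLong target = purgeTarget(1400, 1500, -1, 1000, 10);
    System.out.println("target = " + target);
    System.out.println("hitsMissingRange = " + hitsMissingRange(target, 1000));
  }
}
```

With these illustrative numbers, a preservation of 1000 entries pulls the
target back to index 500, which is before the first index (1000) that exists
locally. Under this model, that is the situation in which the segment search
would report a gap.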
--
This message was sent by Atlassian Jira
(v8.20.10#820010)