Hi Stipe, > Am 17.10.2018 um 10:42 schrieb Stipe Tolj <st...@kannel.org > <mailto:st...@kannel.org>>: > > Am 15.10.2018 17:33, schrieb amal...@kannel.org <mailto:amal...@kannel.org>: >> Hi Stipe, >> >> sorry I was not available earlier… > > Hi Alex, > >> just checked your patch and see man many issues… >> >> 1. >> >> +static void init_smsbox_routes(Cfg *cfg, int reload) >> >> you use continue if config is wrong >> >> but you already destroyed previous configuration >> >> - /* send new config to clients */ >> + gw_rwlock_wrlock(smsbox_list_rwlock); >> + dict_destroy(smsbox_by_smsc); >> + dict_destroy(smsbox_by_receiver); >> + dict_destroy(smsbox_by_smsc_receiver); >> + smsbox_by_smsc = dict_create(30, (void(*)(void *)) octstr_destroy); >> + smsbox_by_receiver = dict_create(50, (void(*)(void *)) octstr_destroy); >> + smsbox_by_smsc_receiver = dict_create(50, (void(*)(void *)) >> octstr_destroy); >> + init_smsbox_routes(cfg, 1); >> + gw_rwlock_unlock(smsbox_list_rwlock); >> >> That means, you have inconsistency > > hmm, yes we do this intentionally actually. > > We destroy the structures, then re-parse the 'group = smsbox-route' groups > from the configuration. If a group has an error, we only dump the error but > continue to parse the following groups. > > A restart would formally break via PANIC, since the config has an issue. But > we keep on going with all groups that are configured correctly. > > I'm assuming you would rather re-parse the new config, and ONLY if it has no > error swap the new config with the old one. If it contains an error rather > keep running the old one?
Yes, keep running with old configuration and let caller know that reload failed. > >> 2. if I see reload configs without a lock… Uhh I can construct soo many >> cases, e.g. smsc is renamed in new config but still used in the old >> smsbox group, etc. etc. >> >> int bb_graceful_restart(void) >> { >> - return smsc2_graceful_restart(); >> + Cfg *cfg; >> + >> + info(0, "Reloading configuration resource `%s'.", >> octstr_get_cstr(cfg_filename)); >> + cfg = cfg_create(cfg_filename); >> + if (cfg_read(cfg) == -1) { >> + error(0, "Error processing configuration resource `%s'. Continue with >> existing configuration.", >> + octstr_get_cstr(cfg_filename)); >> + return -1; >> + } >> + >> + smsc2_graceful_restart(cfg); >> + smsbox_restart(cfg); >> + return 0; >> } > > the smsbox_restart() locks down via > > gw_rwlock_wrlock(smsbox_list_rwlock); > > and the MO receiving part do route_incoming_to_boxc(), which sets: > > gw_rwlock_rdlock(smsbox_list_rwlock); > > so we SHOULD never be in a race-condition where we would try to get a MO > routing and the structures are destroyed in the middle of the process. Right? It’s not about structure destroyed, it’s about inconsistent configuration that you have without full lock. Let’s build some scenario: 1. smsc2_graceful_restart(cfg); 2. (1) has renamed smsc-id or changed allowed/denied smsc-id 3. MO arrive but routed wrong because we still use old MO routing configuration 4. MO routing configuration gets reloaded 4.a if reload failed , whole configuration is broken/inconsistent 4.b if reload was ok, we are fine again As you see we have issues (3) and 4.a . Therefore it’s not right to see to reload config partially. Reload has to be atomic operation. > > I don't understand the argument with "smsc is renamed in new config but still > used in the old smsbox group". Could you please explain? > > Cheers, > Stipe > > -- > Best Regards, > Stipe Tolj > > ------------------------------------------------------------------- > Düsseldorf, NRW, Germany > > Kannel Foundation tolj.org <http://tolj.org/> system > architecture > http://www.kannel.org/ <http://www.kannel.org/> > http://www.tolj.org/ <http://www.tolj.org/> > > stolj at kannel.org <http://kannel.org/> st at tolj.org > <http://tolj.org/> > ------------------------------------------------------------------- >