Hello,

Since you have started looking at this again, let me repeat myself. The problem is described in detail here:
http://lists.digium.com/pipermail/asterisk-dev/2015-October/075128.html

It stems from the fact that at initial load, pjsip realtime issues a separate DB query for each endpoint/aor/etc. in the system. In my case, with ~10K endpoints, it took Asterisk ~1.5 minutes to load. Further in that discussion I suggested that making the following API call to populate the sorcery cache would go a long way toward reducing the scale of this problem:

ast_sorcery_retrieve_by_fields(sip_sorcery, "endpoint", AST_RETRIEVE_FLAG_MULTIPLE | AST_RETRIEVE_FLAG_ALL, NULL);

I haven't looked at pjsip since that discussion, as this is clearly a show-stopper for me, but I doubt anything has changed. I also haven't received any feedback on whether that suggestion is viable, so I'd love to hear your (and/or other developers') opinion on it. Any other ideas on how to deal with it are more than welcome as well.

Thanks,

Michael

On Wednesday, March 02, 2016 06:04:15 PM Ross Beer wrote:
> Hi George,
>
> I have commented out those lines and it hasn't improved the load times, its
> still taking 15 mins. It has improved it a little.
>
> Regards,
>
> Ross
>
> From: [email protected]
> Date: Wed, 2 Mar 2016 08:19:01 -0700
> To: [email protected]
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Wed, Mar 2, 2016 at 2:56 AM, Ross Beer <[email protected]> wrote:
>
> Hi George,
>
> I have re-built the 'c1bf014ea08cf66835a6f000e2bd6c7da588da6b' commit and
> PJSIP and Asterisk hasn't crashed after reload. However it did take 25 mins
> to load.
>
> As requested I have opened a ticket for the realtime issue:
>
> https://issues.asterisk.org/jira/browse/ASTERISK-25826
> Got it, thanks.
>
> Basically, I think this could be resolved by a configuration option that
> stops sourcery/pjsip loading all peers at start-up as this is not needed for
> the current setup. This has been discussed before on the mailing list however
> it doesn't look like it progresses any further.
>
> If you're up for trying something, you can comment out the
> qualify_and_schedule_all function in lines 1135-1147 of
> res/res_pjsip/pjsip_options.c, then comment out its 2 references on lines
> 1245 and 1281. If that drops your startup times, then we know we're on the
> right track.
>
> I would like to thank you for all of your help tying to identify the issue
> and hope that we can resolve it soon.
>
> No worries!
>
> Kind regards,
>
> Ross
>
> From: [email protected]
> Date: Tue, 1 Mar 2016 16:27:06 -0700
> To: [email protected]
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 3:07 PM, Ross Beer <[email protected]> wrote:
>
> ok,
>
> That took 15 mins to load and then crashed. This will be due to the
> pjsip_dlg_create_uas_and_inc_lock commit.
> It should not have crashed. That commit had the fix for it. If it did
> crash with that commit, open a Jira issue and attach a full backtrace.
>
> However 15 mins to start is a long time and would cause issues in a
> production environment.
> Would you open a Jira issue on the realtime problem (if one isn't already
> open). I'm starting to look at alternatives.
>
> Thank you for your help here,
>
> Ross
>
> From: [email protected]
> Date: Tue, 1 Mar 2016 14:02:38 -0700
> To: [email protected]
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 1:04 PM, Ross Beer <[email protected]> wrote:
>
> Hi George,
>
> Using a development test box for testing!!
>
> Asterisk 13.7.2 with no cache takes 4:12 to load, that with PJSIP Commit 5240
>
> Ok, try this combination...
> "git checkout c1bf014ea08cf66835a6f000e2bd6c7da588da6b"
> pjproject from trunk.
> with caching.
> The commit I referenced is the one that handles the
> pjsip_dlg_create_uas_and_inc_lock
>
> Qualify time on the aor is set to zero, I guess a query could be made to
> check for a value greater than zero instead of loading all endpoints.
>
> Ross
>
> From: [email protected]
> Date: Tue, 1 Mar 2016 12:45:28 -0700
> To: [email protected]
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 12:21 PM, Ross Beer <[email protected]> wrote:
>
> Hi George,
>
> No endpoints are qualified, there are 20,000 endpoints with only 75 static
> contacts defined in the aors. The database is a MySQL cluster.
>
> With the current Asterisk 13 branch with cache disabled and the latest PJSIP
> it takes 5 mins and then before finishing it crashes.
>
> With Asterisk 13.7.2 with cache it takes around 1 1/2 min to load, however
> due to the bug with PJSIP Commit 5241 asterisk crashes when using TLS devices.
>
> Try 13.7.2 without the cache. I'm trying to understand where the time is
> being spent. I know it will crash because of that bug. You're not doing
> this on a production system are you??
> The main issue here is that the endpoints are loaded as soon as PJSIP loads,
> ideally endpoints would only be loaded once a device registers or attempts to
> make a call. Much in the same way as Asterisk 1.8 chan_sip manages realtime.
>
> There is no need to load the endpoints as they are not qualified.
>
> How do you know they're not qualified if you don't load them? :)
> Time to load up a database with 20,000 endpoints I guess.
> Ross
>
> From: [email protected]
> Date: Tue, 1 Mar 2016 11:58:15 -0700
> To: [email protected]
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 11:38 AM, Michael Ulitskiy <[email protected]> wrote:
>
> Hello,
>
> Please see this discussion
> http://lists.digium.com/pipermail/asterisk-dev/2015-October/075122.html
> I guess you're talking about the same problem.
> It's possible.
>
> Michael
>
> On Tuesday, March 01, 2016 06:26:27 PM Ross Beer wrote:
> > Hi George,
> >
> > We need to store contacts in realtime for our system. However not all
> > endpoints are registered only about 200, yet asterisk loops through every
> > endpoint which has been defined. It does this if contacts are in realtime
> > or not.
> >
> > Its almost like pjsip is loading them to check if they need to be qualified
> > etc.
> >
> > Asterisk 1.8 only put things into cache once they were accessed, is this an
> > option for sourcery?
> Well, in order to initiate qualify of contacts, Asterisk does have to
> "access" them all so I'm not quite sure what the problem is.
> Can we reset to a known config and see what happens?
>
> pjproject from the published 2.4.5 tarball.
> Asterisk from the published 13.7.2 tarball.
> Disable memory_cache altogether in sorcery.conf.
>
> See what happens.
> Give me an estimate of how many endpoints and aors there are in the database,
> how many of those aors have static contacts defined, and what's your qualify
> interval.
> An idea of your database setup would help as well. Same server, local,
> remote, etc.
> Let's solve 1 problem at a time.
