Actually I have to apologize. Initial load of endpoints does populate the cache.
Not sure what I was looking at, but I was just checking it now multiple times
and it works. I'm sorry for the misinformation.
The rest of the problem stands though.

Thanks,
Michael

On Saturday, October 17, 2015 01:54:44 PM Michael Ulitskiy wrote:
> Matthew,
>
> First of all, I apologize if my tone sounded too harsh. I didn't mean to
> offend anyone.
> I didn't mean to just say "it sucks". I wish to point it out though that,
> again, unless I'm missing something, current behaviour of pjsip realtime
> is not scalable and I believe it's a departure from what has been known as
> "dynamic realtime" for a long time.
> Please see inline for the answers to particular questions.
> Thanks,
> Michael
>
> On Saturday, October 17, 2015 10:49:32 AM Matthew Jordan wrote:
> > On Sat, Oct 17, 2015 at 10:03 AM, Michael Ulitskiy <[email protected]>
> > wrote:
> > > Matthew,
> > >
> > > Thanks for the reply.
> > >
> > > Yes I do have caching enabled. While caching does somewhat help (there are
> > > different problems there)
> >
> > Which problems?
>
> The problem here isn't actually related to caching implementation, but to the
> way pjsip matches endpoints.
> Whenever sip request arrives pjsip initially performs lookup for
> 'username@domain' and if it fails it falls back to lookup by username only.
>
> It results in 2 queries:
> SELECT * FROM pjsip_endpoints_v WHERE id = 'ep1@domain';
> SELECT * FROM pjsip_endpoints_v WHERE id = 'ep1';
>
> Now in my environment only the 2nd one will succeed and will be cached. Now
> for every sip request my asterisk will be issuing
>
> SELECT * FROM pjsip_endpoints_v WHERE id = 'ep1@domain';
>
> that will never succeed followed by retrieving 'ep1' from cache.
> Basically I'd like to have a way to suppress lookup for 'username@domain' or
> at least to cache the negative results.
>
> > > with ongoing load it has nothing to do with initial load that is still
> > > done in the extremely inefficient way
> > >
> > > I described in my original email.
> >
> > I'm not sure why that would be the case.
> > You'll need to be more
> > specific, and provide your sorcery.conf configuration as well as the
> > specific operations/times when there are issues.
>
> sorcery.conf:
> [res_pjsip]
> endpoint=config,pjsip.conf,criteria=type=endpoint
> endpoint/cache=memory_cache,expire_on_reload=yes,object_lifetime_maximum=600,object_lifetime_stale=300
> endpoint=realtime,ps_endpoints
> aor=config,pjsip.conf,criteria=type=aor
> aor/cache=memory_cache,expire_on_reload=yes,object_lifetime_maximum=600,object_lifetime_stale=300
> aor=realtime,ps_aors
>
> extconfig.conf:
> ps_endpoints => pgsql,users,pjsip_endpoints_v
> ps_aors => pgsql,users,pjsip_aors_v
>
> When asterisk starts up and loads pjsip it does the following:
> SELECT * FROM pjsip_aors_v WHERE id LIKE '%' ORDER BY id
> SELECT * FROM pjsip_endpoints_v WHERE id LIKE '%' ORDER BY id
> thus loading all endpoints and AORs in memory. Then the worst part, it
> follows on with loading all endpoints and AORs individually with queries
> like this:
> SELECT * FROM pjsip_aors_v WHERE id = 'ep1'
> SELECT * FROM pjsip_aors_v WHERE id = 'ep2'
> ...
> SELECT * FROM pjsip_aors_v WHERE id = 'epN'
>
> then
>
> SELECT * FROM pjsip_endpoints_v WHERE id = 'ep1'
> SELECT * FROM pjsip_endpoints_v WHERE id = 'ep2'
> ...
> SELECT * FROM pjsip_endpoints_v WHERE id = 'epN'
>
> With 10K endpoints it results in 20K queries to db at asterisk startup. Now
> imagine multiple asterisk servers. This is the biggest problem.
>
> Also, to my surprise, this initial loading doesn't populate cache.
> Right after asterisk startup I do "sorcery memory cache dump
> res_pjsip/endpoint" and it's empty therefore causing
> additional db lookups as asterisk starts to serve sip requests.
>
> > > Caching also doesn't help at all with CLI commands like "pjsip show
> > > endpoints" in which case asterisk
> > >
> > > reloads the whole list from db instead of showing what it has in-memory.
> >
> > That actually is by design.
> >
> > Say we are caching endpoints.
> > The cache only contains the n most
> > recently requested endpoints, *not* every endpoint that you may have
> > in your system. Hence, if you ask for all endpoints, we have to bypass
> > the cache and get all endpoints in order to accurately fulfill the
> > request.
> >
> > Given that this is a human interaction and not a run-time machine
> > interaction, the fact that requesting all endpoints results in
> > going out to the database is not unreasonable.
>
> Well I see your point. The thing is that in a system where endpoints are
> dynamically spread over multiple asterisk systems I never want to see
> all the endpoints. Only those that have been served by this asterisk and
> cached. Maybe it's worth having a command that shows only cached endpoints?
> Basically I was happy with how chan_sip worked in that regard - only loading
> endpoints on-demand and only showing those endpoints that are loaded in
> memory.
>
> > > Also I've noticed another very awkward problem. If I type "pjsip show
> > > endpoint" in the console and then
> > >
> > > press "Tab" then asterisk hangs for over a minute and I register over 300
> > > queries like this in the db log:
> >
> > So, first, you are asking for name completion against 10k endpoints.
> > Regardless of the number of database queries, that's a large set to
> > complete against. Granted, there's no reason to go get the dataset on
> > every single entry...
> >
> > > SELECT * FROM pjsip_endpoints_v WHERE id LIKE '%' ORDER BY id
> >
> > ... which does appear as if that is what we are doing.
> > In pjsip_cli:
> >
> >     while ((object = ao2_t_iterator_next(&i, "iterate thru endpoints table"))) {
> >         const char *id = formatter_entry->get_id(object);
> >         if (!strncasecmp(word, id, wordlen)
> >             && ++which > state) {
> >             result = ast_strdup(id);
> >         }
> >         ao2_t_ref(object, -1, "toss iterator endpoint ptr before break");
> >         if (result) {
> >             break;
> >         }
> >     }
> >
> > Since the endpoint formatter_entry only has a 'get by id' callback:
> >
> >     static void *cli_endpoint_retrieve_by_id(const char *id)
> >     {
> >         return ast_sorcery_retrieve_by_id(ast_sip_get_sorcery(), "endpoint", id);
> >     }
> >
> > That means that for every partial match that you have on an endpoint,
> > we do a separate lookup.
> >
> > Alternatively, we could go pull a partial match in a single query,
> > then iterate over the returned set of matches. Clearly that would be a
> > lot better in this case.
> >
> > > Why would asterisk need to load the whole list of endpoints more than 300
> > > times is just completely beyond me.
> >
> > Hyperbole aside, it's because PJSIP chose a sane, maintainable method
> > to interact with its storage backends and uses a data abstraction
> > layer above its SQL statements - unlike chan_sip, which just embeds
> > the statements willy-nilly in the codebase. The downside of this is
> > that sometimes - in some specific cases - we aren't as efficient as we
> > should be.
> >
> > That's fixable however. Please do file a specific issue for the tab
> > completion case, as that should be improved.
>
> First of all, again, I'd prefer that completion to be performed not against
> all the endpoints in db, but only those loaded and cached.
> Second, my test environment doesn't have 10K endpoints, but currently only
> 173.
> I imagine that if I did it against all 10K endpoints it would never finish.
> Sure I'll open an issue for that.
>
> > > For a long time it was my understanding that "dynamic realtime" means
> > > loading data from db on demand.
> > >
> > > What pjsip does now is not a dynamic realtime. What it does seems like the
> > > mix of both worlds: static realtime in the beginning -
> > >
> > > loading everything from db and dynamic afterwards - issuing queries
> > > whenever it needs endpoint data (caching helps here).
> > >
> > > Unless I'm missing something and there's another/better way to use it, I
> > > think pjsip realtime is not usable now
> > >
> > > at any scale other than very trivial one.
> >
> > Please leave hyperbole at the door. If you'd like help narrowing down
> > the specific cases that are causing issues, that'd be great. We'd love
> > to help. "I think this sucks" isn't helpful.
> >
> > Right now, you've pointed out one specific case that clearly needs
> > improvement. Please provide specific evidence for each case that
> > you're running into, when caching is enabled, where a run-time
> > operation is substantially less efficient than it should be.
> >
> > And remember: this is an open source project. If you'd like to help
> > fix things, that's always appreciated.
> >
> > Matt
--
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
