-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3568/
-----------------------------------------------------------
Review request for Asterisk Developers.
Bugs: ASTERISK-23811
https://issues.asterisk.org/jira/browse/ASTERISK-23811
Repository: Asterisk
Description
-------
During some performance testing of Asterisk with AGI, ARI, and lots of Local
channels, we noticed that there's quite a hit in performance during channel
creation and releasing to the dialplan (ARI continue). After investigating the
performance spike that occurs during channel creation, we discovered that we
create a lot of channel snapshots that are technically unnecessary. This
includes creating snapshots during:
* AGI execution
* Returning objects for ARI commands
* During some Local channel operations
* During some dialling operations
* During variable setting
* During some bridging operations
And more.
This patch does the following:
- It removes a number of fields from channel snapshots. These fields were
rarely used, were expensive to have on the snapshot, and hurt performance. This
included formats, translation paths, Log Call ID, callgroup, pickup group, and
all channel variables. As a result, AMI Status, "core show channel", "core show
channelvar", and "pjsip show channel" were modified to either hit the live
channel or not show certain pieces of data. While this is unfortunate, the
performance gain from this patch is worth the loss in behaviour.
- It adds a mechanism to publish a cached snapshot + blob. A large number of
publications were changed to use this, including:
- During Dial begin
- During Variable assignment (if no AMI variables are emitted - if AMI
variables are set, we have to make snapshots when a variable is changed)
- During channel pickup
- When a channel is put on hold/unhold
- When a DTMF digit is begun/ended
- When creating a bridge snapshot
- When an AOC event is raised
- During Local channel optimization/Local bridging
- When endpoint snapshots are generated
- All AGI events
- All ARI responses that return a channel
- Events in the AgentPool, MeetMe, and some in Queue
- Additionally, some extraneous channel snapshots were being made that were
unnecessary. These were removed.
- The result of ast_hashtab_hash_string is now cached in stasis_cache. This
reduces a large number of calls to ast_hashtab_hash_string, which reduced the
amount of time spent in this function in gprof by around 50%.
Diffs
-----
/branches/12/res/res_pjsip/pjsip_configuration.c 414971
/branches/12/res/res_agi.c 414971
/branches/12/res/ari/resource_channels.c 414971
/branches/12/main/stasis_channels.c 414971
/branches/12/main/stasis_cache.c 414971
/branches/12/main/stasis_bridges.c 414971
/branches/12/main/pickup.c 414971
/branches/12/main/pbx.c 414971
/branches/12/main/manager.c 414971
/branches/12/main/endpoints.c 414971
/branches/12/main/dial.c 414971
/branches/12/main/core_local.c 414971
/branches/12/main/cli.c 414971
/branches/12/main/channel_internal_api.c 414971
/branches/12/main/channel.c 414971
/branches/12/main/cel.c 414971
/branches/12/main/bridge_channel.c 414971
/branches/12/main/aoc.c 414971
/branches/12/include/asterisk/stasis_channels.h 414971
/branches/12/include/asterisk/channel.h 414971
/branches/12/apps/app_queue.c 414971
/branches/12/apps/app_meetme.c 414971
/branches/12/apps/app_agent_pool.c 414971
Diff: https://reviewboard.asterisk.org/r/3568/diff/
Testing
-------
All sorts of different testing was done. For automated tests that cover these
changes, see https://bamboo.asterisk.org/bamboo/browse/ATB-TEAMSP-6. Note that
the test failures on that branch are currently also failing on the 12
branch/trunk. A few new tests were written and two Parking test's CDR results
were tweaked (but not in a fashion that violates the spec or the intent of the
records) - for those changes, see: https://reviewboard.asterisk.org/r/3578/
The first round of testing was profiling Asterisk with gprof to see if I could
reduce the amount of time spent creating channel snapshots. This was tested
using an ARI script that created a Local channel, subscribed to it, then
released that channel to the dialplan. The released Local channel would Dial 10
or so Local channels, which went back into the Stasis application. This
increased the number of Local channels rapidly on the system - letting this run
for awhile (until hilarity ensued) was a good mechanism to test the affects of
removing channel snapshots.
Once snapshot creation was no longer easily moved down the gprof list, I went
ahead and did a bit of investigation on the effects of this patch on CPU using
just the dialplan. The fact that snapshots were getting made a lot during
AGI/ARI was clearly causing problems, but since dialplan execution still has to
create a snapshot, I was curious what effect this patch would have on more
"traditional" scenarios. For each of these rounds of tests, I let Asterisk run
with SIPp for about 3000 channels. Watching top, I recorded the min/max that
showed up during that period of time. Not exactly scientific - but it gives a
general impression of the effect of the changes.
** Note that wherever I reference CPU, it should be noted that this was run on
my laptop (which is rather old) while running Firefox, Pidgin, and a whole host
of other stuff. I wasn't attempting to run a true performance analysis;
instead, I was just trying to get a relative measurement of things with and
without these changes. **
** ROUND 1 **
The first used an inbound PJSIP channel that had a variety of variables set on
it, and then got hung up:
Dialplan:
exten => test_sip,1,NoOp()
same => n,Set(var1=foo)
same => n,Set(var2=bar)
same => n,Set(var3=yackity)
same => n,Set(var4=schmackity)
same => n,Answer()
same => n,Echo()
SIPp:
sipp 127.0.0.1:5060 -sn uac -i 127.0.0.1 -p 5061 -s test_sip -d 5000
With this patch, CPU was around 9-10%; without this patch, around 10-11%. So
not much difference. This isn't too surprising: there's no dialling, there's no
interfaces being used, and very few changes being made to the channel. Dialplan
execution is the same with/without this patch - the largest 'savings' here is
the usage of cached snapshots during variable assignment.
** ROUND 2 **
This round threw in a Local channel and Dial operation. There's a slightly
larger improvement here with/without the patch:
Dialplan 2:
exten => test_dial_sip,1,NoOp()
same => n,Set(var1=foo)
same => n,Set(var2=bar)
same => n,Set(var3=yackity)
same => n,Set(var4=schmackity)
same => n,Dial(Local/term@default)
same => n,Echo()
exten => term,1,NoOp()
same => n,Set(var1=foo)
same => n,Set(var2=bar)
same => n,Set(var3=yackity)
same => n,Set(var4=schmackity)
same => n,Answer()
same => n,Echo()
With this patch, CPU was around 14-18%; without this patch, around 16-21%.
Saving a few snapshots during Local channel "bridging", dialling, and setting
variables on them shaves 2-3% off the CPU.
** ROUND 3 **
Same as round 2, only with a parallel Dial of multiple Local channels.
Improvement was actually about the same as Round 2; with the patch, around
19-23% CPU, without the patch, around 22-26% CPU.
Dialplan 3:
exten => test_dial_sip,1,NoOp()
same => n,Set(var1=foo)
same => n,Set(var2=bar)
same => n,Set(var3=yackity)
same => n,Set(var4=schmackity)
same =>
n,Dial(Local/term@default&Local/term@default&Local/term@default&Local/term@default&Local/term@default)
same => n,Echo()
exten => term,1,NoOp()
same => n,Set(var1=foo)
same => n,Set(var2=bar)
same => n,Set(var3=yackity)
same => n,Set(var4=schmackity)
same => n,Answer()
same => n,Echo()
There's a few things to draw from all of this:
(1) Snapshots aren't free. If you can use a cached version, use it. If you
can't, stage the snapshot until all attributes have been changed (which we do
quite frequently; this patch did not need to modify any of that behaviour).
(2) Snapshots should be lean. Don't put something on there that doesn't need to
be there. While there is benefit in not hitting live data to get information
(reduces the interference of inspection commands in APIs on real time
operations), it isn't worth hurting the performance of the overall system.
(3) We should pay attention to the cost of things in astobj2. Shifting the
burden from creating snapshots to cached ones means the Stasis cache - which
uses ao2 heavily - has to have good performance.
(4) It would be worthwhile to see if we can't optimize creation of snapshots
further. Often, when we create a snapshot, we only modify one or two properties
on it. Usage of memory pools, fixed sized fields with memcpy, and other
techniques _may_ improve performance further.
Thanks,
Matt Jordan
--
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --
asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
http://lists.digium.com/mailman/listinfo/asterisk-dev