Hi there,

Not sure whether this helps but test this without set bypass_media. In
my setup I have noticed the leg A session ends when bypass_media is
true. Call/bridge continue successfully.

Phillip Jones

On Thu, Aug 6, 2009 at 1:28 PM, Benedikt
Fraunhofer<[email protected]> wrote:
> Hello List, Hello *,
>
> First of all the usual excuses: sorry for the bad english and the long
> email, no native speaker and i really tried to make it shorter, but i
> guess this would result in even more "check back"s than it already
> does :)
>
> we're currently running in a weird "lockup"-scenario in our loadtests.
>
> Our setup is the following:
>
> three freeswitch servers, let's call them A(-leg), M(aster), and
> B(-leg) with the goal in mind to initiate calls on M which calls A,
> play some file, bridge to B, limit call length and play (different)
> prompts to A and B if they exceed that limit.
>
> (Note that A and B work fine, regardless of the amount of load we put on them)
> A and B are silly dialplan logic, accepting calls on a certain
> extension after a random delay and playing moh. Before calling
> playback to a localstream they call a lua script which schedules
> hangup somewhere in future (which works flawlessly)
>
> Calls are initiated on M using some hacked up loadgen-script issuing
> http requests like
>   originate [sofiaSyntaxToExtensionOn_A] 6000
> . The 6000 extension on M has the following (xml) dialplan which
> essentially does the following:
> ------
> answer()
> ...playback file...
> ...set some callerid stuff
> set bypass_media
> bridge to extension 6009 on B
> ------
> we use "execute_on_answer" on the b-leg to run a script which limits
> the length of the call (doesn't matter if it's done via "export
> nolocal" or "inlined" into the data part of the bridge application
> "{execute_on_answer=lua ...}")
>
> <action application="export" data="nolocal:execute_on_answer=lua
> lua/schedule-hangup.lua ${uuid}" />
>
> the lua script "schedula-hangup.lua" does essentially the following:
>
> ------
> api = freeswitch.API();
> local res = api:execute("sched_api", "+10 none lua
> lua/c2c-hangup-timeout.lua " .. argv[1]);
> ------
>
> the 10 seconds are just to speed up the time until it gets stuck.
>
> this is where things start to go wrong. if I comment out the call to
> the "schedule-hangup" script, everything works fine, even if it's
> under heavy load.
>
> c2c-hangup-timeout.lua does the following:
> ------------------
> local sess = argv[1];
> if(sess)
> then
>    freeswitch.consoleLog("INFO", "c2c-hangup-timeout.lua for uuid "
> .. sess .. "\n");
>
>    api = freeswitch.API();
>    local stillValid = api:execute("uuid_getvar", sess .. "
> Dummy-DoesChannelExists");
>    if(stillValid:sub(1,4) == "-ERR")
>    then
>        log("session uuid " .. sess .. " disappeared (nothing bad)");
>    else
>        -- this is important!!! Otherwise the aleg get's just hung up!
>        api:execute("uuid_media", sess);
>        api:execute("uuid_transfer", sess .. " -both timeout");
>    end
> else -- /if(sess)
>    log("called with nil session?");
> end -- /if(sess)
>
> ------------------
>
> i guess this needs some explanation:
> we get the uuid of the channel as argument in argv[1]. We don't use
>   local session = freeswitch.Session(uuid);
> since if the channel referenced by "uuid" does not exist any longer,
> freeswitch (or the lua bindings) try to interpret the uuid as an
> "originate string" and can't figure out how to call that. So we use a
> dummy api call to get some channel variable. If the channel does not
> exist any longer (A or B already hung up), we get an error message
> starting with "-ERR", otherwise the channel still exists (we get
> "_unset_" as the value, if it's not set) and we continue by getting
> freeswitch back in the media path (uuid_media) and then transferring
> both legs to an extension called "timeout" which plays some prompt and
> finally calls hangup().
>
> If we don't do the uuid_media call, one of the legs gets hung up when
> we transfer them to the extension. This looks like the following on
> the console after issuing "uuid_transfer [uuid] -both timeout"
> (extensions are not the same as in our loadgen example above)
>
>
> --------------
> 2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1334 Hangup (*)
> sofia/internal/1000 [CS_HIBERNATE] [BLIND_TRANSFER]
> 2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1349 Transfer
> sofia/internal/[email protected]:5060 to xml[time...@default]
> 2009-07-23 19:57:19.865703 [INFO] mod_dialplan_xml.c:310 Processing
> BFR1004->timeout in context default
> API CALL [uuid_transfer(73812082-77b1-11de-b9f8-a10bb0eb9f69 -both
> timeout)] output:
> +OK
>
> 2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1349 Transfer (**)
> sofia/internal/1000 to xml[time...@default]
> 2009-07-23 19:57:19.865703 [NOTICE] switch_core_session.c:1084 Session
> 60 (sofia/internal/1000) Ended
> 2009-07-23 19:57:19.865703 [NOTICE] switch_core_session.c:1086 Close
> Channel sofia/internal/1000 [CS_DESTROY]
> -----------
>
> note that it first does Hangup (denoted by *, no that's not an
> asterisk :) on extension 1000 and then tries to Transfer (**) the hung
> up channel to the dial plan. this could be the same as in an earlier
> post to the list "SIP re-invite / bypass_media // Phillip Jones //
> Wed, 01 Jul 2009 13:30:53 -0700)"
>
> This is why we do not directly call sched_transfer() but call a script
> in between to do the uuid_media() call. I couldn't figure out how to
> call that directly from the xml dialplan and/or how to check if the
> channel still exists.
>
> so... after using uuid_media(), both legs are transferred without an
> (intermediate|bogus) hangup() call.
>
> This only works fine if we've few concurrent calls. There is no magic
> borderline where it starts to refuse work.
>
> Some of the Symptoms are: traffic decreased to zero as no new channels
> are successfully brought up, some of the signaling traffic is not
> ACKed or OKed, scheduled jobs are not run.
>
> if i read the output of "show channels" correctly, they're all stuck
> in different applications like hangup(), some are calling lua but most
> of them are in signaling_bridge(). Freeswitch is still responding on
> the console and there's almost no load on the machine (no busy polling
> or some other kind of running amok).
>
> if i kill one of them using uuid_kill() or kill all of them
> using"fsctl hupall" i get "Task was executed late by 866 seconds 12379
> sched_api_function (none)" messages and the usual cleanup takes place.
> As a quick hack i tried to schedule a uuid_kill() call 20 seconds
> after the scheduling call to the lua script but that job is not
> executed either.
>
> So what am I doing wrong? Is it some deadlock where uuid_media() and
> uuid_transfer()  are waiting for the other to finish?
> Or some other silly simple thing i missed?
>
> Thx in advance
>
>  Benedikt.
>
> _______________________________________________
> FreeSWITCH-users mailing list
> [email protected]
> http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
> UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
> http://www.freeswitch.org
>

_______________________________________________
FreeSWITCH-users mailing list
[email protected]
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org

Reply via email to