Hi there,

Not sure whether this helps, but try testing this without setting
bypass_media. In my setup I have noticed that the A-leg session ends when
bypass_media is true, although the call/bridge itself continues
successfully.

Phillip Jones

On Thu, Aug 6, 2009 at 1:28 PM, Benedikt Fraunhofer <[email protected]> wrote:
> Hello List, Hello *,
>
> First of all the usual excuses: sorry for the bad English and the long
> email. I'm not a native speaker and I really tried to make it shorter,
> but I guess that would result in even more "check back"s than it
> already does :)
>
> We're currently running into a weird "lockup" scenario in our load
> tests.
>
> Our setup is the following:
>
> Three FreeSWITCH servers, let's call them A(-leg), M(aster), and
> B(-leg). The goal is to initiate calls on M, which calls A, plays some
> file, bridges to B, limits the call length, and plays (different)
> prompts to A and B if they exceed that limit.
>
> (Note that A and B work fine, regardless of the amount of load we put
> on them.) A and B run silly dialplan logic, accepting calls on a
> certain extension after a random delay and playing MOH. Before calling
> playback to a local stream they call a Lua script which schedules a
> hangup some time in the future (which works flawlessly).
>
> Calls are initiated on M using a hacked-up loadgen script issuing
> HTTP requests like
>
>   originate [sofiaSyntaxToExtensionOn_A] 6000
>
> The 6000 extension on M has the following (XML) dialplan, which
> essentially does this:
> ------
> answer()
> ...playback file...
> ...set some callerid stuff
> set bypass_media
> bridge to extension 6009 on B
> ------
> We use "execute_on_answer" on the B-leg to run a script which limits
> the length of the call (it doesn't matter whether it's done via "export
> nolocal" or "inlined" into the data part of the bridge application as
> "{execute_on_answer=lua ...}"):
>
> <action application="export" data="nolocal:execute_on_answer=lua lua/schedule-hangup.lua ${uuid}" />
>
> The Lua script "schedule-hangup.lua" essentially does the following:
>
> ------
> local api = freeswitch.API();
> local res = api:execute("sched_api", "+10 none lua lua/c2c-hangup-timeout.lua " ..
>   argv[1]);
> ------
>
> The 10 seconds are just there to shorten the time until it gets stuck.
>
> This is where things start to go wrong. If I comment out the call to
> the "schedule-hangup" script, everything works fine, even under heavy
> load.
>
> c2c-hangup-timeout.lua does the following:
> ------------------
> local sess = argv[1];
> if(sess)
> then
>   freeswitch.consoleLog("INFO", "c2c-hangup-timeout.lua for uuid " .. sess .. "\n");
>
>   local api = freeswitch.API();
>   local stillValid = api:execute("uuid_getvar", sess .. " Dummy-DoesChannelExists");
>   if(stillValid:sub(1,4) == "-ERR")
>   then
>     freeswitch.consoleLog("INFO", "session uuid " .. sess .. " disappeared (nothing bad)\n");
>   else
>     -- this is important!!! Otherwise the A-leg just gets hung up!
>     api:execute("uuid_media", sess);
>     api:execute("uuid_transfer", sess .. " -both timeout");
>   end
> else -- /if(sess)
>   freeswitch.consoleLog("WARNING", "called with nil session?\n");
> end -- /if(sess)
> ------------------
>
> I guess this needs some explanation: we get the UUID of the channel as
> an argument in argv[1]. We don't use
>
>   local session = freeswitch.Session(uuid);
>
> because if the channel referenced by "uuid" no longer exists,
> FreeSWITCH (or the Lua bindings) tries to interpret the UUID as an
> "originate string" and can't figure out how to call it. So we use a
> dummy API call to read some channel variable. If the channel no longer
> exists (A or B already hung up), we get an error message starting with
> "-ERR"; otherwise the channel still exists (we get "_unset_" as the
> value if the variable is not set), and we continue by bringing
> FreeSWITCH back into the media path (uuid_media) and then transferring
> both legs to an extension called "timeout", which plays some prompt and
> finally calls hangup().
>
> If we don't do the uuid_media call, one of the legs gets hung up when
> we transfer them to the extension.
> This looks like the following on the console after issuing
> "uuid_transfer [uuid] -both timeout" (the extensions are not the same
> as in our loadgen example above):
>
> --------------
> 2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1334 Hangup (*) sofia/internal/1000 [CS_HIBERNATE] [BLIND_TRANSFER]
> 2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1349 Transfer sofia/internal/[email protected]:5060 to xml[time...@default]
> 2009-07-23 19:57:19.865703 [INFO] mod_dialplan_xml.c:310 Processing BFR1004->timeout in context default
> API CALL [uuid_transfer(73812082-77b1-11de-b9f8-a10bb0eb9f69 -both timeout)] output:
> +OK
>
> 2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1349 Transfer (**) sofia/internal/1000 to xml[time...@default]
> 2009-07-23 19:57:19.865703 [NOTICE] switch_core_session.c:1084 Session 60 (sofia/internal/1000) Ended
> 2009-07-23 19:57:19.865703 [NOTICE] switch_core_session.c:1086 Close Channel sofia/internal/1000 [CS_DESTROY]
> -----------
>
> Note that it first does a Hangup (denoted by *, no, that's not an
> asterisk :) on extension 1000 and then tries to Transfer (**) the
> hung-up channel to the dialplan. This could be the same issue as in an
> earlier post to the list ("SIP re-invite / bypass_media", Phillip
> Jones, Wed, 01 Jul 2009 13:30:53 -0700).
>
> This is why we do not call sched_transfer() directly but call a script
> in between to do the uuid_media() call. I couldn't figure out how to
> call that directly from the XML dialplan and/or how to check whether
> the channel still exists.
>
> So... after using uuid_media(), both legs are transferred without an
> (intermediate|bogus) hangup() call.
>
> This only works fine if we have few concurrent calls. There is no
> clear threshold at which it starts to refuse work.
>
> Some of the symptoms are: traffic decreases to zero because no new
> channels are successfully brought up, some of the signaling traffic is
> not ACKed or OKed, and scheduled jobs are not run.
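
The channel-existence check in c2c-hangup-timeout.lua (a dummy uuid_getvar call whose reply starts with "-ERR" once the channel is gone) could be factored into a small Lua helper, as in the minimal sketch below. It assumes only the api:execute() and freeswitch.consoleLog() calls already used in the scripts above; the helper name channel_exists is hypothetical. It merely restates the existing check and is not a fix for the lockup.

```lua
-- Minimal sketch; channel_exists is a hypothetical helper name.
-- `api` is assumed to behave like the object returned by freeswitch.API(),
-- i.e. it has an execute(command, args) method returning a string.
local function channel_exists(api, uuid)
  if uuid == nil or uuid == "" then
    return false
  end
  -- The variable name is arbitrary: per the description above, an existing
  -- channel answers "_unset_" for an unset variable, while a missing
  -- channel yields a reply starting with "-ERR".
  local reply = api:execute("uuid_getvar", uuid .. " Dummy-DoesChannelExists")
  return type(reply) == "string" and reply:sub(1, 4) ~= "-ERR"
end

-- Runs only inside FreeSWITCH's mod_lua, where the `freeswitch` and `argv`
-- globals exist; in a plain Lua interpreter this block is skipped.
if freeswitch ~= nil and argv ~= nil then
  local api = freeswitch.API()
  local sess = argv[1]
  if channel_exists(api, sess) then
    -- Bring media back through FreeSWITCH before transferring both legs.
    api:execute("uuid_media", sess)
    api:execute("uuid_transfer", sess .. " -both timeout")
  else
    freeswitch.consoleLog("INFO", "channel " .. tostring(sess) .. " is gone, nothing to do\n")
  end
end
```

Keeping the check in one function means it can be exercised with a fake api object outside FreeSWITCH, without a running switch.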
>
> If I read the output of "show channels" correctly, they're all stuck
> in different applications like hangup(); some are calling lua, but
> most of them are in signaling_bridge(). FreeSWITCH still responds on
> the console and there's almost no load on the machine (no busy
> polling or any other kind of running amok).
>
> If I kill one of them using uuid_kill(), or kill all of them using
> "fsctl hupall", I get "Task was executed late by 866 seconds 12379
> sched_api_function (none)" messages and the usual cleanup takes place.
> As a quick hack I tried to schedule a uuid_kill() call 20 seconds
> after the scheduling call to the Lua script, but that job is not
> executed either.
>
> So what am I doing wrong? Is it some deadlock where uuid_media() and
> uuid_transfer() are waiting for each other to finish? Or some other
> silly, simple thing I missed?
>
> Thx in advance
>
> Benedikt.
>
> _______________________________________________
> FreeSWITCH-users mailing list
> [email protected]
> http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
> UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
> http://www.freeswitch.org
