Marcus Pereira wrote:
>>>   At a random time (may be hours, days or weeks) the main couriertcpd
>>> keeps running and accepting connections (until the max clients are
>>> reached) but the child processes never end.
>>> [...]
>>> 2) strace for a child couriertcpd process while on start of the lock
>>> [...]
>>> 17:46:43.758570 read(4, ""..., 2446) = 0
>>> 17:46:43.758654 read(4, ""..., 2446) = 0
>>> 17:46:43.758762 read(4,
>>> "\1\0\0\0\0\0\0\0\216\t\0\0\0\0\0\0r6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
>>> 2446) = 2446
>>> 17:46:43.759082 getsockname(5, {sa_family=AF_INET6, sin6_port=htons(25),
>>
>> If that is called before getsockname, it means it is in bdbobj_open,
>> right? Are processes starving because of some locking mechanism?
> 
> I think at this point the db (smtpaccess.dat) is already open, the hang is 
> when the process makes queries.
> 
> As I could trace:
>   .  tcpd/tcpd.c:
>        function "accepted" calls "allowaccess"

I saw that "sox_getsockname" is called before "allowaccess", hence my 
guess that it was in the former call. However, yes, there is a further 
call to "sox_getsockname" in "run", after "allowaccess" in the child.

>        function "allowaccess" calls "doallowaccess"
>        function  "doallowaccess" calls "chkaccess"
>   . tcpd/tcpdaccess.c:
>       function "chkaccess" calls "dbobj_fetch"

Mind that dbobj_fetch is #defined twice in dbobj.h.

>   . bdbobj/bdbobj.c:
>       function "dbobj_fetch" calls "doquery"
>       ** The process gets stuck in an infinite loop in the "doquery" function
>       ( for (;;) )
>       function "doquery"  calls "dofetch"
>       function "dofetch" calls "(*obj->dbf->get)"
>    From here I could not trace anymore, but I guess it is a call for the 
> gdbm library.

The _db-4_ library, actually. dbf->get gets mapped to an interface 
function in the call to db_create (e.g. "__db_get_pp").

However, your further posts imply you are using gdbm, so you should 
check gdbmobj/gdbmobj.c: function "gdbmobj_fetch". Or did you 
switch from bdb to gdbm over the weekend?

>    The fetch never returns successfully, so the function stays stuck in 
> the loop.

I don't understand those repeated read() calls returning 0. That should 
mean end of file, so there should be no point in retrying. For 
EINTR, read would return -1 with errno set, and strace would report 
that. (My strace looks quite similar to your "normal operation" one; 
however, I have pread rather than read, and no lseek, with bdb4.4.)
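
To make the distinction concrete, here is a minimal sketch of a read loop 
that treats the three outcomes differently. The helper name "read_full" is 
hypothetical; it is not taken from Courier or from the db library, and it 
only illustrates the usual POSIX read() semantics.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of Courier or any db library:
 * read up to len bytes from fd, stopping at end of file.
 * Returns the number of bytes read, or -1 on a real error. */
ssize_t read_full(int fd, char *buf, size_t len)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);

        if (n > 0) {
            got += (size_t)n;      /* partial read: keep going */
        } else if (n == 0) {
            break;                 /* end of file: retrying will not help */
        } else if (errno != EINTR) {
            return -1;             /* genuine error, errno is set */
        }
        /* n == -1 with errno == EINTR: interrupted by a signal, retry */
    }
    return (ssize_t)got;
}
```

A caller that keeps reissuing read() after a 0 return, as the trace seems 
to show, would spin without making progress unless the file grows in the 
meantime.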

Good luck

_______________________________________________
courier-users mailing list
[email protected]
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users
