Possible select()/accept() issue in cachelogd.c

Vanja Hrustic Wed, 21 Feb 2001 03:00:02 -0800
Hi!

I have noticed that cachelogd starts consuming 100% of CPU time *exactly* 5 
minutes after the daemon is started. Gdb showed that the cause of the load is the 
select() loop.

In source code, I could see:

----- cachelogd.c
<...code...>
tval.tv_sec             = 300;
tval.tv_usec            = 0;
<...code...>
while (1) {
        fd_set msk;
                
        msk=mask;
        sel=select(32,&msk,0,0,&tval);
                
        if(sel==0){
                /* Time limit expired */
                        continue;
        }
        if(sel<1){
                /* FIXME: add errno checking */
                continue;
        }
                
        /* Check whether new client has connected */
        if(FD_ISSET(ctl_sock,&msk)){
                int cl,addrlen;
                addrlen = sizeof(his_addr);
                for(cl=0;cl<MAXCLIENT;cl++){
                        if(log_client[cl].fd==0){
                                fprintf(stderr,"%s Client #%d connected\n",time_pid_info(),cl);
                                log_client[cl].fd=accept(ctl_sock, (struct sockaddr *)&his_addr, &addrlen);
                                FD_SET(log_client[cl].fd,&mask);
                                log_client[cl].state=STATE_CMD;
                                log_client[cl].rbytes=sizeof(UDM_LOGD_CMD);
                                break;
                        }
                }
        }
<code...>
-----

Now... I might be talking complete rubbish, but I hope someone will correct me :)

From what I could find, there are 2 'ways' to use select()/accept(). One way is to 
accept() first, then use select() later: select() has a timeout, and if nothing 
happens on the socket during that timeout period, select() returns 0 and some action 
can be performed (close the socket, or whatever, depending on needs).

In the other situation, select() is used first, and accept() later (as in cachelogd.c). 
But in that pattern select() is called with a NULL timeout, which makes it 'block' 
until some input comes in.
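
The select()-then-accept() order can be sketched as below. This is an illustration only, not cachelogd code: accept_when_ready() and demo_accept() are hypothetical names, the -2 timeout code is an arbitrary choice, and the demo uses a Unix-domain socket under /tmp just so it runs without a network. The timeout is filled in on every call, and the result of accept() is passed back for the caller to check before it goes anywhere near FD_SET().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `ms` milliseconds for a pending
 * connection on `listen_fd`, then accept it.
 * Returns the new descriptor, -2 on timeout, or -1 on error. */
int accept_when_ready(int listen_fd, long ms)
{
    fd_set rd;
    struct timeval tv;
    int sel;

    assert(listen_fd >= 0 && listen_fd < FD_SETSIZE && ms >= 0);

    FD_ZERO(&rd);
    FD_SET(listen_fd, &rd);
    tv.tv_sec  = ms / 1000;             /* fresh timeout on every call */
    tv.tv_usec = (ms % 1000) * 1000;

    sel = select(listen_fd + 1, &rd, NULL, NULL, &tv);
    if (sel == 0)
        return -2;                      /* nothing connected in time */
    if (sel < 0 || !FD_ISSET(listen_fd, &rd))
        return -1;
    return accept(listen_fd, NULL, NULL);   /* peer address not needed here */
}

/* Self-check over a Unix-domain socket; returns 0 on success. */
int demo_accept(void)
{
    struct sockaddr_un addr;
    int srv, cli, conn, ok;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof(addr.sun_path),
             "/tmp/select-accept-%ld.sock", (long)getpid());
    unlink(addr.sun_path);              /* drop a stale socket, if any */

    srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(srv, 1) < 0) {
        close(srv);
        unlink(addr.sun_path);
        return -1;
    }

    /* No client yet, so the first call should time out. */
    ok = (accept_when_ready(srv, 20) == -2);

    cli = socket(AF_UNIX, SOCK_STREAM, 0);
    if (cli >= 0 &&
        connect(cli, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        conn = accept_when_ready(srv, 1000);
        ok = ok && (conn >= 0);
        if (conn >= 0)
            close(conn);
    } else {
        ok = 0;
    }

    if (cli >= 0)
        close(cli);
    close(srv);
    unlink(addr.sun_path);
    return ok ? 0 : -1;
}
```

Calling demo_accept() should return 0: the first wait times out because nobody has connected, the second one picks up the queued client.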

What happens right now in cachelogd (as far as I can see, but I'm not a programmer by 
'definition', so... ;) is that cachelogd is ok for 5 minutes (while select() is 
actually sleeping), but once the timer reaches 0, every select() call returns 
immediately and the loop starts spinning. It can be checked in gdb as well. Something like:

-----
[root@emx sbin]# gdb ./cachelogd 
<loading...>
(gdb) b 416
Breakpoint 1 at 0x80494a7: file cachelogd.c, line 416.
(gdb) r
Starting program: /opt/mnogosearch/sbin/./cachelogd 
Wed 21 16:49:59 [21785] Open logs 0 0
Wed 21 16:49:59 [21785] cachelogd started. Accepting 128 connections.

Breakpoint 1, main (argc=1, argv=0xbffffce4) at cachelogd.c:416
warning: Source file is more recent than executable.

416                     sel=select(32,&msk,0,0,&tval);
[This select() is the 1st one that gets executed, and tval.tv_sec is 300 at this 
point.]
(gdb) c
Continuing.

[exactly 300 seconds later...]

Breakpoint 1, main (argc=1, argv=0xbffffce4) at cachelogd.c:416
416                     sel=select(32,&msk,0,0,&tval);
(gdb) p tval.tv_sec
$1 = 0
(gdb) c
Continuing.

Breakpoint 1, main (argc=1, argv=0xbffffce4) at cachelogd.c:416
416                     sel=select(32,&msk,0,0,&tval);
(gdb) c
Continuing.

Breakpoint 1, main (argc=1, argv=0xbffffce4) at cachelogd.c:416
416                     sel=select(32,&msk,0,0,&tval);
(gdb) c
Continuing.

etc... (repeats forever)

-----
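
One likely explanation for the session above: on Linux, select() rewrites the timeval it is given so that it holds the time that was not slept, and POSIX allows an implementation to do so. After one timed-out call the struct holds 0, and every later call that reuses it returns at once. A minimal sketch to check this on a given system follows; timeout_is_consumed() is a name made up for the example, and other platforms may leave the struct untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical probe: sleep in select() for `ms` milliseconds with no
 * descriptors to watch, then look at what is left in the timeout.
 * Returns 1 if the timeout was zeroed (the behaviour seen in gdb above),
 * 0 if it was left intact, and -1 if select() did not time out cleanly. */
int timeout_is_consumed(long ms)
{
    struct timeval tv;
    int sel;

    assert(ms >= 0);
    tv.tv_sec  = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;

    sel = select(0, NULL, NULL, NULL, &tv);
    if (sel != 0)
        return -1;      /* error or interrupted by a signal */

    return (tv.tv_sec == 0 && tv.tv_usec == 0) ? 1 : 0;
}
```

If timeout_is_consumed(20) returns 1, a loop that sets tval once before the while(1) will stop sleeping after the first timeout.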

I can think of 3 possible ways to fix this. But I would *really* appreciate it if 
someone with more 'socket experience' gave the proper fix and possibly explained the 
real issue here :)

1. Do something like:

        if(sel==0){
                tval.tv_sec = 300; /* reset the timer when it reaches 0 */
                /* Time limit expired */
                continue;
        }

In this case, the timer gets reset every time it reaches 0. Seems to work ok, no 
'side-effects' noticed (tried for 30 mins and re-indexed a few thousand pages)
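
A possible variation on fix 1, sketched under the assumption that select() may shorten the timeout on any return and not only when it expires: fill in the timeval before every call, so each wait gets the full period. wait_readable() and demo_wait() are hypothetical names, not cachelogd functions, and the pipe is only there to make the example self-checking.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `ms` milliseconds for any descriptor in
 * `watch` to become readable. Returns what select() returns. */
int wait_readable(const fd_set *watch, int nfds, long ms, fd_set *ready)
{
    struct timeval tv;

    assert(watch != NULL && ready != NULL && ms >= 0);

    *ready = *watch;                    /* select() overwrites the set */
    tv.tv_sec  = ms / 1000;             /* both fields reset on every call */
    tv.tv_usec = (ms % 1000) * 1000;

    return select(nfds, ready, NULL, NULL, &tv);
}

/* Self-check with a pipe; returns 0 on success. */
int demo_wait(void)
{
    int p[2], ok;
    fd_set watch, ready;

    if (pipe(p) != 0)
        return -1;
    FD_ZERO(&watch);
    FD_SET(p[0], &watch);

    /* Two idle waits in a row: both should report a timeout. */
    ok = wait_readable(&watch, p[0] + 1, 10, &ready) == 0
      && wait_readable(&watch, p[0] + 1, 10, &ready) == 0;

    /* After a byte is written the read end should be reported. */
    ok = ok
      && write(p[1], "x", 1) == 1
      && wait_readable(&watch, p[0] + 1, 10, &ready) == 1
      && FD_ISSET(p[0], &ready);

    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

Compared with resetting only in the sel==0 branch, this keeps the wait at the full period even after wakeups caused by client traffic.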

2. Do something like:

instead of:
              sel=select(32,&msk,0,0,&tval);
use:
              /* I prefer NULL over 0 - just for 'aesthetic' purposes, sorry :) */
              sel=select(32,&msk,NULL,NULL,(struct timeval *)NULL);

This *should* make select() "block" until there is actually something it can deal with 
(new connection, etc). Seems to work ok, no 'side-effects' noticed (still running, 
re-indexing 10,000 pages)
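
Fix 2 might look like the sketch below once the errno check from the FIXME is added. With a NULL timeout there is nothing for select() to modify, and a return of -1 with errno set to EINTR only means a signal arrived, so the call is retried. wait_blocking() and demo_blocking() are hypothetical names, not taken from cachelogd.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until a descriptor in `watch` is readable.
 * Returns the number of ready descriptors, or -1 on a real error. */
int wait_blocking(const fd_set *watch, int nfds, fd_set *ready)
{
    int sel;

    assert(watch != NULL && ready != NULL);

    do {
        *ready = *watch;                /* refresh the set before each try */
        sel = select(nfds, ready, NULL, NULL, NULL);
    } while (sel < 0 && errno == EINTR);    /* interrupted: try again */

    return sel;
}

/* Self-check with a pipe; returns 0 on success. */
int demo_blocking(void)
{
    int p[2], ok;
    fd_set watch, ready;

    if (pipe(p) != 0)
        return -1;
    FD_ZERO(&watch);
    FD_SET(p[0], &watch);

    /* Queue a byte first so the blocking call has something to report. */
    ok = write(p[1], "x", 1) == 1
      && wait_blocking(&watch, p[0] + 1, &ready) == 1
      && FD_ISSET(p[0], &ready);

    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

The trade-off is that a loop built this way never wakes up on its own, so it only fits if nothing has to happen periodically while the sockets are idle.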

3. Rewrite this part using accept(), and then select()

I don't think it's really needed :)



I hope there is someone more experienced to check this out :)

Thanks.

-- 

Vanja Hrustic
The Relay Group
http://relaygroup.com
Technology Ahead of Time