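Hi,

Try moving

tval.tv_sec             = 300;
tval.tv_usec            = 0;

into the

while(1){
....
}

loop, just before the select() call. On Linux, select() overwrites the timeval you pass it with the amount of time not slept. Once the first 300-second timeout expires, tval is left at 0, and every later select() returns immediately. That is the 100% CPU loop.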

Ramil.

Vanja Hrustic wrote:

> Hi!
> 
> I have noticed that cachelogd starts consuming 100% of CPU time *exactly* 5
> minutes after the daemon is started. Gdb showed that the load comes from the
> select() loop.
> 
> In the source code, I found:
> 
> ----- cachelogd.c
> <...code...>
> tval.tv_sec             = 300;
> tval.tv_usec            = 0;
> <...code...>
> while (1) {
>       fd_set msk;
>               
>       msk=mask;
>       sel=select(32,&msk,0,0,&tval);
>               
>       if(sel==0){
>               /* Time limit expired */
>                       continue;
>       }
>       if(sel<1){
>               /* FIXME: add errno checking */
>               continue;
>       }
>               
>       /* Check whether new client has connected */
>       if(FD_ISSET(ctl_sock,&msk)){
>               int cl,addrlen;
>               addrlen = sizeof(his_addr);
>               for(cl=0;cl<MAXCLIENT;cl++){
>                       if(log_client[cl].fd==0){
>                               fprintf(stderr,"%s Client #%d connected\n",time_pid_info(),cl);
>                               log_client[cl].fd=accept(ctl_sock, (struct sockaddr *)&his_addr, &addrlen);
>                               FD_SET(log_client[cl].fd,&mask);
>                               log_client[cl].state=STATE_CMD;
>                               log_client[cl].rbytes=sizeof(UDM_LOGD_CMD);
>                               break;
>                       }
>               }
>       }
> <code...>
> -----
> 
> Now... I might be talking complete rubbish, but I hope someone will correct me :)
> 
> From what I could find, there are two 'ways' to use select()/accept(). One way
> is to accept() first and use select() later. select() has a timeout, and if
> nothing happens on the socket during that period, select() returns 0 and some
> action can be performed (close the socket, or whatever the application needs).
> 
> In the other, select() is called first and accept() later (as in cachelogd.c).
> In that pattern, select() is usually called with a NULL timeout, which makes it
> 'block' until some input comes in.
> 
> What happens right now in cachelogd (as far as I can see, but I'm not a
> programmer by 'definition', so... ;) is that cachelogd is fine for 5 minutes
> (while select() is actually sleeping), but once the timer reaches 0, select()
> starts the flood. It can be checked in gdb as well. Something like:
> 
> -----
> [root@emx sbin]# gdb ./cachelogd 
> <loading...>
> (gdb) b 416
> Breakpoint 1 at 0x80494a7: file cachelogd.c, line 416.
> (gdb) r
> Starting program: /opt/mnogosearch/sbin/./cachelogd 
> Wed 21 16:49:59 [21785] Open logs 0 0
> Wed 21 16:49:59 [21785] cachelogd started. Accepting 128 connections.
> 
> Breakpoint 1, main (argc=1, argv=0xbffffce4) at cachelogd.c:416
> warning: Source file is more recent than executable.
> 
> 416                     sel=select(32,&msk,0,0,&tval);
> [This select() is the 1st one that gets executed, and tval.tv_sec is 300 at this point.]
> (gdb) c
> Continuing.
> 
> [exactly 300 seconds later...]
> 
> Breakpoint 1, main (argc=1, argv=0xbffffce4) at cachelogd.c:416
> 416                     sel=select(32,&msk,0,0,&tval);
> (gdb) p tval.tv_sec
> $1 = 0
> (gdb) c
> Continuing.
> 
> Breakpoint 1, main (argc=1, argv=0xbffffce4) at cachelogd.c:416
> 416                     sel=select(32,&msk,0,0,&tval);
> (gdb) c
> Continuing.
> 
> Breakpoint 1, main (argc=1, argv=0xbffffce4) at cachelogd.c:416
> 416                     sel=select(32,&msk,0,0,&tval);
> (gdb) c
> Continuing.
> 
> etc... (repeats forever)
> 
> -----
> 
> I can think of 3 possible ways to fix this. But I would *really* appreciate it if
> someone with more 'socket experience' gave the proper fix and possibly explained
> the real issue here :)
> 
> 1. Do something like:
> 
>       if(sel==0){
>               tval.tv_sec = 300; /* reset the timer when it reaches 0 */
>               /* Time limit expired */
>               continue;
>       }
> 
> In this case, the timer gets reset every time it reaches 0. Seems to work ok, no
> 'side-effects' noticed (tried for 30 mins and re-indexed a few thousand pages).
> 
> 2. Do something like:
> 
> instead of:
>               sel=select(32,&msk,0,0,&tval);
> use:
>               sel=select(32,&msk,NULL,NULL,(struct timeval *)NULL); /* I prefer NULL over 0 - just for 'aesthetic' purposes, sorry :) */
> 
> This *should* make select() "block" until there is actually something it can
> deal with (a new connection, etc.). Seems to work ok, no 'side-effects' noticed
> (still running, re-indexing 10,000 pages).
> 
> 3. Rewrite this part using accept(), and then select()
> 
> Don't think it's really needed :)
> 
> 
> 
> I hope someone more experienced can check this out :)
> 
> Thanks.
> 
