|
Dear Sir or Madam:
In an e-mail, a SUN administrator suggested that our problem
is caused by AFS.
We would therefore like your opinion on our problem, namely
why our Web server is crashing regularly.
The e-mail from the SUN administrator is attached below:
The customer is facing a hang situation daily, as AFS was dropping
packets because the outgoing socket buffer was full. The machine on
which this problem happens is a webserver, and all it does is fetch
files from AFS. So the incoming traffic into this machine is much
higher than its outgoing traffic. The threadlist reveals the following
information. The offending thread is blocked in tli_send while trying
to send out a packet.

==============
thread_id 60f7a420
0x60e086e0:     process args= ./ns-httpd -d /www/w3-l/https-comcentral/config
0x60f7a420:     lwp             proc            wchan
                60f818f8        60e086e0        616fa730
0x60f7a454:     sp              pc
                30642780        cv_wait+0x40
?(?) + 0
cv_wait(0x616fa730,0x616fa710,0x1,0x100,0x60615778,0x60747dc8)
entersq(0x616fa710,0x616fa730,0x2000,0xf000,0x1,0x2200) + 74
runservice(0x616fa6e8,0x616fa6b4,0x2200,0x20000,0x6004c0f8,0x616e88ec) + 1c
queuerun(0x616fa6b4,0x10437a08,0x10437c25,0x104259cc,0x10425800,0x10425800) + 170
strput(0x0,0x617ff140,0x6018e164,0x44,0x0,0x0) + 238
kstrputmsg(0x6080bf5c,0x617ff7a0,0x40,0x40,0x0,0x6018e164) + 2d0
tli_send(0x600b0a00,0x617ff7a0,0x7,0x0,0x35461d91,0x60728000) + 20
t_ksndudata(?) + 208
lm_nlm4_reclaim(0x600b0a00,0x30642bc0,0x61393520,0x600918b0,0x616e89f8,0x30642bd8) + 140
osi_NetSend(0x600b0a00,0x30642c64,0x6093e2c4,0x2,0x34,0x60191450) + 1c0
rxi_SendPacket(0x605e3c78,0x6093e280,0x0,0x3b9aca00,0x35461d91,0x6099af50) + 170
rxi_Send(0x601caa00,0x6093e280,0x0,0x68e6c,0x35461d91,0x605e3c78) + a4
rxi_Start(0x0,0x601caa00,0x0,0x0,0x6093e280,0x6093e280) + 91c
rxi_FlushWrite(0x601caa00,0x60f7a420,0x20,0x60875cb0,0x590,0x6093e280) + 184
rxi_ReadProc(0x601caa00,0x30642f28,0x4,0x1,0x4,0x601caa00) + bc
rx_ReadProc(0x601caa00,0x30642f28,0x4,0x3b9aca00,0x35461d91,0x60728000) + 3c
afs_UFSCacheFetchProc(0x601caa00,0x613e1020,0x10000,0x60ae91d0,0x60728000,0x30642fd0) + 8c
afs_GetDCache(0x609f21e0,0x0,0x306430bc,0x0,0x0,0x35461d91) + 2974
afs_GetOnePage(0x609f21e0,0x0,0x10000,0x609f21e0,0x0,0x30643274) + 1f4
afs_getpage(0x6107d9a0,0xed820000,0x1,0x2000,0x30643260,0x0) + 104
segvn_fault(0x600bba68,0x10000,0xed822000,0x0,0x2000,0x1) + 77c
as_fault(0x2000,0x60021590,0x6107d9a0,0x60f818f8,0x0,0x1) + 3cc
pagefault(0xed820000,0x0,0x1,0x0,0xed820000,0x1) + 40

This problem was seen on Transarc's AFS, which the customer uses. I
have the following information from Transarc. AFS uses tli_send() to
send out a UDP packet, and it sets the FNDELAY flag when appropriate so
that send operations never get blocked on the interrupt stack. I am
appending three functions of code. The first function, osi_NewSocket(),
allocates the socket and sets the socket buffer size. The second
function, osi_NetSend(), sends a UDP packet. It calls t_ksndudata1(),
which in turn calls tli_send with the FNDELAY flag if it is on the
interrupt stack. All this code corresponds to the core file which I
have attached.

We would appreciate your comments and thoughts on this
matter. We seek a quick and efficient
resolution to our webserver problem.
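
To make the FNDELAY behaviour described in the quoted e-mail easier to
discuss, here is a small user-space sketch in Python. It is not the AFS
kernel code, and treating a non-blocking socket as an analogue of
tli_send() with FNDELAY is our assumption. The names send_or_drop() and
demo() are hypothetical. The idea shown is only this: on a non-blocking
UDP socket, a send that would have to wait for buffer space fails
immediately, and the caller drops the packet instead of sleeping.

```python
import errno
import socket

# Error codes that mean "no room in the outgoing buffer right now".
_BUFFER_FULL = {errno.EAGAIN, errno.EWOULDBLOCK, errno.ENOBUFS}


def send_or_drop(sock, payload, addr):
    """Send one UDP datagram on a non-blocking socket.

    Returns True if the kernel accepted the datagram, and False if it
    was dropped because the send would otherwise have had to wait.
    """
    try:
        sock.sendto(payload, addr)
        return True
    except OSError as exc:
        if exc.errno in _BUFFER_FULL:
            return False  # drop the packet instead of blocking the caller
        raise


def demo(count=200, size=512):
    """Send `count` datagrams over loopback and return (sent, dropped)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # any free local port
        # Request a small send buffer; the kernel may round this up.
        sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
        sender.setblocking(False)  # assumed user-space analogue of FNDELAY
        addr = receiver.getsockname()
        sent = dropped = 0
        for _ in range(count):
            if send_or_drop(sender, b"x" * size, addr):
                sent += 1
            else:
                dropped += 1
        return sent, dropped
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    sent, dropped = demo()
    print("sent:", sent, "dropped:", dropped)
```

On a lightly loaded loopback interface most or all datagrams are usually
accepted, so the dropped count depends on the system. The point of the
sketch is that the sender never sleeps, which is the opposite of the
thread in the trace, blocked in cv_wait underneath tli_send.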
Looking forward to your response,
===============================================================================
Pohang University of Science and Technology, Computer Systems Management Team
Lee, Yong-Kyu
Tel.: 82-54-279-2567  Fax.: 82-54-279-2599  Email: [EMAIL PROTECTED]
=============================================================================== |
