Dear Sir or Madam:
 
A Sun administrator suggested in an e-mail that our problem is caused by AFS.
We would like your opinion on the problem, which is that our Web server crashes regularly.
I have included the e-mail from the Sun administrator below:
 
 
Attached:
 
The customer is facing a hang daily because AFS is dropping packets when
the outgoing socket buffer is full.
The machine on which this problem happens is a web server, and all that it does
is fetch files from AFS. So the incoming traffic into this machine is much
higher than its outgoing traffic.

The threadlist reveals the following information.
The offending thread is blocked in tli_send while trying to send out
a packet.

                ============== thread_id        60f7a420
0x60e086e0:     process args=   ./ns-httpd -d /www/w3-l/https-comcentral/config
0x60f7a420:     lwp             proc            wchan
                60f818f8        60e086e0        616fa730
0x60f7a454:     sp              pc
                30642780        cv_wait+0x40
?(?) + 0
cv_wait(0x616fa730,0x616fa710,0x1,0x100,0x60615778,0x60747dc8)
entersq(0x616fa710,0x616fa730,0x2000,0xf000,0x1,0x2200) + 74
runservice(0x616fa6e8,0x616fa6b4,0x2200,0x20000,0x6004c0f8,0x616e88ec) + 1c
queuerun(0x616fa6b4,0x10437a08,0x10437c25,0x104259cc,0x10425800,0x10425800) + 170
strput(0x0,0x617ff140,0x6018e164,0x44,0x0,0x0) + 238
kstrputmsg(0x6080bf5c,0x617ff7a0,0x40,0x40,0x0,0x6018e164) + 2d0
tli_send(0x600b0a00,0x617ff7a0,0x7,0x0,0x35461d91,0x60728000) + 20
t_ksndudata(?) + 208
lm_nlm4_reclaim(0x600b0a00,0x30642bc0,0x61393520,0x600918b0,0x616e89f8,0x30642bd8) + 140
osi_NetSend(0x600b0a00,0x30642c64,0x6093e2c4,0x2,0x34,0x60191450) + 1c0
rxi_SendPacket(0x605e3c78,0x6093e280,0x0,0x3b9aca00,0x35461d91,0x6099af50) + 170
rxi_Send(0x601caa00,0x6093e280,0x0,0x68e6c,0x35461d91,0x605e3c78) + a4
rxi_Start(0x0,0x601caa00,0x0,0x0,0x6093e280,0x6093e280) + 91c
rxi_FlushWrite(0x601caa00,0x60f7a420,0x20,0x60875cb0,0x590,0x6093e280) + 184
rxi_ReadProc(0x601caa00,0x30642f28,0x4,0x1,0x4,0x601caa00) + bc
rx_ReadProc(0x601caa00,0x30642f28,0x4,0x3b9aca00,0x35461d91,0x60728000) + 3c
afs_UFSCacheFetchProc(0x601caa00,0x613e1020,0x10000,0x60ae91d0,0x60728000,0x30642fd0) + 8c
afs_GetDCache(0x609f21e0,0x0,0x306430bc,0x0,0x0,0x35461d91) + 2974
afs_GetOnePage(0x609f21e0,0x0,0x10000,0x609f21e0,0x0,0x30643274) + 1f4
afs_getpage(0x6107d9a0,0xed820000,0x1,0x2000,0x30643260,0x0) + 104
segvn_fault(0x600bba68,0x10000,0xed822000,0x0,0x2000,0x1) + 77c
as_fault(0x2000,0x60021590,0x6107d9a0,0x60f818f8,0x0,0x1) + 3cc
pagefault(0xed820000,0x0,0x1,0x0,0xed820000,0x1) + 40


This problem was seen with the Transarc AFS used by the customer. I have the
following information from Transarc.

AFS uses tli_send() to send out a UDP packet, and it sets the FNDELAY flag
when appropriate so that send operations never block on the interrupt
stack.
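
For illustration only, the non-blocking send behaviour described above can be
sketched in user space. This is not Transarc's kernel code: it uses ordinary
Python sockets, where a non-blocking socket stands in for the FNDELAY flag, and
the names make_udp_socket and try_send are hypothetical. The assumption is that
a sender which must not block drops the packet when the outgoing buffer is full.

```python
import socket


def make_udp_socket(sndbuf_bytes):
    """Create a non-blocking UDP socket and request a send buffer size.

    Hypothetical helper: the kernel may round or cap the requested size.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    # Non-blocking mode is the user-space analogue of FNDELAY here.
    sock.setblocking(False)
    return sock


def try_send(sock, payload, addr):
    """Send one datagram without ever blocking.

    Returns True if the datagram was queued, False if it was dropped
    because the outgoing socket buffer was full.
    """
    try:
        sock.sendto(payload, addr)
        return True
    except BlockingIOError:
        # Buffer full: drop the packet instead of waiting for space.
        return False
```

A caller would invoke try_send(sock, data, (host, port)) and count the False
results to see how often packets are dropped. A blocking socket would instead
wait inside sendto until buffer space is available, which is the kind of wait
the trace above shows in tli_send.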

I am appending three functions of code. The first function, osi_NewSocket(),
allocates the socket and sets the socket buffer size. The second function,
osi_NetSend(), sends a UDP packet. It calls t_ksndudata1(), which in turn
calls tli_send with the FNDELAY flag if it is on the interrupt stack. All this
code corresponds to the core file, which I have attached.
 
We would appreciate your comments and thoughts on this matter. We seek a quick and efficient
resolution to our web server problem.
 
Looking forward to your response,
 
===============================================================================
Pohang University of Science and Technology, Computer Systems Management Team
Lee, Yong-Kyu
Tel.    : 82-54-279-2567
Fax.   : 82-54-279-2599
Email :
[EMAIL PROTECTED]
===============================================================================
