Hello, again,

It seems like many people on this list are having network issues. I
thought that I would do my part to add to the confusion.

We are running 20 Red Hat 7.2 systems with a patched vanilla kernel on a
single HiperSockets guest LAN. We recently migrated all of our Linux
images from z/VM 4.2 to 4.4. The transition went smoothly, but we
noticed after a few days that images began to experience slight
connection problems as reported by our Big Brother monitor. The
notifications that BB sent indicated that the machine would reply on the
second or third attempt, so we suspected it could be a product of our
images finally dropping from Q3 into dormancy and subsequently taking
longer to reply. Unfortunately, we soon noticed that these connection
errors were becoming more pronounced on random images. Keypresses during
interactive ssh sessions would take 2 to 8 seconds to be echoed back to
the screen. We were able to work around this by stopping the network,
unloading the qeth and qdio modules with rmmod, and then bringing the
network back up.

In the last few days, all of our images have begun to exhibit this
behavior and bouncing the network on an image no longer makes much of a
difference. Pings within the guest LAN now show 30-40% packet loss. With
the idea that this could be an artifact of older OCO modules, I moved a
subset of our images to the 2.4.21 kernel with the GPL'd network
drivers, but that had no discernible effect. My colleague has been
searching for relevant APARs but has not found any so far.
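
For anyone who wants to compare loss figures across images, here is a
minimal sketch that pulls the loss percentage out of saved ping output.
It assumes the usual "N packets transmitted, M received" summary line;
the function names and the host-to-output mapping are hypothetical and
not part of our setup.

```python
import re

# Matches the summary line printed by common ping implementations, e.g.
#   "10 packets transmitted, 7 received, 30% packet loss, time 9012ms"
#   "10 packets transmitted, 6 packets received, 40% packet loss"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss(ping_output):
    """Return the loss percentage from ping's summary line, or None."""
    match = SUMMARY.search(ping_output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return None
    # Recompute from the counters rather than trusting the rounded figure.
    return 100.0 * (sent - received) / sent


def worst_first(results):
    """Turn {host: ping_output} into (host, loss) pairs, highest loss first."""
    losses = [(host, packet_loss(text)) for host, text in results.items()]
    measured = [(host, loss) for host, loss in losses if loss is not None]
    return sorted(measured, key=lambda pair: pair[1], reverse=True)
```

Feeding it the captured output of a fixed-count ping from each image
gives a ranked list, which should make it easier to see whether the loss
is uniform or concentrated on a few guests.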

Our guest LAN is set to infinite connections, "q osa" and "q nic" return
normal results, and I haven't yet found any descriptive errors on the
Linux side. I have seen the following error in a few images, but not
across the board:

qdio : sense data available on qdio channel.
qdio : irb: 01 c2 60 17  00 fc 21 38  0e 00 00 00  00 80 00 00
qdio : irb: 01 20 00 00  00 00 00 00  00 00 00 00  00 00 00 00
qdio : sense data: 80 00 1f 07  00 00 00 00  00 00 00 00  00 00 00 00
qdio : sense data: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
qdio : received check condition on activate queues on irq 0x3 (cs=x0, ds=xe).
 qeth: activate queues on irq 0x3: dstat=0xe, cstat=0x0
 qeth: recovery was scheduled on irq 0x1 (hsi0) with problem 0x3
qdio : Did not get interrupt on halt_IO, irq=0x3.

I think I've seen this error before, when everything was working fine,
so I don't consider it significant. Hopefully someone can confirm or
deny that. Any ideas?
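
To check whether the qeth recovery message lines up with the slowdowns,
something like the following could tally it per interface from saved
kernel logs. This is only a sketch: the pattern is taken from the
message text quoted above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern based on the message quoted above:
#   "qeth: recovery was scheduled on irq 0x1 (hsi0) with problem 0x3"
RECOVERY = re.compile(
    r"qeth: recovery was scheduled on irq (0x[0-9a-fA-F]+) "
    r"\((\w+)\) with problem (0x[0-9a-fA-F]+)"
)


def count_recoveries(lines):
    """Count qeth recovery messages, keyed by (interface, problem code)."""
    counts = Counter()
    for line in lines:
        match = RECOVERY.search(line)
        if match:
            _irq, interface, problem = match.groups()
            counts[(interface, problem)] += 1
    return counts
```

Running it over the logs from an affected image and a healthy one would
show whether the message is actually more frequent where the loss is
worst.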

--
Michael Lambert <[EMAIL PROTECTED]>
