Thanks for the quick suggestions.

On Tuesday, November 26, 2002, at 12:11 PM, MonMotha wrote:
A great thing to do would be to run memtest86 on the system, especially if you have things randomly segfaulting. Bad memory can be tricky to spot and diagnose.

I ran memtest86 for about 4 days before coming online. Should I have run it longer, maybe a month, or would the test normally uncover bad memory within a few days?


Another thing that happened to someone recently was the motherboard not setting the voltage correctly when set to AUTO. Forcing the voltage to the value in the spec sheets fixed his problems.

I'll investigate this. It seems unlikely that I could get 30 days of uptime with bad voltage, but perhaps it ultimately leads to your third suggestion, thermal shutdown. I'll check.


This definitely sounds like a hardware issue (possibly thermal shutdown?). Normally the kernel manages to at least throw up an Oops on hardware failure, but occasionally hard locks are the result. If you can find something that reliably triggers the problem, you can go a long way toward diagnosing the cause. If it is software, another possibility is a problem in an interrupt handler, or some other situation where the kernel can't be interrupted and a driver never returns control to the kernel.

I suspect my RealTek Ethernet chipset may be substandard for this application. A FreeBSD friend pointed out that the author of the RealTek driver for FreeBSD made a few very negative comments about the quality of the chipset in his man page. He makes these two comments:

"Since outbound packets must be longword aligned, the transmit routine has to copy an unaligned packet into an mbuf cluster buffer before transmission. The driver abuses the fact that the cluster buffer pool is allocated at system startup time in a contiguous region starting at a page boundary. Since cluster buffers are 2048 bytes, they are longword aligned by definition. The driver probably should not be depending on this characteristic.

The RealTek data sheets are of especially poor quality: the grammar and spelling are awful and there is a lot of information missing, particularly concerning the receiver operation. One particularly important fact that the data sheets fail to mention relates to the way in which the chip fills in the receive buffer. When an interrupt is posted to signal that a frame has been received, it is possible that another frame might be in the process of being copied into the receive buffer while the driver is busy handling the first one. If the driver manages to finish processing the first frame before the chip is done DMAing the rest of the next frame, the driver may attempt to process the next frame in the buffer before the chip has had a chance to finish DMAing all of it."

The rl driver was written by Bill Paul <[EMAIL PROTECTED]>.
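To make the alignment point in the first quoted paragraph concrete, here is a minimal C sketch of the arithmetic. It is not the rl driver's code. It assumes a 4096-byte page (DEMO_PAGE_SIZE) and a hypothetical pool of eight 2048-byte clusters; the names cluster_pool, cluster() and copy_for_transmit() are made up for illustration. Because the pool starts on a page boundary and each cluster begins a multiple of 2048 bytes from that start, every cluster address is a multiple of 4, so copying an unaligned packet into a cluster gives the chip a longword-aligned transmit buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u /* assumed page size, for illustration only */
#define CLUSTER_SIZE   2048u /* cluster size quoted in the man page */
#define NCLUSTERS      8u    /* hypothetical number of clusters in the pool */

/* Hypothetical cluster pool: one contiguous region starting on a page boundary. */
static _Alignas(DEMO_PAGE_SIZE) unsigned char cluster_pool[NCLUSTERS * CLUSTER_SIZE];

/* True if p is longword (4-byte) aligned, i.e. its address is a multiple of 4. */
static int is_longword_aligned(const void *p)
{
    return ((uintptr_t)p % 4u) == 0;
}

/* Address of cluster i: page boundary + i * 2048, which is always a multiple of 4. */
static unsigned char *cluster(size_t i)
{
    assert(i < NCLUSTERS);
    return cluster_pool + i * CLUSTER_SIZE;
}

/* Copy a possibly unaligned packet into cluster i and return the aligned copy,
 * mirroring the copy the man page says the transmit routine has to make. */
static unsigned char *copy_for_transmit(size_t i, const unsigned char *pkt, size_t len)
{
    unsigned char *dst = cluster(i);
    assert(len <= CLUSTER_SIZE); /* a packet must fit in one cluster */
    memcpy(dst, pkt, len);
    return dst;
}
```

The caveat in the man page still applies: this only works because of how the pool happens to be allocated, which is why Bill Paul says the driver probably should not depend on it.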

In your opinion, could this lead to a lockup, and does RealTek have that bad a reputation in the Linux community? It sounds pretty bad to me.

Thanks again for your thoughts.

scott
