22.11.2015 07:17, Alexander Duyck wrote:
On 11/21/2015 12:16 AM, Andrew wrote:
Memory corruption, if it happens, IMHO shouldn't be hardware-related:
almost all of these boxes, except the H61M-based box from the 1st log,
have worked for a long time with uptimes of more than a year, and only
the software was changed on them; the H61M-based box ran memtest86 for
tens of hours w/o any errors. If it were caused by hardware, they should
have crashed even earlier.

I wasn't saying it was hardware related.  My thought is that it could
be some sort of use after free or double free type issue. Basically
what you end up with is the memory getting corrupted by software that
is accessing regions it shouldn't be.

Rarely, on different servers, I have seen 'zram decompression error'
messages (in this case I got such a message on the H61M-based box).

Also, other people who use accel-ppp as BRAS software see
different kernel panics/bugs/oopses on fresh kernels.

I'll try to apply these patches, and I'll try to switch back to
kernels that were stable on some boxes.

If you could bisect this it would be useful.  Basically we just need
to determine where in the git history these issues started popping up
so that we can then narrow down on the root cause.

- Alex
IMHO bisecting will take too long, because these crashes aren't regular: one box may work for a month w/o trouble and then crash twice per week under the same load.
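
To put a rough number on that concern, here is a back-of-the-envelope sketch. It assumes (this is not something measured on these boxes) that crashes arrive as a Poisson process at a constant rate, so the chance of seeing no crash in t days is exp(-rate * t). The function names and the commit count in the usage note are hypothetical, chosen only for illustration.

```python
import math


def soak_days(crashes_per_week: float, confidence: float = 0.95) -> float:
    """Days a kernel must run crash-free before it can be called 'good'.

    Assumption: crashes follow a Poisson process with a constant rate,
    so P(no crash in t days) = exp(-rate_per_day * t). Solving
    exp(-rate_per_day * t) = 1 - confidence for t gives the soak time.
    """
    rate_per_day = crashes_per_week / 7.0
    return -math.log(1.0 - confidence) / rate_per_day


def bisect_steps(n_commits: int) -> int:
    """Approximate number of kernels a bisection over n_commits must test."""
    return math.ceil(math.log2(n_commits))


def bisect_days(n_commits: int, crashes_per_week: float,
                confidence: float = 0.95) -> float:
    """Worst case: every step is a 'good' kernel that needs a full soak."""
    return bisect_steps(n_commits) * soak_days(crashes_per_week, confidence)
```

With the "twice per week" rate, soak_days(2.0) is about 10.5 days per tested kernel, and a hypothetical range of 10,000 commits needs bisect_steps(10000) = 14 steps, so up to roughly 147 days on a single box. At the "once a month" rate the soak time per step grows to about 90 days. Running several boxes per step, or finding a load that triggers the crash faster, would shorten this.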

Maybe if I create 10-20k sessions in a test environment, this will cause a crash, but I'm not sure about this. I'll try to check.