Tom <uebersh...@googlemail.com> posted 20090326182608.5da93...@viciousvincent, excerpted below, on Thu, 26 Mar 2009 18:26:08 +0100:
> I've upgraded to the 2.6.29-gentoo sources. I've build everything as
> usual, and sofar, everything seems to be working.
> Except that my network device 'dies' (not permanently) after working
> flawlessly for maybe 10min.
>
> Booting a 2.6.28 kernel, I have no such issues. Restarting
> /etc/init.d/net.eth0 has no effect, and using ifconfig up/down eth0 just
> times out.
>
> The drivers are all there as they should be, could this be somekind of
> weird regression? I'm using the Uli M526x driver, found under the
> 'tulip-family'

This is in fact a mainline regression. It's due to one of the last patches before the release, which changed NAPI handling but apparently has interrupt implications as well. The LKML 2.6.29 announcement drew a reply mentioning the regression and several confirmations, followed by discussion as the developers try to pin it down with various patches and repeated tests.

They intend a fix for 2.6.29.1, even if it's simply reverting the late patch. However, that patch was itself a fix for a problem on other NICs, and other code intended to undo the effects of the patch still ends up tickling the interrupt problem, so it's a bit more complex than they anticipated. But the normal rule is no breaking previously working hardware, so had that patch made it in even a day earlier, it would likely have been reverted before release. If they can't find a better solution, it almost certainly /will/ be reverted for .29.1.

That was one of two subthreads generated by the announcement. The other was about the ext4 data corruption bug that made big news in the -rc period and got a temporary fix for .29. Now that .29 is out, they're trying to come up with a more permanent solution, but there's a policy debate in the process, as to whether the (lack of) data stability guarantees in POSIX in the event of an improper shutdown is acceptable or not.
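As an aside, here is roughly what not relying on the filesystem's ordering looks like from the application side. This is only a minimal sketch in Python, assuming a Linux/POSIX system: the function name durable_replace is made up for illustration, and the write-to-temp, fsync, rename sequence is the commonly recommended pattern, not anything taken from the kernel patches under discussion.

```python
import os
import tempfile


def durable_replace(path, data):
    """Replace `path` with `data` (bytes), syncing explicitly.

    Hypothetical helper for illustration. The intent is that after a
    crash the file holds either the old or the new content, without
    depending on data=ordered behaviour from the filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to put the data on disk
        os.replace(tmp_path, path)  # rename over the old file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory so the rename itself is recorded (works on Linux).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The catch, of course, is that every fsync costs time, which is exactly the performance side of the trade-off.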
On one side of that debate, the argument is that POSIX doesn't require more and that the default data=ordered stability of ext3 was an "accident". The other side says that may be so, but now that the stability expectation has been raised, lowering it in the interest of "performance" isn't a good thing.

The other part of the debate is just how "ordered" data=ordered has to be. The performance side says that if metadata is synced every five seconds (the default) while, with delayed allocation, data is only synced every 30 seconds (again the default), and a crash causes loss of data, tough: it's POSIX compliant and the performance benefits are great. The other side says data=ordered means data=ordered: metadata MUST wait to sync until after the data it covers is synced in data=ordered mode (the default), REGARDLESS of delayed allocation, even if the cost is losing some of the vaunted performance gains of ext4 over ext3.

What the latter boils down to for me and many others is that despite the rename of ext4dev to ext4, supposedly indicating it's stable now, it's NOT, at least not enough for mission-critical data that in real life may or may not have up-to-date backups! Ext3 (or for me reiserfs in the same data=ordered default mode) continues to work well, and it's not time to go moving everything to ext4 just yet.

For more, find the announcement thread on any LKML mirror, or the coverage in kernel news discussions.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman