On 2013-1-8, at 20:52, Vyacheslav Dubeyko <[email protected]> wrote:
> Hi guys,
>
> I am trying to reproduce the issue last three days but without success. I
> tried different workloads and different environments. As I know all of you
> have the issue in reproduced state. So I have additional questions.
>
> 1. All of you have such messages:
>
> Jan 03 22:36:38 [kernel] [ 953.289973] NILFS: bad btree node
> (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:36:38 [kernel] [ 953.289976] NILFS error (device sda2):
> nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
>
> As I understand, you still have message for concrete block number (for
> example, blocknr=26229286) during remount. But you haven't the message for
> this block number (for example, blocknr=26229286) after umount and mount
> again. But you can get error messages for another block number after it. Am I
> correct?

No. I get error messages for the same block after umount and mount as well.
That is to say, the bad block is always bad.

> 2. As I understand, you have corrupted file on your volume after such error
> message (for example, for inode number=102230).
>

Yes. Once I open the corrupted file and read it, the kernel reports a bad
btree node and remounts the filesystem read-only.

> On 2013-1-6, at 12:46, Elmer Zhang <[email protected]> wrote:
>
>> I have found the corrupted file using inode number:
>> [root@yf237 data0]# cat mysql6003/app_wyxgrab/weibo_rank.MYI > /dev/null
>> cat: mysql6003/app_wyxgrab/weibo_rank.MYI: Input/output error
>
> Could you share strace output for "cat" command for such corrupted file?
> Maybe syslog can contain some interesting details during execution of "cat"
> command. Could you check syslog for interesting error messages during such
> try?
>

Output of strace for cat: http://d.pr/n/Qboc
Error messages during cat: http://d.pr/n/snOt

> 3. Could you share configuration file of your kernel (.config)? I suspect
> that you can have some special configuration of your environment that I
> haven't.
>

Content of /boot/config-2.6.32-220.13.1.el6.x86_64: http://d.pr/n/qTQk

> 4. Could you share content of nilfs_cleanerd.conf file for NILFS2 partition
> that has such issue? Sorry, if I ask about it again.
>

Content of nilfs_cleanerd.conf: http://d.pr/n/YIwj

> 5. Did you have any sudden power-off before you encounter the issue firstly?
>

No.

> 6. I understand that it can be not so easy. But, anyway, could you share
> details of your system log for the case of first case of the issue
> occurrence? I need only details about how live system before the issue.
>

I found some backtrace in syslog: http://d.pr/n/ddZd

> 7. I analyzed the raw dump of segment that I received from Elmer Zhang.
> Currently, I have such feeling that it takes place situation when driver
> tries to take block that was filled by GC yet. But it needs to investigate
> the issue more deeply. And, currently, I don't understand how the issue can
> be achieved. Successful reproducing of the issue is a half of the success.
>
> Thanks,
> Vyacheslav Dubeyko.
>

---
Elmer Zhang

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
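
P.S. Regarding question 1 (whether the same block number is reported again after a remount): a rough way to check this from a saved kernel log is to count how often each blocknr appears in the "bad btree node" messages. The sketch below is only an illustration. The function name count_bad_blocks is hypothetical, and the regular expression assumes the message format quoted above ("NILFS: bad btree node (blocknr=N)"); other kernel versions may print it differently.

```python
import re
from collections import Counter

# Assumed message format, taken from the log lines quoted in this thread.
BAD_NODE = re.compile(r"NILFS: bad btree node\s+\(blocknr=(\d+)\)")


def count_bad_blocks(log_text):
    """Return a Counter mapping each reported blocknr to its number of reports."""
    return Counter(int(m.group(1)) for m in BAD_NODE.finditer(log_text))


# Sample text modeled on the messages above; the second timestamp is made up.
sample = (
    "Jan 03 22:36:38 [kernel] [ 953.289973] NILFS: bad btree node "
    "(blocknr=26229286): level = 67, flags = 0xee, nchildren = 40\n"
    "Jan 03 22:41:02 [kernel] [ 1217.000000] NILFS: bad btree node "
    "(blocknr=26229286): level = 67, flags = 0xee, nchildren = 40\n"
)

for blocknr, hits in count_bad_blocks(sample).items():
    print(blocknr, hits)
```

A block number that is reported more than once across mount cycles matches what I see here: the same block stays bad after umount and mount.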
