On 2013-1-8, at 20:52, Vyacheslav Dubeyko <[email protected]> wrote:

> Hi guys,
> 
> I have been trying to reproduce the issue for the last three days, without 
> success. I tried different workloads and different environments. As far as I 
> know, all of you can reproduce the issue, so I have some additional questions.
> 
> 1. All of you have such messages:
> 
> Jan 03 22:36:38 [kernel] [  953.289973] NILFS: bad btree node 
> (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:36:38 [kernel] [  953.289976] NILFS error (device sda2): 
> nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
> 
> As I understand it, you still get the message for a specific block number 
> (for example, blocknr=26229286) during remount, but you no longer get the 
> message for that block number after umount and mount again. You may, 
> however, get error messages for a different block number afterwards. Am I 
> correct?

I get error messages for the same block after remount. In other words, a bad 
block stays bad.
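
To check this against a saved syslog, something like the following could be 
used. It is only a rough sketch that assumes the "bad btree node" message 
format quoted above; count_bad_blocks is a hypothetical helper name, not part 
of any NILFS tool.

```python
import re
from collections import Counter

# Assumed message format, taken from the kernel lines quoted in this thread.
# Whitespace before "(blocknr=" is matched loosely in case a line was wrapped.
BAD_NODE = re.compile(r"NILFS: bad btree node\s*\(blocknr=(\d+)\)")


def count_bad_blocks(log_text):
    """Return a Counter mapping block number -> number of times it was reported."""
    return Counter(int(m.group(1)) for m in BAD_NODE.finditer(log_text))
```

Running it on the log text from before and after a remount and comparing the 
keys would show whether the same block numbers come back.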

> 2. As I understand it, you have a corrupted file on your volume after such an 
> error message (for example, for inode number=102230).
> 

Yes. As soon as I open the corrupted file and read it, the kernel reports a bad 
btree node and remounts the filesystem read-only.

> On 2013-1-6, at 12:46, Elmer Zhang <[email protected]> wrote:
> 
>> I have found the corrupted file using inode number:
>> [root@yf237 data0]# cat mysql6003/app_wyxgrab/weibo_rank.MYI > /dev/null 
>> cat: mysql6003/app_wyxgrab/weibo_rank.MYI: Input/output error
> 
> Could you share the strace output of the "cat" command for such a corrupted 
> file? The syslog may also contain some interesting details from the 
> execution of the "cat" command. Could you check the syslog for interesting 
> error messages during such an attempt?
> 

output of strace for cat: http://d.pr/n/Qboc
error messages during cat: http://d.pr/n/snOt
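
For anyone else who needs to map the inode number from the kernel message back 
to a path, here is a minimal sketch. It is an illustration only and not 
necessarily how the file above was located; find_by_inode is a hypothetical 
helper, and it assumes lstat() still succeeds on the damaged file.

```python
import os


def find_by_inode(mount_point, inode):
    """Return paths under mount_point whose inode number equals `inode`."""
    root_dev = os.lstat(mount_point).st_dev
    matches = []

    def visit(path):
        # Record the path if its inode matches; return True only when the
        # entry lives on the same filesystem as mount_point.
        try:
            st = os.lstat(path)
        except OSError:
            return False  # unreadable entry, skip it
        if st.st_dev != root_dev:
            return False
        if st.st_ino == inode:
            matches.append(path)
        return True

    for dirpath, dirnames, filenames in os.walk(mount_point):
        # Prune directories on other filesystems so the walk stays on one device.
        dirnames[:] = [d for d in dirnames if visit(os.path.join(dirpath, d))]
        for name in filenames:
            visit(os.path.join(dirpath, name))
    return matches
```

A call such as find_by_inode("/data0", 102230) would list every path (hard 
links included) with that inode number; the mount point here is only guessed 
from the shell prompt above.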

> 3. Could you share the configuration file of your kernel (.config)? I suspect 
> that you may have some special configuration in your environment that I 
> do not have.
> 

content of /boot/config-2.6.32-220.13.1.el6.x86_64 : http://d.pr/n/qTQk

> 4. Could you share the content of the nilfs_cleanerd.conf file for the NILFS2 
> partition that has this issue? Sorry if I am asking about it again.
> 

content of nilfs_cleanerd.conf: http://d.pr/n/YIwj

> 5. Did you have any sudden power-off before you first encountered the issue?
> 

No.

> 6. I understand that it may not be easy, but could you share details of your 
> system log from the first occurrence of the issue? I only need details about 
> how the system was running before the issue.
> 

I found some backtrace in syslog: http://d.pr/n/ddZd

> 7. I analyzed the raw dump of the segment that I received from Elmer Zhang. 
> Currently, I have the feeling that the driver tries to take a block that has 
> already been filled by GC. But the issue needs deeper investigation, and at 
> the moment I don't understand how it can be triggered. Successfully 
> reproducing the issue is half of the success.
> 
> Thanks,
> Vyacheslav Dubeyko.
> 

---
Elmer Zhang
