For several years I have been using rsync with the --link-dest option
in scripts to make incremental backups using hard links. I have been
testing NILFS2 snapshots as an alternative way to do equivalent
backups. Some of my experiments have involved writing to a DVD-RAM. I
am reporting what happened when a DVD-RAM had non-recoverable write
errors in case that information is of interest to the developers.
My DVD-RAM was written on a 64-bit Ubuntu Karmic system with a 2.6.31
kernel. The NILFS2 module came with that kernel; the NILFS2 utilities
were built from the head of NILFS2 git repository on April 8, 2010.
Here are the first syslog messages associated with the write errors:
Apr 11 20:46:28 x2 kernel: [496491.160053] usb 1-8: reset high speed
USB device using ehci_hcd and address 5
Apr 11 20:46:35 x2 kernel: [496497.963699] sr 7:0:0:0: [sr1] Unhandled
sense code
Apr 11 20:46:35 x2 kernel: [496497.963708] sr 7:0:0:0: [sr1] Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 11 20:46:35 x2 kernel: [496497.963716] sr 7:0:0:0: [sr1] Sense Key
: Medium Error [current]
Apr 11 20:46:35 x2 kernel: [496497.963725] sr 7:0:0:0: [sr1] Add.
Sense: Incomplete block read
Apr 11 20:46:35 x2 kernel: [496497.963735] end_request: I/O error, dev
sr1, sector 8778160
Apr 11 20:46:54 x2 kernel: [496516.494583] sr 7:0:0:0: [sr1] Unhandled
sense code
Apr 11 20:46:54 x2 kernel: [496516.494592] sr 7:0:0:0: [sr1] Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 11 20:46:54 x2 kernel: [496516.494600] sr 7:0:0:0: [sr1] Sense Key
: Medium Error [current]
Apr 11 20:46:54 x2 kernel: [496516.494608] sr 7:0:0:0: [sr1] Add.
Sense: No defect spare location available
Apr 11 20:46:54 x2 kernel: [496516.494621] end_request: I/O error, dev
sr1, sector 8781280
Apr 11 20:46:54 x2 kernel: [496516.502194] NILFS: IO error writing segment
Apr 11 20:46:54 x2 kernel: [496516.511162] NILFS warning (device sr1):
nilfs_clean_segments: segment construction failed. (err=-5)
When I detected the errors, I shut down the system and waited until
the following morning to investigate. When I tried to examine the
files on the DVD-RAM to see which ones were corrupted, the following
error was generated:
Apr 12 10:50:42 x2 kernel: [507977.110489] ------------[ cut here ]------------
Apr 12 10:50:42 x2 kernel: [507977.110502] kernel BUG at
/build/buildd/linux-2.6.31/fs/nilfs2/btree.c:514!
Apr 12 10:50:42 x2 kernel: [507977.110509] invalid opcode: 0000 [#1] SMP
Apr 12 10:50:42 x2 kernel: [507977.110516] last sysfs file:
/sys/devices/pci0000:00/0000:00:02.1/usb1/1-1/idVendor
Apr 12 10:50:42 x2 kernel: [507977.110521] CPU 1
Apr 12 10:50:42 x2 kernel: [507977.110526] Modules linked in:
usb_storage udf crc_itu_t nilfs2 binfmt_misc snd_hda_codec_idt ppdev
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss
snd_pcm iptable_filter snd_seq_dummy ip_tables x_tables snd_seq_oss
snd_seq_midi parport_pc nvidia(P) psmouse serio_raw snd_rawmidi
snd_seq_midi_event snd_seq snd_timer snd_seq_device snd amd64_edac_mod
i2c_nforce2 edac_core soundcore snd_page_alloc k8temp lp parport
floppy 8139too mii
Apr 12 10:50:42 x2 kernel: [507977.110589] Pid: 24330, comm: rsync
Tainted: P W 2.6.31-20-generic #57-Ubuntu NFORCE6M-A
Apr 12 10:50:42 x2 kernel: [507977.110595] RIP:
0010:[<ffffffffa0b8d6a1>] [<ffffffffa0b8d6a1>]
nilfs_btree_do_lookup+0x291/0x2a0 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.110631] RSP: 0018:ffff88008900f748
EFLAGS: 00010293
Apr 12 10:50:42 x2 kernel: [507977.110635] RAX: 000000000000003d RBX:
ffff880086949c80 RCX: 0000000000000000
Apr 12 10:50:42 x2 kernel: [507977.110641] RDX: 0000000000000000 RSI:
ffff88008900f6a8 RDI: ffff88002800cb20
Apr 12 10:50:42 x2 kernel: [507977.110646] RBP: ffff88008900f7b8 R08:
0000000000000001 R09: 0001ce00a2bfa0a0
Apr 12 10:50:42 x2 kernel: [507977.110651] R10: 00000000ffffffff R11:
0000000000000000 R12: ffff8800731ff000
Apr 12 10:50:42 x2 kernel: [507977.110656] R13: 0000000000000001 R14:
ffff8800ba8ace98 R15: 0000000000000000
Apr 12 10:50:42 x2 kernel: [507977.110663] FS: 00007f82523ed6f0(0000)
GS:ffff88002803d000(0000) knlGS:0000000000000000
Apr 12 10:50:42 x2 kernel: [507977.110668] CS: 0010 DS: 0000 ES: 0000
CR0: 0000000080050033
Apr 12 10:50:42 x2 kernel: [507977.110673] CR2: 0000000001ab6ff0 CR3:
00000000196c1000 CR4: 00000000000006e0
Apr 12 10:50:42 x2 kernel: [507977.110679] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Apr 12 10:50:42 x2 kernel: [507977.110684] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Apr 12 10:50:42 x2 kernel: [507977.110690] Process rsync (pid: 24330,
threadinfo ffff88008900e000, task ffff88010d82c410)
Apr 12 10:50:42 x2 kernel: [507977.110695] Stack:
Apr 12 10:50:42 x2 kernel: [507977.110697] ffff880000000001
0000000100000001 ffff880086949c00 000000018900f830
Apr 12 10:50:42 x2 kernel: [507977.110705] <0> ffff88008900f810
0000000000000000 ffff88008900f7a8 00000000a0b8dd90
Apr 12 10:50:42 x2 kernel: [507977.110714] <0> 0000000000000001
ffff8800ba8ace98 ffff880086949c00 0000000000000000
Apr 12 10:50:42 x2 kernel: [507977.110724] Call Trace:
Apr 12 10:50:42 x2 kernel: [507977.110746] [<ffffffffa0b8ddc5>]
nilfs_btree_lookup_contig+0x65/0x330 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.110760] [<ffffffff8152cea0>] ?
_spin_lock_irq+0x10/0x20
Apr 12 10:50:42 x2 kernel: [507977.110778] [<ffffffffa0b8b6a8>]
nilfs_bmap_lookup_contig+0x48/0x70 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.110795] [<ffffffffa0b83ff0>]
nilfs_get_block+0xb0/0x250 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.110803] [<ffffffff8152cf59>] ?
_spin_lock+0x9/0x10
Apr 12 10:50:42 x2 kernel: [507977.110814] [<ffffffff8114f088>]
do_mpage_readpage+0x148/0x600
Apr 12 10:50:42 x2 kernel: [507977.110830] [<ffffffffa0b83200>] ?
nilfs_readpage+0x0/0x20 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.110837] [<ffffffff8114f58a>]
mpage_readpage+0x4a/0x60
Apr 12 10:50:42 x2 kernel: [507977.110853] [<ffffffffa0b83f40>] ?
nilfs_get_block+0x0/0x250 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.110864] [<ffffffff810f1b65>] ?
__inc_zone_page_state+0x25/0x30
Apr 12 10:50:42 x2 kernel: [507977.110872] [<ffffffff810daa51>] ?
add_to_page_cache_locked+0xd1/0x130
Apr 12 10:50:42 x2 kernel: [507977.110888] [<ffffffffa0b83200>] ?
nilfs_readpage+0x0/0x20 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.110894] [<ffffffff810daaf6>] ?
add_to_page_cache_lru+0x46/0x70
Apr 12 10:50:42 x2 kernel: [507977.110910] [<ffffffffa0b83213>]
nilfs_readpage+0x13/0x20 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.110917] [<ffffffff810dab99>]
__read_cache_page+0x79/0xe0
Apr 12 10:50:42 x2 kernel: [507977.110932] [<ffffffffa0b83200>] ?
nilfs_readpage+0x0/0x20 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.110940] [<ffffffff810dc071>]
read_cache_page_async+0x31/0xe0
Apr 12 10:50:42 x2 kernel: [507977.110946] [<ffffffff810dc12d>]
read_cache_page+0xd/0x60
Apr 12 10:50:42 x2 kernel: [507977.110967] [<ffffffffa0b9c786>] ?
nilfs_palloc_block_get_entry+0x26/0x60 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.110983] [<ffffffffa0b84a3f>]
nilfs_get_page+0x1f/0x80 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.111000] [<ffffffffa0b85388>]
nilfs_find_entry+0x78/0x220 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.111017] [<ffffffffa0b85544>]
nilfs_inode_by_name+0x14/0x30 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.111034] [<ffffffffa0b8761d>]
nilfs_lookup+0x3d/0x70 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.111043] [<ffffffff81128b0a>]
real_lookup+0xda/0x150
Apr 12 10:50:42 x2 kernel: [507977.111052] [<ffffffff8112a700>]
do_lookup+0xb0/0xe0
Apr 12 10:50:42 x2 kernel: [507977.111059] [<ffffffff8112b2ad>]
__link_path_walk+0x6ad/0xdc0
Apr 12 10:50:42 x2 kernel: [507977.111067] [<ffffffff8112bba7>]
path_walk+0x57/0xb0
Apr 12 10:50:42 x2 kernel: [507977.111075] [<ffffffff8112bc53>]
do_path_lookup+0x53/0xa0
Apr 12 10:50:42 x2 kernel: [507977.111083] [<ffffffff8112c8f2>]
user_path_at+0x52/0xa0
Apr 12 10:50:42 x2 kernel: [507977.111091] [<ffffffff8152cf59>] ?
_spin_lock+0x9/0x10
Apr 12 10:50:42 x2 kernel: [507977.111100] [<ffffffff81276e35>] ?
_atomic_dec_and_lock+0x55/0x80
Apr 12 10:50:42 x2 kernel: [507977.111108] [<ffffffff811238bf>] ?
cp_new_stat+0xdf/0xf0
Apr 12 10:50:42 x2 kernel: [507977.111114] [<ffffffff81123ac7>]
vfs_fstatat+0x37/0x70
Apr 12 10:50:42 x2 kernel: [507977.111121] [<ffffffff81123b69>]
vfs_lstat+0x19/0x20
Apr 12 10:50:42 x2 kernel: [507977.111127] [<ffffffff81123b8f>]
sys_newlstat+0x1f/0x50
Apr 12 10:50:42 x2 kernel: [507977.111138] [<ffffffff81012042>]
system_call_fastpath+0x16/0x1b
Apr 12 10:50:42 x2 kernel: [507977.111142] Code: 40 00 48 8b 4b 50 be
01 00 00 00 48 8d 53 10 8b 89 a8 00 00 00 d3 e6 48 63 f6 48 83 ee 10
48 c1 fe 04 48 c1 e6 03 e9 02 fe ff ff <0f> 0b eb fe 66 66 2e 0f 1f 84
00 00 00 00 00 55 48 89 e5 48 83
Apr 12 10:50:42 x2 kernel: [507977.111204] RIP [<ffffffffa0b8d6a1>]
nilfs_btree_do_lookup+0x291/0x2a0 [nilfs2]
Apr 12 10:50:42 x2 kernel: [507977.111223] RSP <ffff88008900f748>
Apr 12 10:50:42 x2 kernel: [507977.111230] ---[ end trace c49a62030f4c2c2c ]---
Since I could not determine how to build the latest version of the
kernel module for the 2.6.31 kernel, I rebooted on a 64-bit Ubuntu
Hardy system with a 2.6.24 kernel and installed the latest NILFS2
kernel module and utilities from git repository. When I accessed the
DVD-RAM with rsync to see which files were in error, I got the
following messages in syslog:
Apr 12 17:15:37 x2 kernel: [ 829.444875] Pid: 18374, comm: rsync Not
tainted 2.6.24-27-generic #1
Apr 12 17:15:37 x2 kernel: [ 829.444879]
Apr 12 17:15:37 x2 kernel: [ 829.444880] Call Trace:
Apr 12 17:15:37 x2 kernel: [ 829.444913] [<ffffffff8844218f>]
:nilfs2:nilfs_btree_do_lookup+0x28f/0x2b0
Apr 12 17:15:37 x2 kernel: [ 829.444935] [<ffffffff88443a57>]
:nilfs2:nilfs_btree_lookup_contig+0x67/0x330
Apr 12 17:15:37 x2 kernel: [ 829.444946] [__down_read+0x12/0xb1]
__down_read+0x12/0xb1
Apr 12 17:15:37 x2 kernel: [ 829.444961] [<ffffffff884417f7>]
:nilfs2:nilfs_bmap_lookup_contig+0x47/0x70
Apr 12 17:15:37 x2 kernel: [ 829.444974] [<ffffffff8843b354>]
:nilfs2:nilfs_get_block+0xb4/0x250
Apr 12 17:15:37 x2 kernel: [ 829.444990]
[do_mpage_readpage+0x40b/0x540] do_mpage_readpage+0x40b/0x540
Apr 12 17:15:37 x2 kernel: [ 829.445000] [<ffffffff8843b2a0>]
:nilfs2:nilfs_get_block+0x0/0x250
Apr 12 17:15:37 x2 kernel: [ 829.445026] [<ffffffff8843a1c0>]
:nilfs2:nilfs_readpage+0x0/0x10
Apr 12 17:15:37 x2 kernel: [ 829.445029]
[ext3:mpage_readpage+0x4e/0x150] mpage_readpage+0x4e/0x70
Apr 12 17:15:37 x2 kernel: [ 829.445037] [<ffffffff8843b2a0>]
:nilfs2:nilfs_get_block+0x0/0x250
Apr 12 17:15:37 x2 kernel: [ 829.445041] [__alloc_pages+0xaa/0x3f0]
__alloc_pages+0xaa/0x3f0
Apr 12 17:15:37 x2 kernel: [ 829.445054] [<ffffffff8843a1c0>]
:nilfs2:nilfs_readpage+0x0/0x10
Apr 12 17:15:37 x2 kernel: [ 829.445057]
[add_to_page_cache+0xb9/0xd0] add_to_page_cache+0xb9/0xd0
Apr 12 17:15:37 x2 kernel: [ 829.445063]
[add_to_page_cache_lru+0x24/0x40] add_to_page_cache_lru+0x24/0x40
Apr 12 17:15:37 x2 kernel: [ 829.445068]
[read_cache_page_async+0x96/0x150] read_cache_page_async+0x96/0x150
Apr 12 17:15:37 x2 kernel: [ 829.445077] [read_cache_page+0x6/0x40]
read_cache_page+0x6/0x40
Apr 12 17:15:37 x2 kernel: [ 829.445086] [<ffffffff8843b7b8>]
:nilfs2:nilfs_get_page+0x38/0x270
Apr 12 17:15:37 x2 kernel: [ 829.445091] [__slab_alloc+0x1bd/0x410]
__slab_alloc+0x1bd/0x410
Apr 12 17:15:37 x2 kernel: [ 829.445100] [<ffffffff8843cd72>]
:nilfs2:nilfs_alloc_inode_common+0x22/0x70
Apr 12 17:15:37 x2 kernel: [ 829.445115] [<ffffffff8843bd64>]
:nilfs2:nilfs_find_entry+0x84/0x1c0
Apr 12 17:15:37 x2 kernel: [ 829.445134] [<ffffffff8843beb1>]
:nilfs2:nilfs_inode_by_name+0x11/0x30
Apr 12 17:15:37 x2 kernel: [ 829.445146] [<ffffffff8843ec55>]
:nilfs2:nilfs_lookup+0x45/0xc0
Apr 12 17:15:37 x2 kernel: [ 829.445153] [do_lookup+0x1b6/0x250]
do_lookup+0x1b6/0x250
Apr 12 17:15:37 x2 kernel: [ 829.445165]
[__link_path_walk+0x74c/0xe90] __link_path_walk+0x74c/0xe90
Apr 12 17:15:37 x2 kernel: [ 829.445180] [link_path_walk+0x5b/0x100]
link_path_walk+0x5b/0x100
Apr 12 17:15:37 x2 kernel: [ 829.445202] [do_path_lookup+0x8a/0x250]
do_path_lookup+0x8a/0x250
Apr 12 17:15:37 x2 kernel: [ 829.445206] [getname+0x1a9/0x220]
getname+0x1a9/0x220
Apr 12 17:15:37 x2 kernel: [ 829.445213] [__user_walk_fd+0x4b/0x80]
__user_walk_fd+0x4b/0x80
Apr 12 17:15:37 x2 kernel: [ 829.445220] [vfs_lstat_fd+0x2c/0x70]
vfs_lstat_fd+0x2c/0x70
Apr 12 17:15:37 x2 kernel: [ 829.445239] [sys_newlstat+0x27/0x50]
sys_newlstat+0x27/0x50
Apr 12 17:15:37 x2 kernel: [ 829.445256] [system_call+0x7e/0x83]
system_call+0x7e/0x83
Apr 12 17:15:37 x2 kernel: [ 829.445270]
Apr 12 17:15:37 x2 kernel: [ 829.445271] NILFS: btree level mismatch: 61 != 1
The utility lssu reported the following segment structure:
SEGNUM DATE TIME STAT NBLOCKS
0 ---------- --:--:-- --- 0
...
20 ---------- --:--:-- --- 0
21 2010-04-11 23:23:21 ad- 268
22 ---------- --:--:-- ad- 0
23 ---------- --:--:-- --- 0
...
191 ---------- --:--:-- --- 0
192 2010-04-11 17:18:46 -d- 2048
193 2010-04-11 17:18:47 -d- 2048
194 2010-04-11 17:18:58 -d- 2048
...
533 2010-04-11 20:42:02 -d- 2048
534 2010-04-11 20:42:54 -d- 2048
535 2010-04-11 22:13:18 -d- 2047
536 ---------- --:--:-- --- 0
537 ---------- --:--:-- --- 0
538 ---------- --:--:-- --- 0
539 ---------- --:--:-- --- 0
540 ---------- --:--:-- --- 0
541 ---------- --:--:-- --- 0
542 ---------- --:--:-- --- 0
543 ---------- --:--:-- --- 0
544 ---------- --:--:-- --- 0
545 ---------- --:--:-- --- 0
The checkpoints on the DVD-RAM included 4 snapshots, each of which was
associated with a distinct backup that I had copied to build the
DVD-RAM. I compared those snapshots with the sample backups from
which they were constructed and found that a single directory was
missing from each of the snapshots. That directory was an old one
that was unmodified during the period from which I selected the sample
backups. I assume that the cleaner was copying that directory when
the write errors began and that a shared copy of that directory was
lost in all of snapshots.
In summary, it appears that the write errors were isolated to a single
directory. The errors resulted in a file system that was unusable
with the NILFS2 kernel module in 2.6.31 but was accessible with the
latest version of the kernel module when installed in a older kernel
(2.6.24). (Is there some way to install a version of the NILFS2
kernel module from the NILFS2 git repository on a kernel newer than
2.6.29?)
I have not yet reformatted the DVD-RAM. I will perform additional
experiments with it if they would be helpful.
Ed
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html