On Wed, 15 Jun 2011 19:58:58 +0900 (JST), Ryusuke Konishi wrote:
> On Wed, 15 Jun 2011 10:42:51 +0900 (JST), Ryusuke Konishi wrote:
> > On Tue, 14 Jun 2011 11:04:26 -0700, Zahid Chowdhury wrote:
> > > Hello Ryusuke,
> > > I changed the code some to:
> > > diff -u --ignore-all-space fsck0.nilfs2.c
> > > ~/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck
> > > --- fsck0.nilfs2.c 2011-06-14 11:03:49.000000000 -0700
> > > +++ /root/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck/fsck0.nilfs2.c
> > > 2011-06-14 11:01:34.000000000 -0700
> > > @@ -172,10 +172,14 @@
> > > static void read_block(int fd, __u64 blocknr, void *buf,
> > > unsigned long size)
> > > {
> > > + int num_read;
> > > if (lseek64(fd, blocknr * blocksize, SEEK_SET) < 0 ||
> > > - read(fd, buf, size) < size)
> > > - die("cannot read block (blocknr = %llu): %s",
> > > - (unsigned long long)blocknr, strerror(errno));
> > > + (num_read = read(fd, buf, size) < size)) {
> > > + fprintf(stderr, "Read size was: %d\tNum read:
> > > %d\tStrerror: %s\n",
> > > + size, num_read, strerror(errno));
> > > + die("cannot read block (blocknr = %llu)",
> > > + (unsigned long long)blocknr);
> > > + }
> > > }
> > >
> > > static inline __u64 segment_start_blocknr(unsigned long segnum)
> > >
> > > and I got this as output:
> > >
> > > ./fsck0.nilfs2 -f -v /dev/sda2
> > > Super-block:
> > > revision = 2.0
> > > blocksize = 4096
> > > write time = 2011-06-11 23:22:03
> > > indicated log: blocknr = 1648528
> > > segnum = 804, seq = 401758, cno=3250953
> > >
> > > Unclean FS.
> > > The latest log is lost. Trying rollback recovery..
> > > ......
> > > Searching the latest checkpoint.
> > > Read size was: 4096 Num read: 1 Strerror: Success
> > > fsck0.nilfs2: cannot read block (blocknr = 2696911)
>
> Ah, sorry. I noticed that the block number (= 2696911) is beyond the
> size of your block device. It is the cause of this error.
>
> I'll look into the rollback loop code of fsck0.nilfs2 to find out the
> root cause of this out-of-range access.

Hmm, this bug is not trivial.

It clearly occurred in the context of the
find_latest_cno_in_logical_segment() function, but I haven't found any
suspicious call sites so far.
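
As a rough illustration of the failure mode, here is a minimal sketch of a
read helper that rejects out-of-range block numbers before touching the
device. It is not the actual fsck0.nilfs2 code: the name read_block_checked
and the nblocks parameter (device size in blocks) are hypothetical. It also
stores the return value of read() before comparing it. In the quoted patch,
"<" binds tighter than "=", so num_read receives the result of the
comparison, which would explain the "Num read: 1" in the output. A read at or
past the end of the device returns 0 without setting errno, which is
consistent with "Strerror: Success".

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the actual fsck0.nilfs2 code): read one block,
 * rejecting block numbers at or beyond the device size.
 * Assumes off_t is 64 bits wide. Returns 0 on success, -1 on failure.
 */
static int read_block_checked(int fd, uint64_t blocknr, uint64_t nblocks,
                              unsigned long blocksize, void *buf, size_t size)
{
    ssize_t num_read;

    if (blocknr >= nblocks) {
        fprintf(stderr, "block %llu is out of range (device has %llu blocks)\n",
                (unsigned long long)blocknr, (unsigned long long)nblocks);
        return -1;
    }
    if (lseek(fd, (off_t)(blocknr * blocksize), SEEK_SET) < 0) {
        fprintf(stderr, "lseek failed: %s\n", strerror(errno));
        return -1;
    }

    /* Store the byte count first, then compare it in a separate step. */
    num_read = read(fd, buf, size);
    if (num_read < 0) {
        fprintf(stderr, "read failed: %s\n", strerror(errno));
        return -1;
    }
    if ((size_t)num_read < size) {
        fprintf(stderr, "short read: wanted %zu bytes, got %zd\n",
                size, num_read);
        return -1;
    }
    return 0;
}
```

Keeping the error case (negative return) separate from the short-read case
also avoids comparing a signed -1 against an unsigned size.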

If you are in a hurry, please go ahead.

Otherwise (if the data on the partition is important), I need your
help to narrow down this problem. If we can get a backtrace of the
error, things should become clear.
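
One possible way to capture such a backtrace, sketched here under the
assumption that the tool is built against glibc (which provides backtrace()
and backtrace_symbols_fd() in <execinfo.h>). The helper name print_backtrace
is hypothetical and is not part of fsck0.nilfs2; it would have to be called
from the error path by hand.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64

/*
 * Hypothetical helper: dump the current call stack to stderr.
 * Returns the number of stack frames that were captured.
 */
static int print_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* Writes one line per frame directly to the descriptor, no malloc. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```

Calling this just before the program exits on the read error would show
which caller passed the bad block number. Function names only appear if the
binary is linked with -rdynamic, and static functions may still show up as
raw addresses. Running the tool under gdb and typing "bt" at the failure
point gives the same information without any code change.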

Anyway, I would like to release an updated nilfs2 kmod in a week or so
for CentOS users to minimize this sort of problem.

Regards,
Ryusuke Konishi