On Mon, 22 Sep 2014, Anton Altaparmakov wrote:
> Hi Hugh,
>
> On 22 Sep 2014, at 05:43, Hugh Dickins <[email protected]> wrote:
> > On Mon, 22 Sep 2014, Anton Altaparmakov wrote:
> >> Any code that uses __getblk() and thus bread(), breadahead(), sb_bread(),
> >> sb_breadahead(), sb_getblk(), and calls it using a 64-bit block on a
> >> 32-bit arch (where "long" is 32-bit) causes an inifinite loop in
> >> __getblk_slow() with an infinite stream of errors logged to dmesg like
> >> this:
> >>
> >> __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
> >> b_state=0x00000020, b_size=512
> >> device sda1 blocksize: 512
> >>
> >> Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the
> >> top 32-bits are missing (in this case the 0x1 at the top).
> >>
> >> This is because grow_dev_page() was broken in commit 676ce6d5ca30: "block:
> >> replace __getblk_slow misfix by grow_dev_page fix" by Hugh Dickins so that
> >> it now has a 32-bit overflow due to shifting the block value to the right
> >> so it fits in 32-bits and storing the result in pgoff_t variable which is
> >> 32-bit and then passing the pgoff_t variable left-shifted as the block
> >> number which causes the top bits to get lost as the pgoff_t is not type
> >> cast to sector_t / 64-bit before the shift.
> >>
> >> This patch fixes this issue by type casting "index" to sector_t before
> >> doing the left shift.
> >>
> >> Note this is not a theoretical bug but has been seen in the field on a
> >> 4TiB hard drive with logical sector size 512 bytes.
> >>
> >> This patch has been verified to fix the infinite loop problem on 3.17-rc5
> >> kernel using a 4TB disk image mounted using "-o loop". Without this patch
> >> doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop
> >> 100% reproducibly whilst with the patch it works fine as expected.
> >>
> >> Signed-off-by: Anton Altaparmakov <[email protected]>
> >> Cc: [email protected] # 3.6 3.8 3.10 3.12 3.14 3.16
> >
> > Acked-by: Hugh Dickins <[email protected]>
> >
> > Yes indeed, that's bad, and entirely my fault (though it took me a while
> > to see how the "block = index << sizebits" which was there before was okay,
>
> Ouch. I failed to see that (I admit I did not pay too much attention to the
> original code - I just used "git blame" to find out which commit put that
> code in place - my bad). It was never ok! I went back to 2.6.12-rc2
> (original commit to git Linus' kernel repository) and that is also affected.
> That implies this is the first time anyone has used a 4TiB disk with 512 byte
> sectors with NTFS where Windows had previously created a file/directory with
> an attribute list attribute in it. - That is the only metadata type we use
> sb_bread() for and the disk image from the customer does contain it and I
> verified that is where the inifinite loop comes from.
Ow, that line needs some truncating itself :)
You generously interpret my words as seeing that for myself.
No, thank you for following up: I had persuaded myself when writing
before, that index would be promoted from pgoff_t to sector_t before
shifting in the original assignment, but not when I passed it as arg.
I've resorted to writing a proggy to check, and it looks like I was
earlier confused, and you are right, and that code was "always" wrong.
Not much use of 4TiB disks on 32-bit kernels I suppose.
>
> So it appears sb_bread() and friends have always been broken on 32-bit
> architectures when accessing blocks outside the range 2^32 - 1 and it appears
> google finds lots of occurrences of such infinite loops being reported but
> the fixes have always been to not make the calls in the first place. No-one
> seems to have recognized that there is a genuine problem here before.
>
> Still my patch stands correct and should be applied to all kernel versions
> that have your patch and older kernels should have an equivalent patch
> applied to fix them, too for people who run very old kernels.
Yes, though I'm now uncertain whether your patch is a bugfix or an
enhancement. Definitely a bugfix, given the infinite loops. But the
oldest code ("index = block >> sizebits; block = index << sizebits;")
is so clearly trying to truncate block, that I wonder what departing
from that might be letting us in for.
I see Andrew did 2.6.19's intervening e5657933863f ("grow_buffers()
infinite loop fix"): let's hope that he has a clearer head in the
morning than I have now - there's a chance that he might think it
safer to extend that check to exclude your case, than take your patch.
I hope not, that would not please you; but right now I'm ruling
myself out of grasping the issues here!
Hugh
>
> > but my passing "index << sizebits" as arg to function very much not okay).
> > Thank you, Anton.
>
> You are welcome.
>
> > But I see my commit was marked for stable 3.0 3.2 3.4 3.5: so your fix
> > is needed in 3.2 and 3.4 longterm also (the others now beyond life).
>
> You are quite right. It needs to go back to all those kernels, too. Thank
> you!
>
> Best regards,
>
> Anton
>
> > Hugh
> >
> >> ---
> >>
> >> Linus, can you please apply this? Alternatively, Andrew, can you please
> >> pick this up and send it to Linus?
> >>
> >> It would be good it it can be applied for 3.17 as well as to all stable
> >> kernels >= 3.6 as they are all broken!
> >>
> >> Thanks a lot in advance!
> >>
> >> Best regards,
> >>
> >> Anton
> >> --
> >> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
> >> Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
> >> Linux NTFS maintainer, http://www.linux-ntfs.org/
> >>
> >> diff --git a/fs/buffer.c b/fs/buffer.c
> >> index 8f05111..3588a80 100644
> >> --- a/fs/buffer.c
> >> +++ b/fs/buffer.c
> >> @@ -1022,7 +1022,8 @@ grow_dev_page(struct block_device *bdev, sector_t
> >> block,
> >> bh = page_buffers(page);
> >> if (bh->b_size == size) {
> >> end_block = init_page_buffers(page, bdev,
> >> - index << sizebits, size);
> >> + (sector_t)index << sizebits,
> >> + size);
> >> goto done;
> >> }
> >> if (!try_to_free_buffers(page))
> >> @@ -1043,7 +1044,8 @@ grow_dev_page(struct block_device *bdev, sector_t
> >> block,
> >> */
> >> spin_lock(&inode->i_mapping->private_lock);
> >> link_dev_buffers(page, bh);
> >> - end_block = init_page_buffers(page, bdev, index << sizebits, size);
> >> + end_block = init_page_buffers(page, bdev, (sector_t)index << sizebits,
> >> + size);
> >> spin_unlock(&inode->i_mapping->private_lock);
> >> done:
> >> ret = (block < end_block) ? 1 : -ENXIO;
>
> --
> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
> University of Cambridge Information Services, Roger Needham Building
> 7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/