Re: [PATCH] Btrfs: Check for NULL page in extent_range_uptodate

Mitch Harder Mon, 30 Jan 2012 15:16:06 -0800

On Mon, Jan 30, 2012 at 3:41 PM, Vincent Vanackere
<[email protected]> wrote:
> On Wed, Jan 25, 2012 at 20:03, Mitch Harder
> <[email protected]> wrote:
>> A user has encountered a NULL pointer kernel oops in btrfs when
>> encountering media errors.  The problem has been identified
>> as an unhandled NULL pointer returned from find_get_page().
>> This modification simply checks for a NULL page, and returns
>> with an error if found (the extent_range_uptodate() function
>> returns 1 on errors).
>>
>> After testing this patch, the user reported that the error with
>> the NULL pointer oops was solved.  However, there is still a
>> remaining problem with a thread becoming stuck in
>> wait_on_page_locked(page) in the read_extent_buffer_pages(...)
>> function in extent_io.c
>>
>>       for (i = start_i; i < num_pages; i++) {
>>               page = extent_buffer_page(eb, i);
>>               wait_on_page_locked(page);
>>               if (!PageUptodate(page))
>>                       ret = -EIO;
>>       }
>>
>> This patch leaves the issue with the locked page yet to be resolved.
>>
>> Signed-off-by: Mitch Harder <[email protected]>
>> ---
>>  fs/btrfs/extent_io.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 9d09a4f..fcf77e1 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -3909,6 +3909,8 @@ int extent_range_uptodate(struct extent_io_tree *tree,
>>        while (start <= end) {
>>                index = start >> PAGE_CACHE_SHIFT;
>>                page = find_get_page(tree->mapping, index);
>> +               if (!page)
>> +                       return 1;
>>                uptodate = PageUptodate(page);
>>                page_cache_release(page);
>>                if (!uptodate) {
>> --
>> 1.7.3.4
>>
>
>
> Hi,
>
>  If any btrfs developer could have a look at it while I can still
> reproduce the situation (it won't last long, I'll send the disk to RMA
> next week), I'm still interested in solving the remaining part of the
> btrfs bug. Here is the trace I get with the current linux kernel
> (6bc2b95ee602659c1be6fac0f6aadeb0c5c29a5d) :
>
> [  330.530015] btrfs bad tree block start 959241011200 959241011200
> [  480.288046] INFO: task cat:2627 blocked for more than 120 seconds.
> [  480.288050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  480.288052] cat             D ffffffff8180c600     0  2627   2468 0x00000004
> [  480.288057]  ffff8801fe135618 0000000000000086 ffff8801fe1355d8 ffff880222061650
> [  480.288062]  ffff880215b5db80 ffff8801fe135fd8 ffff8801fe135fd8 ffff8801fe135fd8
> [  480.288067]  ffff8802241a16e0 ffff880215b5db80 ffff8801fe1355e8 ffff88022fd93e88
> [  480.288071] Call Trace:
> [  480.288080]  [<ffffffff81114440>] ? __lock_page+0x70/0x70
> [  480.288084]  [<ffffffff8162c0af>] schedule+0x3f/0x60
> [  480.288087]  [<ffffffff8162c15f>] io_schedule+0x8f/0xd0
> [  480.288091]  [<ffffffff8111444e>] sleep_on_page+0xe/0x20
> [  480.288094]  [<ffffffff8162a96f>] __wait_on_bit+0x5f/0x90
> [  480.288098]  [<ffffffff811145b8>] wait_on_page_bit+0x78/0x80
> [  480.288102]  [<ffffffff81070c70>] ? autoremove_wake_function+0x40/0x40
> [  480.288129]  [<ffffffffa005d161>] read_extent_buffer_pages+0x471/0x4d0 [btrfs]
> [  480.288142]  [<ffffffffa00347b0>] ? verify_parent_transid+0x160/0x160 [btrfs]
> [  480.288155]  [<ffffffffa003513a>] btree_read_extent_buffer_pages.isra.99+0x8a/0xc0 [btrfs]
> [  480.288169]  [<ffffffffa00371e1>] read_tree_block+0x41/0x60 [btrfs]
> [  480.288179]  [<ffffffffa001d6a3>] read_block_for_search.isra.34+0xf3/0x3d0 [btrfs]
> [  480.288190]  [<ffffffffa001f930>] btrfs_search_slot+0x300/0x8a0 [btrfs]
> [  480.288203]  [<ffffffffa0031ab4>] btrfs_lookup_csum+0x74/0x170 [btrfs]
> [  480.288216]  [<ffffffffa0031d5f>] __btrfs_lookup_bio_sums+0x1af/0x3b0 [btrfs]
> [  480.288228]  [<ffffffffa0031fb6>] btrfs_lookup_bio_sums+0x16/0x20 [btrfs]
> [  480.288242]  [<ffffffffa003e650>] btrfs_submit_bio_hook+0x140/0x170 [btrfs]
> [  480.288256]  [<ffffffffa00405d0>] ? btrfs_real_readdir+0x720/0x720 [btrfs]
> [  480.288272]  [<ffffffffa00571aa>] submit_one_bio+0x6a/0xa0 [btrfs]
> [  480.288287]  [<ffffffffa005be64>] extent_readpages+0xe4/0x100 [btrfs]
> [  480.288301]  [<ffffffffa00405d0>] ? btrfs_real_readdir+0x720/0x720 [btrfs]
> [  480.288315]  [<ffffffffa003eebf>] btrfs_readpages+0x1f/0x30 [btrfs]
> [  480.288319]  [<ffffffff81120bef>] __do_page_cache_readahead+0x1af/0x250
> [  480.288323]  [<ffffffff81120ff1>] ra_submit+0x21/0x30
> [  480.288326]  [<ffffffff81121115>] ondemand_readahead+0x115/0x230
> [  480.288330]  [<ffffffff81137eb9>] ? __do_fault+0x419/0x530
> [  480.288333]  [<ffffffff81121311>] page_cache_sync_readahead+0x31/0x50
> [  480.288337]  [<ffffffff811167d8>] generic_file_aio_read+0x438/0x780
> [  480.288342]  [<ffffffff81173db2>] do_sync_read+0xd2/0x110
> [  480.288346]  [<ffffffff81294113>] ? security_file_permission+0x93/0xb0
> [  480.288349]  [<ffffffff81174231>] ? rw_verify_area+0x61/0xf0
> [  480.288352]  [<ffffffff81174710>] vfs_read+0xb0/0x180
> [  480.288355]  [<ffffffff8117482a>] sys_read+0x4a/0x90
> [  480.288359]  [<ffffffff81635229>] system_call_fastpath+0x16/0x1b


Jeff Mahoney has been working on a large overhaul of error
handling/BUG_ONs.  It is difficult to say when it will be ready, or
whether it will even address this specific problem.

I'd go ahead and return the disk.  I doubt you'll be the last user to
have bad sectors, so there'll be more opportunities to see how this
issue is handled after the changes to error handling.
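
For anyone following the thread without the kernel source at hand, the
NULL check in the patch can be illustrated with a small user-space
sketch.  This is not kernel code: struct mock_page, struct mock_mapping,
mock_find_get_page() and mock_range_uptodate() are hypothetical
stand-ins invented for this example, and the return value for a page
that is present but not up to date (0) is an assumption.  The only
behaviour taken from the patch itself is that a NULL lookup result
returns 1 instead of being dereferenced.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the uptodate flag is modelled. */
struct mock_page {
	int uptodate;
};

#define MOCK_NR_PAGES 4

/* Hypothetical stand-in for the page cache: a NULL slot means "no page". */
struct mock_mapping {
	struct mock_page *slots[MOCK_NR_PAGES];
};

/* Stand-in for find_get_page(): it may legitimately return NULL. */
static struct mock_page *mock_find_get_page(struct mock_mapping *m,
					    size_t index)
{
	if (index >= MOCK_NR_PAGES)
		return NULL;
	return m->slots[index];
}

/*
 * Walk the pages first..last (inclusive).
 * Returns 1 if a lookup returns NULL (the early return added by the patch),
 * 0 if a page is present but not up to date (assumed for this sketch),
 * and 1 if every page in the range is up to date.
 */
static int mock_range_uptodate(struct mock_mapping *m,
			       size_t first, size_t last)
{
	size_t index;

	for (index = first; index <= last; index++) {
		struct mock_page *page = mock_find_get_page(m, index);

		if (!page)		/* the check added by the patch */
			return 1;
		if (!page->uptodate)
			return 0;
	}
	return 1;
}
```

Without the `if (!page)` test, the `page->uptodate` access on an empty
slot would be the user-space analogue of the oops in the original
report.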