I have an interesting problem. I've made no changes to the IB DDN storage, yet OSTs are crashing left and right. The thread watchdog gets triggered, and the most relevant part of the dump is below. It appears that it took more than 100 s to find a free extent. On the OSS, I can watch with iostat as the LUN is saturated with small read requests.

We've just hit 80% full (we planned on going to 90% full), and we do have a lot of small files (~75 million).

Is there any way to tune the extent searching code? Does my analysis seem likely? Is this fixed in 1.6.1, such that I should upgrade immediately?

Thanks,
Daniel

Call Trace:
      <ffffffffa0024125>{:sd_mod:sd_iostats_bump+147}
      <ffffffffa031429a>{:ib_srp:srp_host_qcommand+399}
      <ffffffff80253ebf>{deadline_next_request+34}
      <ffffffff8024b329>{elv_next_request+238}
      <ffffffff80309843>{io_schedule+38}
      <ffffffff8017843c>{__wait_on_buffer+125}
      <ffffffff801782c2>{bh_wake_function+0}
      <ffffffff801782c2>{bh_wake_function+0}
      <ffffffffa05771d9>{:ldiskfs:ldiskfs_mb_init_cache+469}
      <ffffffff80157ba2>{add_to_page_cache+167}
      <ffffffffa0577792>{:ldiskfs:ldiskfs_mb_load_buddy+257}
      <ffffffffa057a89f>{:ldiskfs:ldiskfs_mb_new_blocks+1946}
      <ffffffffa05b480e>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+729}
      <ffffffffa0574362>{:ldiskfs:ldiskfs_ext_find_extent+205}
      <ffffffffa0575a69>{:ldiskfs:ldiskfs_ext_walk_space+535}
      <ffffffffa05b4535>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0}
      <ffffffffa05b4b56>{:fsfilt_ldiskfs:fsfilt_map_nblocks+236}
      <ffffffffa05b4d50>{:fsfilt_ldiskfs:fsfilt_ldiskfs_map_ext_inode_pages+457}
      <ffffffffa05d59dc>{:obdfilter:filter_direct_io+892}
      <ffffffffa05b36f2>{:fsfilt_ldiskfs:fsfilt_ldiskfs_brw_start+649}
      <ffffffffa05d6fb9>{:obdfilter:filter_commitrw_write+3494}
      <ffffffff80308ecd>{thread_return+0}
      <ffffffff80308f25>{thread_return+88}
      <ffffffffa0378bf2>{:lnet:lnet_send+2251}
      <ffffffffa05d100e>{:obdfilter:filter_commitrw+84}
      <ffffffff8013f23f>{del_timer+107}
      <ffffffff8013f2fc>{del_singleshot_timer_sync+9}
      <ffffffff803099f7>{schedule_timeout+375}
      <ffffffffa05a2c47>{:ost:ost_brw_write+5119}
      <ffffffff801331a5>{default_wake_function+0}
      <ffffffffa059f513>{:ost:ost_bulk_timeout+0}
      <ffffffffa043269f>{:ptlrpc:lustre_msg_get_version+64}
      <ffffffffa05a637e>{:ost:ost_handle+6987}
      <ffffffffa034cf41>{:libcfs:libcfs_debug_vmsg2+1713}
      <ffffffff801e9c83>{vsnprintf+1406}
      <ffffffff801e9d66>{snprintf+131}
      <ffffffffa043906b>{:ptlrpc:ptlrpc_server_handle_request+2336}
      <ffffffff8013f100>{__mod_timer+293}
      <ffffffffa043ad29>{:ptlrpc:ptlrpc_main+2018}
      <ffffffff801331a5>{default_wake_function+0}
      <ffffffffa0439a47>{:ptlrpc:ptlrpc_retry_rqbds+0}
      <ffffffffa0439a47>{:ptlrpc:ptlrpc_retry_rqbds+0}
      <ffffffff80110e23>{child_rip+8}
      <ffffffffa043a547>{:ptlrpc:ptlrpc_main+0}
      <ffffffff80110e1b>{child_rip+0}
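
For anyone who wants to pick out the allocator frames from a trace like the one above, here is a minimal Python sketch. It is only an illustration: the helper names `parse_frames` and `frames_in_module` are hypothetical, the regular expression assumes the `<address>{:module:symbol+offset}` layout seen in this dump, and labelling frames without a module prefix as "kernel" is my own convention, not something the kernel prints.

```python
import re

# Matches frames such as "<ffffffffa0577792>{:ldiskfs:ldiskfs_mb_load_buddy+257}"
# and "<ffffffff80309843>{io_schedule+38}" (no module prefix).
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{(?::(\w+):)?(\w+)\+(\d+)\}")


def parse_frames(trace):
    """Return (address, module, symbol, offset) tuples in order of appearance."""
    frames = []
    for addr, module, symbol, offset in FRAME_RE.findall(trace):
        # Assumption: frames with no ":module:" prefix are labelled "kernel".
        frames.append((addr, module or "kernel", symbol, int(offset)))
    return frames


def frames_in_module(trace, module):
    """Return the symbol names of all frames that belong to one module."""
    return [symbol for _, mod, symbol, _ in parse_frames(trace) if mod == module]


if __name__ == "__main__":
    # A few frames copied from the trace above.
    sample = (
        "<ffffffff8017843c>{__wait_on_buffer+125} "
        "<ffffffffa05771d9>{:ldiskfs:ldiskfs_mb_init_cache+469} "
        "<ffffffffa0577792>{:ldiskfs:ldiskfs_mb_load_buddy+257} "
        "<ffffffffa057a89f>{:ldiskfs:ldiskfs_mb_new_blocks+1946}"
    )
    for name in frames_in_module(sample, "ldiskfs"):
        print(name)
```

Run against the full dump, filtering on "ldiskfs" lists the block-allocation frames (ldiskfs_mb_new_blocks, ldiskfs_mb_load_buddy, ldiskfs_mb_init_cache) that sit just above the buffer wait.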



--
Daniel Leaberry
Systems Administrator
iArchives
Tel: 801-494-6528
Cell: 801-376-6411

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
