Pawel Jakub Dawidek,

        Before I (if I) spend some reviewing this item,
        the in my experience the slab allocator on
        low memory does:

        IMO, in my experience.. your mileage may vary.
        Hopefully this is review. For the less knowledgeable
        people the underlying Memory allocator is known
        as the Slab Allocator by Jeff Bonwick and should
        be able to be found if "googled".

        First, just because you call sleep, doesn't mean
        that you will sleep. It just means that when you
        return you are ?guranteed? to have memory. And that
        you are not calling it from interrupt context.

        The _alloc call is the front end of the allocator.

        First, running "low on memory" could simply mean
        that most/alot of available memory has been cached. The
        question is if any of the freed objects have
        references to them. Most of the time, they have
        been freed from the front end and will be re-allocated
        on demand / when necessary to remove the overhead of
        its resuse to the same slab.

        Checks within the slab for any cached objects and
        allocates from the cached objects. If the slab
        is empty, then memory is attempted to be retrieved
        from a freelist. If the freelist is empty then
        the backend attempts to reclaim memory.

        Now I will give someone WITHIN Sun time to review 
        your trace and respond.. If they don't, I will
        consider a further review of this..

        Mitchell Erblich
        ---------------

Pawel Jakub Dawidek wrote:
> 
> Hi.
> 
> Kris Kennaway <kris at FreeBSD.org> found a deadlock, which I think is not
> FreeBSD-specific.
> 
> When we are running low in memory and kmem_alloc(KM_SLEEP) is called,
> the thread waits for the memory to be reclaimed, right?
> In such situation the arc_reclaim_thread thread is woken up.
> 
> Ok. I've two threads waiting for the memory to be freed:
> 
> First one, and this one is not really problematic:
> 
> arc_lowmem(0,0,a054c56b,12c,a0a74088,...) at arc_lowmem+0x74
> kmem_malloc(a0a71090,20000,2,e6768840,a047e7e5,...) at kmem_malloc+0x131
> page_alloc(0,20000,e6768833,2,aedd9d20,...) at page_alloc+0x27
> uma_large_malloc(20000,2,0,0,0,...) at uma_large_malloc+0x55
> malloc(20000,aedd6080,2,e6768888,aedb7159,...) at malloc+0x120
> zfs_kmem_alloc(20000,2,e67688b8,aed791db,20000,...) at zfs_kmem_alloc+0x13
> zio_data_buf_alloc(20000,aedd9cc0,20000,1,20000,...) at zio_data_buf_alloc+0xd
> arc_get_data_buf(ae166dc0,2,ca220690,b1f37450,e6768928,...) at 
> arc_get_data_buf+0x23f
> arc_buf_alloc(ae18d000,20000,ca220690,1,0,...) at arc_buf_alloc+0x9a
> dbuf_read(ca220690,b1f37450,2,bf0933a0,1a7600,...) at dbuf_read+0xf4
> dmu_tx_check_ioerr(0,d,0,a0a74880,0,...) at dmu_tx_check_ioerr+0x6c
> dmu_tx_count_write(197600,0,10000,0,197600,...) at dmu_tx_count_write+0x3ce
> dmu_tx_hold_write(bad1f800,5d52,0,197600,0,...) at dmu_tx_hold_write+0x50
> zfs_freebsd_write(e6768b90,a055a4d5,0,0,0,...) at zfs_freebsd_write+0x1cf
> VOP_WRITE_APV(aedd8540,e6768b90,b608b1d0,a053f500,241,...) at 
> VOP_WRITE_APV+0x17c
> vn_write(ae95e630,e6768c58,c1a72680,0,b608b1d0,...) at vn_write+0x250
> dofilewrite(b608b1d0,4,ae95e630,e6768c58,ffffffff,...) at dofilewrite+0x9e
> kern_writev(b608b1d0,4,e6768c58,805f000,10000,...) at kern_writev+0x60
> write(b608b1d0,e6768d00,c,e6768c94,a034f435,...) at write+0x4f
> syscall(e6768d38) at syscall+0x2f3
> 
> And second one, which holds arc_buf_t->b_hdr->hash_lock:
> 
> arc_lowmem(0,0,a054c56b,12c,a0a74088,...) at arc_lowmem+0x1c
> kmem_malloc(a0a71090,20000,2,e69888b8,a047e7e5,...) at kmem_malloc+0x131
> page_alloc(0,20000,e69888ab,2,aedd9da0,...) at page_alloc+0x27
> uma_large_malloc(20000,2,0,0,0,...) at uma_large_malloc+0x55
> malloc(20000,aedd6080,2,e6988900,aedb7159,...) at malloc+0x120
> zfs_kmem_alloc(20000,2,e6988930,aed791db,20000,...) at zfs_kmem_alloc+0x13
> zio_data_buf_alloc(20000,aedd9cc0,20000,1,20000,...) at zio_data_buf_alloc+0xd
> arc_get_data_buf(ae166dc0,2,20000,0,b8644cf8,...) at arc_get_data_buf+0x23f
> arc_read(c29ec228,ae18d000,af885080,aed80b6c,aed7d168,...) at arc_read+0x33d
> dbuf_read(baf49460,c29ec228,12,c5f8a528,c6254cb0,...) at dbuf_read+0x463
> dmu_buf_hold_array_by_dnode(20000,0,400,0,1,...) at 
> dmu_buf_hold_array_by_dnode+0x1b0
> dmu_buf_hold_array(58ef,0,20000,0,400,...) at dmu_buf_hold_array+0x4c
> dmu_read_uio(c23da3c0,58ef,0,e6988c58,400,...) at dmu_read_uio+0x35
> zfs_freebsd_read(e6988b90,a055a48c,adbbd6c0,adbbd6c0,adbbd6c0,...) at 
> zfs_freebsd_read+0x3d8
> VOP_READ_APV(aedd8540,e6988b90,b6f4eae0,a053f500,202,...) at VOP_READ_APV+0xd2
> vn_read(adbbd6c0,e6988c58,b0c58e00,0,b6f4eae0,...) at vn_read+0x297
> dofileread(b6f4eae0,3,adbbd6c0,e6988c58,ffffffff,...) at dofileread+0xa7
> kern_readv(b6f4eae0,3,e6988c58,9f7fc670,400,...) at kern_readv+0x60
> read(b6f4eae0,e6988d00,c,e6988c94,a034f435,...) at read+0x4f
> syscall(e6988d38) at syscall+0x2f3
> 
> The arc_reclaim_thread thread deadlocks here:
> 
> sched_switch(ae8f9910,0,1,17a,a05b6e14,...) at sched_switch+0x16c
> mi_switch(1,0,a0537906,1cc,aede1b8c,...) at mi_switch+0x306
> sleepq_switch(aede1b8c,0,a0537906,21e,e611abe0,...) at sleepq_switch+0x113
> sleepq_wait(aede1b8c,0,aedcef22,3,0,...) at sleepq_wait+0x65
> _sx_xlock_hard(aede1b8c,ae8f9910,aedcecb7,447,e611ac1c,...) at 
> _sx_xlock_hard+0x17e
> _sx_xlock(aede1b8c,aedcecb7,447,0,0,...) at _sx_xlock+0x69
> arc_buf_remove_ref(b4288ca8,cc943c94,519,cc943cd0,cc943dac,...) at 
> arc_buf_remove_ref+0x58
> dbuf_rele(cc943c94,cc943dac,cc95d348,cc943dac,35,...) at dbuf_rele+0x195
> dbuf_clear(cc943dac,cc943dac,e611ac80,aed7cea0,cc943dac,...) at 
> dbuf_clear+0x7f
> dbuf_evict(cc943dac,cc99a3d4,e611ac90,aed788b5,cc99a3d4,...) at dbuf_evict+0xd
> dbuf_do_evict(cc99a3d4,4879,e611acfc,aed78f0f,64,...) at dbuf_do_evict+0x44
> arc_do_user_evicts(64,0,246,a056b498,1,...) at arc_do_user_evicts+0x51
> arc_reclaim_thread(0,e611ad38,a0531203,326,ae8f76c0,...) at 
> arc_reclaim_thread+0x36b
> fork_exit(aed78ba4,0,e611ad38) at fork_exit+0xd1
> fork_trampoline() at fork_trampoline+0x8
> 
> Let me convert the offsets to file:line as found in OpenSolaris code:
> 
> arc.c:1088              arc_buf_remove_ref+0x58
> dbuf.c:1710/1713        dbuf_rele+0x195
> dbuf.c:1308             dbuf_clear+0x7f
> dbuf.c:233              dbuf_evict+0xd
> dbuf.c:1453             dbuf_do_evict+0x44
> arc.c:1314              arc_do_user_evicts+0x51
> arc.c:1537              arc_reclaim_thread+0x36b
> 
> (In this dbuf.c:1710/1713 I'm not sure which line exactly it is, but it
> doesn't matter from what I see.)
> 
> The most important part is dbuf.c:1308, which calls dbuf_rele() on the
> parent dmu_buf_impl_t, which is already locked by the second thread, so
> when we lock it in arc_buf_remove_ref() we deadlock, because the lock
> is held by the thread waiting for memory.
> 
> Does my description make sense? Do you have any suggestions how to fix it?
> 
> --
> Pawel Jakub Dawidek                       http://www.wheel.pl
> pjd at FreeBSD.org                           http://www.FreeBSD.org
> FreeBSD committer                         Am I Evil? Yes, I Am!
> 
>   ------------------------------------------------------------------------
>    Part 1.1.2Type: application/pgp-signature
> 
>    Part 1.2    Type: Plain Text (text/plain)
>            Encoding: 7bit

Reply via email to