This is the core file from the crash just now

[root@psanaoss213 /]# ls -al core*
-rw------- 1 root root 4073594880 Jun  8 15:05 core.22682

From yesterday:
[root@psanaoss214 /]# ls -al core*
-rw------- 1 root root 4362727424 Jun  8 00:58 core.13483
-rw------- 1 root root 4624773120 Jun  8 03:21 core.8792
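
For scale, the three core files listed above are each roughly 4 GiB. A minimal sketch of the conversion, with the byte counts copied from the `ls -al` listings (the dictionary labels are only illustrative names, not anything gluster produces):

```python
# Core file sizes in bytes, copied from the `ls -al` listings above.
# The "host:file" keys are illustrative labels only.
core_sizes = {
    "psanaoss213:core.22682": 4073594880,
    "psanaoss214:core.13483": 4362727424,
    "psanaoss214:core.8792": 4624773120,
}


def to_gib(num_bytes):
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


for name, size in core_sizes.items():
    print(f"{name}: {to_gib(size):.2f} GiB")
```

Since a core dump reflects the address space of the process at the time of the crash, these sizes give a rough upper bound on how much memory each brick process had mapped.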


On 06/08/2012 04:34 PM, Anand Avati wrote:
Is it possible the system was running low on memory? I see you have 48GB, but memory registration failure typically would be because the system limit on the number of pinnable pages in RAM was hit. Can you tell us the size of your core dump files after the crash?

Avati
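
One limit that commonly governs how much memory a process may pin is the per-process locked-memory limit (RLIMIT_MEMLOCK). Whether that is the limit being hit here is an assumption, not something confirmed in this thread. A minimal sketch for inspecting it from Python on Linux:

```python
import resource


def describe_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes ({value / 2**20:.1f} MiB)"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of the calling process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


soft, hard = memlock_limits()
print("RLIMIT_MEMLOCK soft:", describe_limit(soft))
print("RLIMIT_MEMLOCK hard:", describe_limit(hard))
```

This reports the limit of the process that runs the script, which may differ from the limit the brick process inherited. For an already running brick, the "Max locked memory" row of /proc/<pid>/limits shows the value that process actually has.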

On Fri, Jun 8, 2012 at 4:22 PM, Ling Ho <[email protected]> wrote:

    Hello,

    I have a brick that crashed twice today, and a different
    brick that crashed just a while ago.

    This is what I see in one of the brick logs:

    patchset: git://git.gluster.com/glusterfs.git
    patchset: git://git.gluster.com/glusterfs.git
    signal received: 6
    signal received: 6
    time of crash: 2012-06-08 15:05:11
    configuration details:
    argp 1
    backtrace 1
    dlfcn 1
    fdatasync 1
    libpthread 1
    llistxattr 1
    setfsid 1
    spinlock 1
    epoll.h 1
    xattr.h 1
    st_atim.tv_nsec 1
    package-string: glusterfs 3.2.6
    /lib64/libc.so.6[0x34bc032900]
    /lib64/libc.so.6(gsignal+0x35)[0x34bc032885]
    /lib64/libc.so.6(abort+0x175)[0x34bc034065]
    /lib64/libc.so.6[0x34bc06f977]
    /lib64/libc.so.6[0x34bc075296]
    /opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]
    /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_destroy+0x47)[0x7f1740956967]
    /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_unref+0x62)[0x7f1740956a32]
    /opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]
    /lib64/libpthread.so.0[0x34bc8077f1]
    /lib64/libc.so.6(clone+0x6d)[0x34bc0e5ccd]
    ---------
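
Signal 6 on Linux is SIGABRT, which matches the abort() frame in the backtrace, raised from inside libc underneath __gf_free and rpc_transport_destroy. A minimal sketch for splitting frames of this form into their parts; the regular expression is an assumption based only on the lines shown above, not a documented gluster log format:

```python
import re
import signal

# Frame layout as it appears in the log: path(symbol+offset)[address].
# The "(symbol+offset)" part is missing for frames without symbol info,
# and the symbol may be empty, as in "rdma.so(+0xc135)".
FRAME_RE = re.compile(
    r"^(?P<path>\S+?)"
    r"(?:\((?P<symbol>[^()+]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict with path, symbol, offset, address, or None if no match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    frame = match.groupdict()
    # Normalize an empty symbol name to None.
    frame["symbol"] = frame["symbol"] or None
    return frame


frames = [
    "/lib64/libc.so.6(abort+0x175)[0x34bc034065]",
    "/opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]",
    "/opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]",
]

print("signal 6 is", signal.Signals(6).name)
for line in frames:
    print(parse_frame(line))
```

The frames are listed innermost first, so the abort originates in libc while freeing memory during teardown of an rdma transport.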

    And somewhere before these, there is also
    [2012-06-08 15:05:07.512604] E [rdma.c:198:rdma_new_post]
    0-rpc-transport/rdma: memory registration failed

    I have 48GB of memory on the system:

    # free
                 total       used       free     shared    buffers     cached
    Mem:      49416716   34496648   14920068          0      31692   28209612
    -/+ buffers/cache:    6255344   43161372
    Swap:      4194296       1740    4192556
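
The "-/+ buffers/cache" row in the `free` output above is derived from the "Mem:" row: most of the "used" memory is page cache, and only about 6 GiB is held by processes. A small sketch of that arithmetic, with the values (in KiB) copied from the output and the variable names chosen for illustration:

```python
# Values in KiB, copied from the "Mem:" row of the `free` output above.
total_kib = 49416716
used_kib = 34496648
free_kib = 14920068
buffers_kib = 31692
cached_kib = 28209612

# Memory used by processes once buffers and page cache are excluded.
used_by_apps_kib = used_kib - buffers_kib - cached_kib

# Memory that is free or reclaimable from buffers and page cache.
available_kib = free_kib + buffers_kib + cached_kib

KIB_PER_GIB = 2**20
print(f"used by applications: {used_by_apps_kib} KiB "
      f"({used_by_apps_kib / KIB_PER_GIB:.2f} GiB)")
print(f"free or reclaimable:  {available_kib} KiB "
      f"({available_kib / KIB_PER_GIB:.2f} GiB)")
```

Both results match the second row of the `free` output, so by this measure the machine was not short of ordinary memory when the command was run.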

    # uname -a
    Linux psanaoss213 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10
    15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

    The server gluster version is 3.2.6-1. I have both rdma
    clients and tcp clients over a 10Gb/s network.

    Any suggestions on what I should look for?

    Is there a way to just restart the brick, and not glusterd on the
    server? I have 8 bricks on the server.

    Thanks,
    ...
    ling


    Here's the volume info:

    # gluster volume info

    Volume Name: ana12
    Type: Distribute
    Status: Started
    Number of Bricks: 40
    Transport-type: tcp,rdma
    Bricks:
    Brick1: psanaoss214:/brick1
    Brick2: psanaoss214:/brick2
    Brick3: psanaoss214:/brick3
    Brick4: psanaoss214:/brick4
    Brick5: psanaoss214:/brick5
    Brick6: psanaoss214:/brick6
    Brick7: psanaoss214:/brick7
    Brick8: psanaoss214:/brick8
    Brick9: psanaoss211:/brick1
    Brick10: psanaoss211:/brick2
    Brick11: psanaoss211:/brick3
    Brick12: psanaoss211:/brick4
    Brick13: psanaoss211:/brick5
    Brick14: psanaoss211:/brick6
    Brick15: psanaoss211:/brick7
    Brick16: psanaoss211:/brick8
    Brick17: psanaoss212:/brick1
    Brick18: psanaoss212:/brick2
    Brick19: psanaoss212:/brick3
    Brick20: psanaoss212:/brick4
    Brick21: psanaoss212:/brick5
    Brick22: psanaoss212:/brick6
    Brick23: psanaoss212:/brick7
    Brick24: psanaoss212:/brick8
    Brick25: psanaoss213:/brick1
    Brick26: psanaoss213:/brick2
    Brick27: psanaoss213:/brick3
    Brick28: psanaoss213:/brick4
    Brick29: psanaoss213:/brick5
    Brick30: psanaoss213:/brick6
    Brick31: psanaoss213:/brick7
    Brick32: psanaoss213:/brick8
    Brick33: psanaoss215:/brick1
    Brick34: psanaoss215:/brick2
    Brick35: psanaoss215:/brick4
    Brick36: psanaoss215:/brick5
    Brick37: psanaoss215:/brick7
    Brick38: psanaoss215:/brick8
    Brick39: psanaoss215:/brick3
    Brick40: psanaoss215:/brick6
    Options Reconfigured:
    performance.io-thread-count: 16
    performance.write-behind-window-size: 16MB
    performance.cache-size: 1GB
    nfs.disable: on
    performance.cache-refresh-timeout: 1
    network.ping-timeout: 42
    performance.cache-max-file-size: 1PB

    _______________________________________________
    Gluster-users mailing list
    [email protected]
    http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


