Hi guys, again. Looking at this issue, I suspect this is a bug in btrfs.
We'll have to clean up this installation soon, so if there are any
requests to do some debugging, please ask. I'll try to reiterate what
was said in this thread.

Short story: a btrfs filesystem made of 22 1TB disks holding a lot of
files (~30,240,000). The write load is 25 MB/s. After some time the
filesystem became unable to cope with this load. At that point `sync`
also takes ages to finish, and `shutdown -r` hangs (I guess this is
related to sync).
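
In case it helps with debugging: while `sync` hangs, the stacks of all
blocked tasks can be dumped via SysRq to see where they are stuck. A
minimal sketch, assuming CONFIG_MAGIC_SYSRQ is enabled:

 # echo 1 > /proc/sys/kernel/sysrq    # allow all SysRq functions
 # echo w > /proc/sysrq-trigger       # dump tasks in uninterruptible (D) sleep
 # dmesg | tail -n 200                # the stack traces land in the kernel log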

I also see that one kernel kworker is the main suspect for this
behavior: it takes 100% of a CPU core all the time, jumping from core
to core. At the same time, according to iostat, read/write throughput
is close to zero and everything is stuck.
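
To see what the kworker is actually doing, its kernel stack and hot
functions can be sampled. A sketch, assuming /proc/<pid>/stack is
available and perf is installed (8644 is the kworker PID from the top
output below):

 # cat /proc/8644/stack   # current kernel stack of the kworker
 # perf top -p 8644       # sample which kernel functions it spends time in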

Citing some details from previous messages:

> > top - 13:10:58 up 1 day,  9:26,  5 users,  load average: 157.76, 156.61, 149.29
> > Tasks: 235 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
> > %Cpu(s): 19.8 us, 15.0 sy,  0.0 ni, 60.7 id,  3.9 wa,  0.0 hi,  0.6 si,  0.0 st
> > KiB Mem:  65922104 total, 65414856 used,   507248 free,     1844 buffers
> > KiB Swap:        0 total,        0 used,        0 free. 62570804 cached Mem
> >
> >    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> >   8644 root      20   0       0      0      0 R  96.5  0.0 127:21.95 kworker/u16:16
> >   5047 dvr       20   0 6884292 122668   4132 S   6.4  0.2 258:59.49 dvrserver
> > 30223 root      20   0   20140   2600   2132 R   6.4  0.0   0:00.01 top
> >      1 root      20   0    4276   1628   1524 S   0.0  0.0   0:40.19 init
> >
> > There are about 300 threads on the server, some of which are writing to disk.
> > A bit of information about this btrfs filesystem: it is a 22-disk
> > filesystem with raid1 for metadata and raid0 for data:
> >
> >   # btrfs filesystem df /store/
> > Data, single: total=11.92TiB, used=10.86TiB
> > System, RAID1: total=8.00MiB, used=1.27MiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, RAID1: total=46.00GiB, used=33.49GiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=128.00KiB
> >   # btrfs property get /store/
> > ro=false
> > label=store
> >   # btrfs device stats /store/
> > (shows all zeros)
> >   # btrfs balance status /store/
> > No balance found on '/store/'

 # btrfs filesystem show
Label: 'store'  uuid: 296404d1-bd3f-417d-8501-02f8d7906bcf
        Total devices 22 FS bytes used 6.50TiB
        devid    1 size 931.51GiB used 558.02GiB path /dev/sdb
        devid    2 size 931.51GiB used 559.00GiB path /dev/sdc
        devid    3 size 931.51GiB used 559.00GiB path /dev/sdd
        devid    4 size 931.51GiB used 559.00GiB path /dev/sde
        devid    5 size 931.51GiB used 559.00GiB path /dev/sdf
        devid    6 size 931.51GiB used 559.00GiB path /dev/sdg
        devid    7 size 931.51GiB used 559.00GiB path /dev/sdh
        devid    8 size 931.51GiB used 559.00GiB path /dev/sdi
        devid    9 size 931.51GiB used 559.00GiB path /dev/sdj
        devid   10 size 931.51GiB used 559.00GiB path /dev/sdk
        devid   11 size 931.51GiB used 559.00GiB path /dev/sdl
        devid   12 size 931.51GiB used 559.00GiB path /dev/sdm
        devid   13 size 931.51GiB used 559.00GiB path /dev/sdn
        devid   14 size 931.51GiB used 559.00GiB path /dev/sdo
        devid   15 size 931.51GiB used 559.00GiB path /dev/sdp
        devid   16 size 931.51GiB used 559.00GiB path /dev/sdq
        devid   17 size 931.51GiB used 559.00GiB path /dev/sdr
        devid   18 size 931.51GiB used 559.00GiB path /dev/sds
        devid   19 size 931.51GiB used 559.00GiB path /dev/sdt
        devid   20 size 931.51GiB used 559.00GiB path /dev/sdu
        devid   21 size 931.51GiB used 559.01GiB path /dev/sdv
        devid   22 size 931.51GiB used 560.01GiB path /dev/sdw

Btrfs v3.17.1

> > iostat 1 exposes the following problem:
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >            16.96    0.00   17.09   65.95    0.00    0.00
> >
> > Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
> > sda               0.00         0.00         0.00          0          0
> > sdc               0.00         0.00         0.00          0          0
> > sdb               0.00         0.00         0.00          0          0
> > sde               0.00         0.00         0.00          0          0
> > sdd               0.00         0.00         0.00          0          0
> > sdf               0.00         0.00         0.00          0          0
> > sdg               0.00         0.00         0.00          0          0
> > sdj               0.00         0.00         0.00          0          0
> > sdh               0.00         0.00         0.00          0          0
> > sdk               0.00         0.00         0.00          0          0
> > sdi               1.00         0.00       200.00          0        200
> > sdl               0.00         0.00         0.00          0          0
> > sdn              48.00         0.00     17260.00          0      17260
> > sdm               0.00         0.00         0.00          0          0
> > sdp               0.00         0.00         0.00          0          0
> > sdo               0.00         0.00         0.00          0          0
> > sdq               0.00         0.00         0.00          0          0
> > sdr               0.00         0.00         0.00          0          0
> > sds               0.00         0.00         0.00          0          0
> > sdt               0.00         0.00         0.00          0          0
> > sdv               0.00         0.00         0.00          0          0
> > sdw               0.00         0.00         0.00          0          0
> > sdu               0.00         0.00         0.00          0          0

At that time I saw this kind of load profile. The write load moved from
disk to disk over time, so I do not suspect a broken disk. Currently the
write profile is different:
https://drive.google.com/file/d/0BygFL6N3ZVUAVmxaZ1Q5VTZpSGc/view?usp=sharing
Sometimes it looks like the above, sometimes all zeros; most of the time
the load is very low.
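
To record how the load moves between disks over time, per-device stats
and the kworker's CPU usage could be logged side by side. A sketch,
assuming the sysstat package is installed (8644 is the kworker PID from
the top output above):

 # iostat -dmx 5 > iostat.log &          # extended per-device stats every 5s
 # pidstat -u -p 8644 5 > kworker.log &  # CPU usage of the suspect kworker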

> > write goes to one disk. I've tried to debug what's going on in the
> > kworker and did:
> >
> > $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
> > $ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2

I've put the result here:
https://drive.google.com/file/d/0BygFL6N3ZVUAMWxCQ0tDREE1Uzg/view?usp=sharing
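
As a quick summary of that trace, the work functions queued most often
can be counted from the function= field of each event. A sketch; the
exact event format may differ between kernel versions:

 $ grep -o 'function=[^ ]*' trace_pipe.out2 | sort | uniq -c | sort -rn | head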

> > The server has 64GB of RAM.
The kernel is 3.16.7-gentoo.

--
Peter.

