And here is the perfcounters output, a few seconds after ceph starts
writing data (disk util goes up in dstat).
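(In case it helps to reproduce: this kind of dump comes from the OSD admin
socket, i.e. something along the lines of

  ceph --admin-daemon /path/to/osd.0.asok perfcounters_dump

The exact socket path depends on the "admin socket" setting, and newer
versions call the command "perf dump".)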
2012/1/16 Andrey Stepachev <[email protected]>:
> Hi all.
>
> Last week I investigated the status of Hadoop on Ceph.
> I created some patches to fix a few bugs and crashes.
> It looks like it works. Even HBase works on top.
>
> For reference all sources and patches are here
>
> https://github.com/octo47/hadoop-common/tree/branch-1.0-ceph
> https://github.com/octo47/ceph/tree/v0.40-hadoop
>
> After YCSB and TestDFSIO ran without crashes, I started investigating
> performance.
>
> I have a 5-node cluster with 4 SATA disks per node (btrfs, RAID) and 24 cores
> on each node. iozone shows up to 520 MB/s.
>
> Performance differs by a factor of 2-3. After some tests I see a strange thing.
> Hadoop uses the disks very much like iozone does: a small number of IOPS and high
> throughput (same as iozone).
> Ceph uses them very inefficiently: a huge number of IOPS and up to 3 times less
> throughput (I think because of the high IOPS count), as the dstat numbers below show.
>
> hadoop dstat output:
> sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total-
> util:util:util:util|usr sys idl wai hiq siq| read writ| read writ
> 100: 100: 100: 100| 1 5 83 11 0 0| 0 529M| 0 247
> 100: 100: 100: 100| 1 0 83 16 0 0| 0 542M| 0 168
> 100: 100: 100: 100| 1 0 81 18 0 0| 28k 518M|6.00 149
> 100: 100: 100: 100| 1 4 77 17 0 0| 0 533M| 0 243
> 100: 100: 100: 100| 1 3 83 13 0 0| 0 523M| 0 264
>
> ceph dstat output:
> ===================================================
> sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total-
> util:util:util:util|usr sys idl wai hiq siq| read writ| read writ
> 68.0:70.0:79.0:76.0| 1 2 93 4 0 0| 0 195M| 0 1723
> 86.0:85.0:93.0:91.0| 1 2 91 5 0 0| 0 226M| 0 1816
> 85.0:85.0:85.0:84.0| 1 3 92 4 0 0| 0 235M| 0 2316
>
>
> So, my questions are: can someone point me to
> a) whether this can be caused by an inefficient buffer size on the OSD side
> (I tried increasing the CephOutputStream buffer to 256 KB; it doesn't help), and
> b) what other problems there can be and what options I can tune
> to find out what is going on.
>
> PS: I can't use iozone on a kernel-mounted fs. Something
> hangs in the kernel, and only a reboot helps.
> In /var/log/messages I see the attached kern.log.
>
>
>
> --
> Andrey.
--
Andrey.
{ "filestore" : { "apply_latency" : { "avgcount" : 3752,
"sum" : 107.00700000000001
},
"bytes" : 1084026405,
"commitcycle" : 10,
"commitcycle_interval" : { "avgcount" : 10,
"sum" : 57.902500000000003
},
"commitcycle_latency" : { "avgcount" : 10,
"sum" : 7.89201
},
"committing" : 0,
"journal_bytes" : 974956029,
"journal_full" : 0,
"journal_latency" : { "avgcount" : 3739,
"sum" : 1361.1199999999999
},
"journal_ops" : 3739,
"journal_queue_bytes" : 109070376,
"journal_queue_max_bytes" : 104857600,
"journal_queue_max_ops" : 500,
"journal_queue_ops" : 13,
"op_queue_bytes" : 26742974,
"op_queue_max_bytes" : 104857600,
"op_queue_max_ops" : 500,
"op_queue_ops" : 3,
"ops" : 3752
},
"osd" : { "buffer_bytes" : 0,
"heartbeat_from_peers" : 4,
"heartbeat_to_peers" : 4,
"loadavg" : 0.41999999999999998,
"map_message_epoch_dups" : 10,
"map_message_epochs" : 14,
"map_messages" : 11,
"numpg" : 625,
"numpg_primary" : 209,
"numpg_replica" : 416,
"numpg_stray" : 0,
"op" : 137,
"op_in_bytes" : 160997316,
"op_latency" : { "avgcount" : 137,
"sum" : 190.98699999999999
},
"op_out_bytes" : 26871,
"op_r" : 9,
"op_r_latency" : { "avgcount" : 9,
"sum" : 3.02433
},
"op_r_out_bytes" : 26871,
"op_rw" : 0,
"op_rw_in_bytes" : 0,
"op_rw_latency" : { "avgcount" : 0,
"sum" : 0
},
"op_rw_out_bytes" : 0,
"op_rw_rlat" : { "avgcount" : 0,
"sum" : 0
},
"op_w" : 128,
"op_w_in_bytes" : 160997316,
"op_w_latency" : { "avgcount" : 128,
"sum" : 187.96299999999999
},
"op_w_rlat" : { "avgcount" : 128,
"sum" : 75.012200000000007
},
"op_wip" : 5,
"opq" : 7,
"pull" : 0,
"push" : 0,
"push_out_bytes" : 0,
"recovery_ops" : 0,
"subop" : 334,
"subop_in_bytes" : 735611145,
"subop_latency" : { "avgcount" : 334,
"sum" : 238.98599999999999
},
"subop_pull" : 0,
"subop_pull_latency" : { "avgcount" : 0,
"sum" : 0
},
"subop_push" : 0,
"subop_push_in_bytes" : 0,
"subop_push_latency" : { "avgcount" : 0,
"sum" : 0
},
"subop_w" : 0,
"subop_w_in_bytes" : 735611145,
"subop_w_latency" : { "avgcount" : 334,
"sum" : 238.98599999999999
}
}
}
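
A few averages from these counters, for whatever they are worth:
journal_latency works out to 1361.12 s / 3739 ops = ~364 ms per journal write,
while apply_latency is 107.01 s / 3752 ops = ~28 ms; op_w_latency is
187.96 s / 128 ops = ~1.5 s per client write. Also, journal_queue_bytes
(109070376) is above journal_queue_max_bytes (104857600), so the journal
queue appears to be at its limit.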