I did find the journal configuration entries, and they did indeed help for
this test, thanks.

The original configuration was:
journal_max_write_entries=100
journal_queue_max_ops=300
journal_queue_max_bytes=33554432
journal_max_write_bytes=10485760

Configuration after the update:
journal_max_write_entries=10000
journal_queue_max_ops=50000
journal_queue_max_bytes=10485760000
journal_max_write_bytes=1073714824
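
(For reference, a minimal sketch of how these are meant to be applied, as I
understand it: the settings go under [osd] in ceph.conf on each OSD node, and
the OSDs are restarted afterwards.)

[osd]
# filestore journal: allow larger write batches and deeper queues
journal_max_write_entries = 10000
journal_queue_max_ops = 50000
journal_queue_max_bytes = 10485760000
journal_max_write_bytes = 1073714824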


Before changes:
dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=5k; rm -f /mnt/ext4/output;
5120+0 records in
5120+0 records out
5242880000 bytes (5.2 GB) copied, 24.1971 s, 217 MB/s

After changes:
dd if=/dev/zero of=/mnt/ceph-block-device/output bs=1000k count=5k; rm -f /mnt/ceph-block-device/output;
5120+0 records in
5120+0 records out
5242880000 bytes (5.2 GB) copied, 3.20913 s, 1.6 GB/s
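
One caveat when reading these numbers: without a sync or direct-I/O flag, dd
can report partly page-cache speed rather than what actually reached the
cluster. A variant along these lines (standard GNU dd flags, not the exact
commands above) would give a more conservative figure:

dd if=/dev/zero of=/mnt/ceph-block-device/output bs=1000k count=5k conv=fdatasync
# or bypass the page cache entirely:
dd if=/dev/zero of=/mnt/ceph-block-device/output bs=1000k count=5k oflag=direct
rm -f /mnt/ceph-block-device/output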

I still need to validate that this is better for our workload.
Thanks for your help.


On Mon, Mar 13, 2017 at 7:24 PM, Christian Balzer <ch...@gol.com> wrote:

>
> Hello,
>
> On Mon, 13 Mar 2017 11:25:15 -0400 Ben Erridge wrote:
>
> > On Sun, Mar 12, 2017 at 8:24 PM, Christian Balzer <ch...@gol.com> wrote:
> >
> > >
> > > Hello,
> > >
> > > On Sun, 12 Mar 2017 19:37:16 -0400 Ben Erridge wrote:
> > >
> > > > I am testing attached volume storage on our OpenStack cluster, which
> > > > uses Ceph for block storage. Our Ceph nodes have large SSDs for their
> > > > journals, 50+ GB for each OSD. I'm thinking some parameter is a little
> > > > off, because with relatively small writes I am seeing drastically
> > > > reduced write speeds.
> > > >
> > > Large journals are a waste for most people, especially when your
> > > backing storage is HDDs.
> > >
> > > >
> > > > we have 2 nodes with 12 total OSDs, each with a 50GB SSD journal.
> > > >
> > > I hope that's not your plan for production; with a replica of 2 you're
> > > looking at pretty much guaranteed data loss over time, unless your OSDs
> > > are actually RAIDs.
> > >
> > I am aware that a replica count of 3 is suggested, thanks.
> >
> >
> > > 5GB journals tend to be overkill already.
> > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008606.html
> > >
> > > If you were to actually look at your OSD nodes during those tests with
> > > something like atop or "iostat -x", you'd likely see that with prolonged
> > > writes you wind up with the speed of what your HDDs can do, i.e. see
> > > them (all or individually) being quite busy.
> > >
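
(For reference, one way to do this on each OSD node, assuming the sysstat
package is installed and sdb/sdc/sdd stand in for the actual OSD data disks:

iostat -x 1 sdb sdc sdd

If %util sits near 100% and the wait times climb on the HDDs while the SSD
journal devices stay mostly idle, the backing disks rather than the journals
are the bottleneck.)
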
> >
> > That is what I was thinking as well, which is not what I want. I want to
> > better utilize these large SSD journals. If I have a 50GB journal and I
> > only want to write 5GB of data, I should be able to get near SSD speed
> > for this operation. Why am I not?
> See the thread above and
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010754.html
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/038669.html
>
> > Maybe I should increase *filestore_max_sync_interval*.
> >
> That is your least worry, even though it seems to be the first parameter
> to change.
> Use your Google-fu to find some really old threads about this.
>
> The journal* parameters are what you want to look at; see the threads
> above. And AFAIK Ceph will flush the journal at 50% full, no matter what.
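
(A quick way to see which values an OSD is actually running with is its admin
socket, e.g. on the host that carries osd.0:

ceph daemon osd.0 config show | grep journal_

For a short-lived test the values can also be injected at runtime, e.g.

ceph tell osd.* injectargs '--journal_queue_max_ops 50000'

though not every journal option can be changed on a live OSD, injectargs will
warn if a restart is required, and injected values do not survive a restart,
so permanent changes still belong in ceph.conf.)
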
>
> And in the end you will likely find that using your 50GB journals in full
> will be difficult, and doing so without getting very uneven performance
> nearly impossible.
>
> Christian
> >
> > >
> > > Lastly, for nearly everybody in real-life situations,
> > > bandwidth/throughput becomes a distant second to latency
> > > considerations.
> > >
> >
> > Thanks for the advice, however.
> >
> >
> > > Christian
> > >
> > > >
> > > > Here is our Ceph config:
> > > >
> > > > [global]
> > > > fsid = 19bc15fd-c0cc-4f35-acd2-292a86fbcf7d
> > > > mon_initial_members = node-5 node-4 node-3
> > > > mon_host = 192.168.0.8 192.168.0.7 192.168.0.13
> > > > auth_cluster_required = cephx
> > > > auth_service_required = cephx
> > > > auth_client_required = cephx
> > > > filestore_xattr_use_omap = true
> > > > log_to_syslog_level = info
> > > > log_to_syslog = True
> > > > osd_pool_default_size = 1
> > > > osd_pool_default_min_size = 1
> > > > osd_pool_default_pg_num = 64
> > > > public_network = 192.168.0.0/24
> > > > log_to_syslog_facility = LOG_LOCAL0
> > > > osd_journal_size = 50000
> > > > auth_supported = cephx
> > > > osd_pool_default_pgp_num = 64
> > > > osd_mkfs_type = xfs
> > > > cluster_network = 192.168.1.0/24
> > > > osd_recovery_max_active = 1
> > > > osd_max_backfills = 1
> > > >
> > > > [client]
> > > > rbd_cache = True
> > > > rbd_cache_writethrough_until_flush = True
> > > >
> > > > [client.radosgw.gateway]
> > > > rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> > > > keyring = /etc/ceph/keyring.radosgw.gateway
> > > > rgw_socket_path = /tmp/radosgw.sock
> > > > rgw_keystone_revocation_interval = 1000000
> > > > rgw_keystone_url = 192.168.0.2:35357
> > > > rgw_keystone_admin_token = ZBz37Vlv
> > > > host = node-3
> > > > rgw_dns_name = *.ciminc.com
> > > > rgw_print_continue = True
> > > > rgw_keystone_token_cache_size = 10
> > > > rgw_data = /var/lib/ceph/radosgw
> > > > user = www-data
> > > >
> > > > This is the degradation I am speaking of:
> > > >
> > > >
> > > > dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=1k; rm -f /mnt/ext4/output;
> > > > 1024+0 records in
> > > > 1024+0 records out
> > > > 1048576000 bytes (1.0 GB) copied, 0.887431 s, 1.2 GB/s
> > > >
> > > > dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=2k; rm -f /mnt/ext4/output;
> > > > 2048+0 records in
> > > > 2048+0 records out
> > > > 2097152000 bytes (2.1 GB) copied, 3.75782 s, 558 MB/s
> > > >
> > > > dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=3k; rm -f /mnt/ext4/output;
> > > > 3072+0 records in
> > > > 3072+0 records out
> > > > 3145728000 bytes (3.1 GB) copied, 10.0054 s, 314 MB/s
> > > >
> > > > dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=5k; rm -f /mnt/ext4/output;
> > > > 5120+0 records in
> > > > 5120+0 records out
> > > > 5242880000 bytes (5.2 GB) copied, 24.1971 s, 217 MB/s
> > > >
> > > > Any suggestions for improving the large write degradation?
> > >
> > >
> > > --
> > > Christian Balzer        Network/Systems Engineer
> > > ch...@gol.com           Global OnLine Japan/Rakuten Communications
> > > http://www.gol.com/
> > >
> >
> >
> >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>



-- 
-------------.
Ben Erridge
Center For Information Management, Inc.
(734) 930-0855
3550 West Liberty Road Ste 1
Ann Arbor, MI 48103
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
