Hello,

In this message I would like to summarize some methods that can significantly 
improve the speed of Bareos backups to a Ceph object store.  I also attach a 
patch for the developers, which I hope will be included in Bareos.

For the impatient:
 - Maximum Block Size = 4194304  (add to the Pool definition)
 - Compile libradosstriper into bareos-sd
 - Apply the patch to get objects > 4M

One of the exceptional achievements of Bareos is its speed.  Ceph also has 
very good speed, data redundancy, on-the-fly storage expansion and other 
goodies.  But Bareos and Ceph do not perform well together.  See for example 
https://groups.google.com/d/msg/bareos-users/dz_Cb-DxQ0w/WB6v0KR1GQAJ
In that particular case the speed was only 3MB/s, when Rados benchmarking 
would lead one to expect 70MB/s.  I have seen similar behaviour on two 
different installations (Ceph jewel and Ceph luminous).

1. Fundamental problem: the elementary piece of data in Ceph is called an 
object, and the low-level library which works with objects is called Rados.  
For performance reasons a Rados object should be 4M-32M in size (the bigger 
the object, the longer the delay for data recovery in case of disk/network 
failure).  The elementary piece of data in Bareos is the Volume, and its size 
for a Full backup is usually 25G-50G.  The latest version of Ceph made this 
situation even worse: earlier versions of Ceph allowed objects of up to 100G, 
but now there is a limit of 132M.

In Bareos terms, Volume = Rados Object, and with such a drastic difference in 
size there seems to be no solution at all.  With a small object size the 
Bareos database will choke on too many volumes, and with a large object size 
Ceph will perform poorly.  
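
To put rough numbers on this mismatch, here is a small sketch in C.  The 
function name is made up for illustration; it only counts how many fixed-size 
pieces are needed to hold a given amount of data, taking G = 2^30 and 
M = 2^20 bytes.

```c
#include <stdint.h>

/* Number of fixed-size pieces needed to hold `total` bytes (rounds up).
 * Hypothetical helper, not part of Bareos or Ceph. */
static uint64_t pieces_needed(uint64_t total, uint64_t piece_size)
{
    return (total + piece_size - 1) / piece_size;
}
```

For a 50G Full backup this gives 388 pieces at the 132M limit (each of which 
would be a separate Bareos volume without striping) and 12800 pieces at 4M.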

2. But there actually is a solution, and it is called striper mode.  This mode 
was introduced by CERN, who had gazillions of bytes of physics data and wanted 
to be as close to the Object Store as possible (they did not want the Block 
Device abstraction, HTTP, or Ceph FS in their way, perhaps like the Bareos 
developers).  So they built a mode which stripes big objects into little 
objects.  This way they store small objects but refer to them as very large 
chunks of data of almost unlimited size:

root@backup4:~# rados -p backup ls
Full-5409.0000000000000000
Full-5409.0000000000000001
Full-5409.0000000000000002
Full-5409.0000000000000003
.....
hundreds of objects

root@backup4:~# rados -p backup ls --striper
Full-5409
Incr-5822
Just 2 "big striped objects"

And thanks to the Bareos developers this mode can be used.  I just introduced 
a small modification and some bug fixes to an overall excellent piece of code.

3. I would argue that:
 - The Bareos distribution should have libradosstriper compiled in by default. 
(Currently libradosstriper is NOT compiled into the binaries distributed on 
the Bareos web site.)
 - In the configuration files, Rados striped mode should be the default Bareos 
mode, so that you have to make an effort to turn it off.

4. The underlying low-level write routine is called rados_striper_write().  It 
is a blocking call and it performs well when the block size is several 
megabytes.  The default block size in Bareos is 64K, so you can significantly 
boost performance by setting in the pool definition: 
  Maximum Block Size = 4194304

Throughput      Maximum Block Size
 3.39 MB/s      64K
20.44 MB/s      1M
32.90 MB/s      2M
39.95 MB/s      4M  (Block Size cannot be increased beyond this)
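
For reference, a minimal sketch of where the directive goes in the Director 
configuration.  The pool name and the Pool Type line are placeholders for 
illustration; only the Maximum Block Size line comes from the text above.

```
Pool {
  Name = Full                      # placeholder pool name
  Pool Type = Backup
  Maximum Block Size = 4194304     # 4M, the largest value measured above
}
```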

5. If you do not apply the patch, the Rados Device definition can only be:
Device Options = 
"conffile=/etc/ceph/ceph.conf,poolname=backup,striped,stripe_unit=4194304,stripe_count=1"
(Attention! No spaces are allowed between the double quotes.)

Any other combination will kill the storage daemon.  If you are OK with this 
striping setting, then you can use the official code.  

6. The logic of striper mode is very much the same as in RAID-0.  There are 3 
parameters that drive it (there is almost no documentation on this in Ceph):

stripe_unit  - the stripe size  (default=4M)
stripe_count - how many objects to write in parallel (default=1)
object_size  - when to stop increasing object size and create new objects  
(default=4M)

For example, if you write 128M of data (128 1M pieces of data) in striped mode 
with the following parameters (object_size should be a multiple of stripe_unit 
and larger than it):
stripe_unit = 8M
stripe_count = 4
object_size = 24M
then 8 objects will be created (4 of 24M size and 4 of 8M size):

Obj1=24M    Obj2=24M    Obj3=24M    Obj4=24M
00 .. 07    08 .. 0f    10 .. 17    18 .. 1f  <-- consecutive 1M pieces of data
20 .. 27    28 .. 2f    30 .. 37    38 .. 3f
40 .. 47    48 .. 4f    50 .. 57    58 .. 5f

Obj5= 8M    Obj6= 8M    Obj7= 8M    Obj8= 8M
60 .. 67    68 .. 6f    70 .. 77    78 .. 7f  
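
The layout in the table can be reproduced with a short calculation.  The 
following C sketch is an illustration of RAID-0 style striping as described 
above, not code taken from libradosstriper; the struct and function names are 
made up, and it assumes that object_size is a non-zero multiple of 
stripe_unit.

```c
#include <stdint.h>

/* Location of one byte of the striped volume inside the backing objects. */
struct stripe_loc {
    uint64_t object_no;   /* index of the backing object, counted from 0 */
    uint64_t object_off;  /* byte offset inside that object */
};

/* Map a volume offset to (object, offset within object).
 * Assumes object_size is a non-zero multiple of stripe_unit. */
static struct stripe_loc stripe_locate(uint64_t off, uint64_t stripe_unit,
                                       uint64_t stripe_count,
                                       uint64_t object_size)
{
    uint64_t units_per_object = object_size / stripe_unit;
    uint64_t unit_no = off / stripe_unit;       /* stripe unit index overall */
    uint64_t row     = unit_no / stripe_count;  /* which stripe (table row) */
    uint64_t column  = unit_no % stripe_count;  /* which object in the group */
    uint64_t set_no  = row / units_per_object;  /* which group of objects */
    struct stripe_loc loc;

    loc.object_no  = set_no * stripe_count + column;
    loc.object_off = (row % units_per_object) * stripe_unit + off % stripe_unit;
    return loc;
}

/* Fill sizes[] with the size of every object after `total` bytes have been
 * written sequentially.  Returns the number of objects used, or 0 if more
 * than max_objects would be needed. */
static uint64_t stripe_object_sizes(uint64_t total, uint64_t stripe_unit,
                                    uint64_t stripe_count, uint64_t object_size,
                                    uint64_t *sizes, uint64_t max_objects)
{
    uint64_t used = 0;

    for (uint64_t i = 0; i < max_objects; i++)
        sizes[i] = 0;

    for (uint64_t off = 0; off < total; off += stripe_unit) {
        struct stripe_loc loc =
            stripe_locate(off, stripe_unit, stripe_count, object_size);
        /* The last stripe unit may be only partially filled. */
        uint64_t chunk = total - off < stripe_unit ? total - off : stripe_unit;

        if (loc.object_no >= max_objects)
            return 0;
        sizes[loc.object_no] += chunk;
        if (loc.object_no + 1 > used)
            used = loc.object_no + 1;
    }
    return used;
}
```

With stripe_unit=8M, stripe_count=4, object_size=24M and the 128 pieces 
00 .. 7f from the table, stripe_object_sizes() reports 8 objects, four of 24M 
and four of 8M, and stripe_locate() places piece 20 at offset 8M of the first 
object and piece 60 at the start of the fifth.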

So perhaps if you have 4 or 16 OSDs and would like to get better performance, 
you may want to play with these parameters.  In my case I have a very modest 
Ceph installation and did not see any significant improvement.

If you apply the patch you can try different striping settings, for example:
Device Options = 
"conffile=/etc/ceph/ceph.conf,poolname=backup,striped,stripe_unit=4194304,stripe_count=2,object_size=33554432"

7. Is there room for further improvement?  I believe there is.  It is called 
asynchronous IO mode, which is also supported by libradosstriper but is not 
used in Bareos.  I actually saw speeds of about 100MB/s with it (I have only a 
1Gbps network core, so it could be more).  Reports by the CERN IT group also 
suggest that AsyncIO is the way to increase performance.   

https://indico.cern.ch/event/524549/contributions/2185945/attachments/1289528/1919824/CephForHighThroughput.pdf

I hope to try writing this piece myself, unless someone who is good at asyncIO 
programming is willing to step up.

8. I hope this will attract more people who are interested in using Bareos 
together with Ceph Object Storage.

Alexander.




diff -C3 -r -w ./bareos-Release-16.2.4-orig/src/stored/backends/rados_device.c ./bareos-Release-16.2.4-new/src/stored/backends/rados_device.c
*** ./bareos-Release-16.2.4-orig/src/stored/backends/rados_device.c	2016-10-16 18:14:36.000000000 +0300
--- ./bareos-Release-16.2.4-new/src/stored/backends/rados_device.c	2017-10-16 01:33:28.032995404 +0300
***************
*** 43,48 ****
--- 43,49 ----
     argument_username,
     argument_striped,
     argument_stripe_unit,
+    argument_object_size,
     argument_stripe_count
  };
  
***************
*** 64,69 ****
--- 65,71 ----
  #ifdef HAVE_RADOS_STRIPER
     { "striped", argument_striped, 7 },
     { "stripe_unit=", argument_stripe_unit, 12 },
+    { "object_size=", argument_object_size, 12 },   
     { "stripe_count=", argument_stripe_count, 13 },
  #endif
     { NULL, argument_none }
***************
*** 137,142 ****
--- 139,148 ----
                    size_to_uint64(bp + device_options[i].compare_size, &m_stripe_unit);
                    done = true;
                    break;
+                case argument_object_size:
+                   size_to_uint64(bp + device_options[i].compare_size, &m_object_size);
+                   done = true;
+                   break;
                 case argument_stripe_count:
                    m_stripe_count = str_to_int64(bp + device_options[i].compare_size);
                    done = true;
***************
*** 254,259 ****
--- 260,272 ----
              Emsg0(M_FATAL, 0, errmsg);
              goto bail_out;
           } 
+          
+          status = rados_striper_set_object_layout_object_size(m_striper, m_object_size);
+          if (status < 0) {
+             Mmsg3(errmsg, _("Unable to set RADOS striper object size to %llu for pool %s: ERR=%s\n"), (unsigned long long)m_object_size, m_rados_poolname, be.bstrerror(-status));
+             Emsg0(M_FATAL, 0, errmsg);
+             goto bail_out;
+          }
        }
  #endif
     }
***************
*** 314,320 ****
  
  bail_out:
     if (m_cluster_initialized) {
!       rados_shutdown(&m_cluster);
        m_cluster_initialized = false;
     }
  
--- 327,333 ----
  
  bail_out:
     if (m_cluster_initialized) {
!       rados_shutdown(m_cluster);
        m_cluster_initialized = false;
     }
  
***************
*** 594,600 ****
     }
  
     if (m_cluster_initialized) {
!       rados_shutdown(&m_cluster);
        m_cluster_initialized = false;
     }
  
--- 607,613 ----
     }
  
     if (m_cluster_initialized) {
!       rados_shutdown(m_cluster);
        m_cluster_initialized = false;
     }
  
***************
*** 633,640 ****
     m_ctx = NULL;
  #ifdef HAVE_RADOS_STRIPER
     m_stripe_volume = false;
!    m_stripe_unit = 0;
!    m_stripe_count = 0;
     m_striper = NULL;
  #endif
     m_virtual_filename = get_pool_memory(PM_FNAME);
--- 646,654 ----
     m_ctx = NULL;
  #ifdef HAVE_RADOS_STRIPER
     m_stripe_volume = false;
!    m_stripe_unit = 4194304;
!    m_stripe_count = 1;
!    m_object_size = 4194304;
     m_striper = NULL;
  #endif
     m_virtual_filename = get_pool_memory(PM_FNAME);
diff -C3 -r -w ./bareos-Release-16.2.4-orig/src/stored/backends/rados_device.h ./bareos-Release-16.2.4-new/src/stored/backends/rados_device.h
*** ./bareos-Release-16.2.4-orig/src/stored/backends/rados_device.h	2016-10-16 18:14:36.000000000 +0300
--- ./bareos-Release-16.2.4-new/src/stored/backends/rados_device.h	2017-10-15 01:00:00.738631428 +0300
***************
*** 63,68 ****
--- 63,69 ----
     bool m_stripe_volume;
     uint64_t m_stripe_unit;
     uint32_t m_stripe_count;
+    uint64_t m_object_size;
  #endif
     rados_t m_cluster;
     rados_ioctx_t m_ctx;
