Hi Ian,
When creating a new resource that doesn't have any data you want
to keep on either node, you can use the following:
# drbdadm -- --clear-bitmap new-current-uuid drbd0
You can see the documentation here and make sure this applies to your
situation:
http://www.drbd.org/users-guide/re-drbdadm.html
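In case it helps, the full sequence I have in mind is something like the
following (just a sketch — I'm assuming a fresh, empty resource named
drbd0 as in your logs, and a DRBD 8.2+ userland; adjust to your config):

```shell
# Skipping the initial sync on a brand-new, empty resource.
# Run on BOTH nodes:
drbdadm create-md drbd0   # initialize fresh metadata
drbdadm up drbd0          # attach and connect; both sides start Inconsistent

# On ONE node only, once both sides show Connected:
drbdadm -- --clear-bitmap new-current-uuid drbd0
# -> both sides should now report UpToDate without any data transfer

# Then promote whichever node should carry the filesystem:
drbdadm primary drbd0
```

Since this tells DRBD to consider the (empty) disks identical, it's only
safe on devices that really hold no data you care about.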
Then your second issue should not occur, as both nodes will now have a
synchronized bitmap. At least that's how I understand it.
Regards,
--
Jean-François Chevrette [iWeb]
On 09-10-22 8:57 AM, Ian Marlier wrote:
Hi, All --
I'm working on getting DRBD up and running for a large storage array --
around 10TB. I'm having two issues that I suspect are related, and am
hoping that someone might be able to help me out with them.
Specifically, I'm wondering whether these issues share a cause, and if so,
whether there is a method that allows the behavior I'm after.
First of all, 10TB requires a fair amount of time to sync. Even if a
10Gbps network is used, drbd's limit of 650MB/s means that a full sync
would take between 4 and 5 hours. With a 1Gbps network, that time rises
to roughly 22 hours, since the effective speed is at most 125MB/s.
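Back-of-the-envelope, treating 10 TB as 10,000,000 MB (decimal units):

```shell
# Rough full-sync time estimates for a 10 TB device.
size_mb=$((10 * 1000 * 1000))                        # 10 TB in MB
echo "at 650 MB/s: $((size_mb / 650 / 3600)) hours"  # ~4 hours (10Gbps, drbd cap)
echo "at 125 MB/s: $((size_mb / 125 / 3600)) hours"  # ~22 hours (1Gbps wire speed)
```

In practice the syncer rate is usually capped below wire speed, so the
real numbers would only be worse.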
Because of this, I'm hoping to avoid the initial sync phase. I'm
starting with empty disks, and so I don't need the bitmap to be
synchronized for data preservation or anything like that. In googling
around, I found the following command, which does in fact have the
effect of causing both nodes to report that they are UpToDate:
drbdadm -- 6::::1 set-gi resource
So, the first question is whether this is, in fact, the appropriate
command to use if one wants to avoid the initial sync. Is there
another method that's preferred? Is it simply not possible to skip the
initial sync any longer? What I really want is a way to tell drbd to
sync the bitmap without actually syncing data, since there isn't data
that I care about.
The second issue that I'm having: after establishing both nodes as
UpToDate using the command above, and successfully swapping the Primary
role back and forth between the hosts, a node that fails (or is rebooted)
requires a full sync after coming back online. This happens even if the
other node was Primary at the time that the local machine went down, and
even if no changes have been made to the local node.
It appears that there is something going on with the size of the bitmap
changing on the rebooted host, based on the logs, though that doesn't
really make all that much sense to me:
Oct 21 17:16:11 scurry4 kernel: drbd0: No usable activity log found.
Oct 21 17:16:11 scurry4 kernel: drbd0: max_segment_size ( = BIO size ) = 32768
Oct 21 17:16:11 scurry4 kernel: drbd0: drbd_bm_resize called with capacity == 21462221048
Oct 21 17:16:11 scurry4 kernel: drbd0: resync bitmap: bits=2682777631 words=41918401
Oct 21 17:16:11 scurry4 kernel: drbd0: size = 10 TB (10731110524 KB)
Oct 21 17:16:11 scurry4 kernel: drbd0: Writing the whole bitmap, size changed
Oct 21 17:16:11 scurry4 kernel: drbd0: writing of bitmap took 468 jiffies
Oct 21 17:16:11 scurry4 kernel: drbd0: 10 TB (2682777631 bits) marked out-of-sync by on disk bit-map.
Oct 21 17:16:12 scurry4 kernel: drbd0: reading of bitmap took 289 jiffies
Oct 21 17:16:12 scurry4 kernel: drbd0: recounting of set bits took additional 271 jiffies
Oct 21 17:16:12 scurry4 kernel: drbd0: 10 TB (2682777631 bits) marked out-of-sync by on disk bit-map.
Oct 21 17:16:12 scurry4 kernel: drbd0: disk( Attaching -> Inconsistent )
Oct 21 17:16:12 scurry4 kernel: drbd0: Writing meta data super block now.
The remote host, which remained up, shows this in its logs:
Oct 21 17:16:48 scurry24 kernel: drbd0: Becoming sync source due to disk states.
Oct 21 17:16:48 scurry24 kernel: drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake.
Oct 21 17:16:48 scurry24 kernel: drbd0: Writing meta data super block now.
I'm wondering whether it's possible that this behavior is related to the
skipped sync described above, or whether it's related in some way to the
size of the device being synced. Has anyone seen this before, or can
anyone shed some light on it?
Basic info: OS is CentOS x86_64. Kernel version is
2.6.18-128.1.10.el5. DRBD is version 8.2.6-2.
Thanks for any help,
Ian
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user