Hi,

> date stamps on the notify-* scripts are all uniform (predating the system 
> build) & I don't recall modifying them at all.

good.

> From the logs, I'm curious about the lines...
> Jan 23 15:07:16 emlsurit-v4 kernel: [   15.044910] block drbd9: 0 KB (0 bits) 
> marked out-of-sync by on disk bit-map.
> Jan 23 15:07:16 emlsurit-v4 kernel: [   15.044929] block drbd9: Marked 
> additional 508 MB as out-of-sync based on AL.
> 
> ...then a little further down
> Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.121108] block drbd9: role( 
> Secondary -> Primary )

A little further down, as in "45 minutes later" ;-)


> On both nodes, I also noticed ... block drbd9: helper command: /sbin/drbdadm 
> split-brain minor-9 exit code 127 (0x7f00)

Not sure about this one, but if you *had* split brain, there would have
been no sync-back (or any standing DRBD connection).
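For what it's worth, the "0x7f00" in that line is a raw POSIX wait status, not a separate error code. The exit code lives in bits 8-15, so it decodes to 127. Below is a minimal sketch using Python's standard `os` wait-status helpers. The function name `decode_wait_status` is made up for illustration:

```python
import os

def decode_wait_status(status: int) -> str:
    """Describe a raw POSIX wait status such as the 0x7f00 in the log."""
    if os.WIFEXITED(status):
        # Normal exit: the exit code is stored in bits 8-15.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # Terminated by a signal: the signal number is in the low 7 bits.
        return f"killed by signal {os.WTERMSIG(status)}"
    return "other status"

print(decode_wait_status(0x7F00))  # exited with code 127
```

Exit code 127 is what a POSIX shell returns when it cannot find the command it was asked to run. It may therefore be worth checking that whatever split-brain handler is configured actually exists on both nodes. That is a guess, not something these logs prove.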

> Somehow the (508 Mb?) data has rolled back, & while I'm sad I've likely lost 
> the data, I can't afford to release this system to production until I'm 
> content it won't happen again.

The thing about the AL (activity log) is that it is only active while your
nodes are in sync. Your primary will only sync back hot extents from its
peer if the peer's data is known to be up to date! It should not be
possible to lose any data through a sync-back of hot AL extents.
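As a rough illustration, the rule can be written as a toy model. The names here are hypothetical and this is not DRBD's code. The real handshake also compares data generation identifiers (UUIDs) before picking a sync direction. The disk-state names match DRBD's, but the logic below covers only the one rule described above:

```python
from enum import Enum

class DiskState(Enum):
    UP_TO_DATE = "UpToDate"
    OUTDATED = "Outdated"
    INCONSISTENT = "Inconsistent"

def sync_back_plan(hot_extents: int, peer_disk: DiskState) -> str:
    """Toy model: hot AL extents are only pulled from an UpToDate peer."""
    if hot_extents == 0:
        return "nothing to resync"
    if peer_disk is DiskState.UP_TO_DATE:
        return f"resync {hot_extents} extents from peer"
    # A peer that is not known to be current never becomes SyncSource.
    return "no sync-back: peer is not UpToDate"

print(sync_back_plan(127, DiskState.UP_TO_DATE))
print(sync_back_plan(127, DiskState.OUTDATED))
```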

To make this more clear: Whenever your nodes are in sync, DRBD keeps
track of the last ~500 MB that were written (in your case; the exact
amount depends on the al_extents setting). This information is stored
permanently in the metadata of the Primary. When the Primary goes down
and comes up again, it marks those ~500 MB as out-of-sync. This is
helpful: When coming up after a hard crash with possible data loss, the
Primary can restore any lost writes from the Secondary. Note that this
will not destroy data: The Secondary will become SyncSource only if
it's UpToDate.
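Here is a quick back-of-the-envelope check on the 508 MB in your log. It assumes that each AL extent covers 4 MiB, as in DRBD 8.x. It also assumes you left the setting (spelled `al-extents` in drbd.conf) at what I believe is the 8.3 default of 127. The helper names are just for illustration:

```python
AL_EXTENT_MIB = 4  # assumed size of one activity-log extent (DRBD 8.x)

def hot_area_mib(al_extents: int) -> int:
    """Amount of data marked out-of-sync from the AL after a restart."""
    return al_extents * AL_EXTENT_MIB

def extents_for(mib: int) -> int:
    """Inverse: how many AL extents a given hot area corresponds to."""
    return mib // AL_EXTENT_MIB

print(hot_area_mib(127))  # 508
print(extents_for(508))   # 127
```

If those assumptions hold, the 508 MB is simply the full activity log being re-marked at startup. It is not a measure of how much data actually differed.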

> The userland tools are ...
> 
> drbdadm --version
> DRBDADM_BUILDTAG=GIT-hash:\ ea9e28dbff98e331a62bcbcc63a6135808fe2917\ build\ 
> by\ buildd@yellow\,\ 2010-06-01\ 11:06:12
> DRBDADM_API_VERSION=88
> DRBD_KERNEL_VERSION_CODE=0x080307
> DRBDADM_VERSION_CODE=0x080307
> DRBDADM_VERSION=8.3.7
> 
> Any assistance to help me dig a little deeper here, will be greatly 
> appreciated.

Are you absolutely certain that you lost data? Because from here, it
sure doesn't look like it. (As a matter of fact, it doesn't even look
like split brain - have any of your notify scripts fired?)

Regards,
Felix
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
