Hello everyone.

I have a fairly simple 2-node CentOS 7 setup running KVM virtual
machines, with DRBD 8.4.9 between them.

There is one DRBD resource per VM, with at least 1 volume each,
totalling 47 volumes.

There's no clustering or heartbeat or other complexity. DRBD has its
own Gig-E interface to sync over.

I recently migrated a host between nodes and it crashed. During
diagnostics I ran a verification on the DRBD volume for the host and
found that it had _a lot_ of out-of-sync blocks.

This led me to run a verification on all volumes, and while I didn't
find any other volumes with large numbers of out-of-sync blocks, there
were several with a few. I have disconnected and reconnected all of
these volumes to force them to resync.

I have now set up a nightly cron job which verifies as many volumes as
it can in a 2-hour window, which means I get through the whole lot in
about a week.
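
For reference, the rotation works roughly like the sketch below. This is
a minimal illustration and not the actual cron script: the function
names, the 30-second poll interval and the idea of remembering the last
verified resource are all assumptions made for the example. It relies on
"drbdadm verify <resource>" starting an online verify, and on /proc/drbd
showing cs:VerifyS or cs:VerifyT while one is running.

```python
import subprocess
import time


def rotation(resources, last_done):
    """Return the resources reordered to start just after last_done."""
    if last_done in resources:
        i = resources.index(last_done) + 1
        return resources[i:] + resources[:i]
    return list(resources)


def verify_resource(name):
    """Start an online verify and wait for it to finish.

    Assumption: only one verify runs at a time, so any Verify state
    seen in /proc/drbd belongs to the resource started here.
    """
    subprocess.run(["drbdadm", "verify", name], check=True)
    while True:
        with open("/proc/drbd") as f:
            text = f.read()
        if "cs:VerifyS" not in text and "cs:VerifyT" not in text:
            return
        time.sleep(30)  # hypothetical poll interval


def run_window(resources, last_done, budget_seconds,
               verify=verify_resource, clock=time.monotonic):
    """Verify resources in rotation until the time budget is used up.

    A verify that has already started is allowed to finish, so the
    window can overrun by the length of one verify.
    """
    deadline = clock() + budget_seconds
    done = []
    for name in rotation(resources, last_done):
        if clock() >= deadline:
            break
        verify(name)
        done.append(name)
    return done
```

The last entry of the returned list would be saved somewhere (a small
state file, say) and passed in as last_done the following night, so each
run picks up where the previous one stopped.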

Almost every night, it reports at least 1 volume which is out-of-sync,
and I'm trying to understand why that would be.
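
For anyone wanting to reproduce the check, the out-of-sync count can be
read from the oos: field in /proc/drbd once a verify has finished. The
sketch below is only an illustration of that: the function names are
made up for the example, and it assumes the DRBD 8.4 /proc/drbd layout,
where each device starts with a "<minor>: cs:..." line and oos is
reported in KiB.

```python
import re


def out_of_sync(proc_drbd_text):
    """Map DRBD minor number -> oos value (KiB) from /proc/drbd text."""
    result = {}
    minor = None
    for line in proc_drbd_text.splitlines():
        # Device header line, e.g. " 3: cs:Connected ro:Primary/Secondary ..."
        m = re.match(r"\s*(\d+): cs:", line)
        if m:
            minor = int(m.group(1))
        # Counter line for the current device ends with "oos:<n>"
        m = re.search(r"\boos:(\d+)", line)
        if m and minor is not None:
            result[minor] = int(m.group(1))
    return result


def dirty_minors(proc_drbd_text):
    """Return the sorted minors whose oos counter is non-zero."""
    oos = out_of_sync(proc_drbd_text)
    return sorted(minor for minor, kib in oos.items() if kib > 0)
```

Calling dirty_minors(open("/proc/drbd").read()) after the nightly run
gives the list of devices that then need a disconnect and reconnect to
resync.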

I did some research and the only likely candidate I could find was
related to TCP checksum offloading on the NICs, which I have now
disabled, but it has made no difference.

Any suggestions as to what might be going on here?

Thanks.

Luke Pascoe
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
