On Fri, Apr 14, 2023 at 08:54:39AM -0700, Akemi Yagi wrote:
> Hi Nigel,
>
> kmod-drbd90-9.1.14-2.el8_7.x86_64.rpm will start syncing to our mirrors
> shortly.
Since we modularized the "transport",
DRBD consists of (at least) two modules,
"drbd" and "drbd_transport_tcp".
You ship
On Sat, Jan 08, 2022 at 04:24:54AM +, Eric Robinson wrote:
> According to the documentation, SSD TRIM/Discard support has been in DRBD
> since version 8. DRBD is supposed to detect if the underlying storage
> supports trim and, if so, automatically enable it. However, I am unable to
> TRIM my
On Mon, Aug 16, 2021 at 07:07:41AM +0100, TJ wrote:
> I've got a rather unique scenario and need to allow non-exclusive
> read-only opening of the drbd device on the Secondary.
> Then I thought to use a device-mapper COW snapshot - so the underlying
> drbd device is never changed but the snapshot
On Thu, Aug 05, 2021 at 11:53:44PM +0200, Janusz Jaskiewicz wrote:
> Hello.
>
> I'm experimenting a bit with DRBD in a cluster managed by Pacemaker.
> It's a two node, active-passive cluster and the service that I'm
> trying to put in the cluster writes to the file system.
> The service manages
On Tue, Aug 03, 2021 at 03:34:58PM -0700, Paul D. O'Rorke wrote:
> Hi all,
>
> I have been running a DRBD-8 simple 3-node disaster recovery setup of
> libvirt VMs for a number of years and have been very happy with it. Our
> needs are simple, 2 servers on Protocol C, each running a handful of
On Fri, Feb 26, 2021 at 07:09:29AM +0100, Fabio M. Di Nitto wrote:
> hey guys,
>
> similar to 9.0.27, log below.
>
> Any chance you can give me a quick and dirty fix?
> CC [M] /builddir/build/BUILD/drbd-9.0.28-1/drbd/drbd_main.o
> /builddir/build/BUILD/drbd-9.0.28-1/drbd/drbd_main.c: In
> * reliably detect split brain situation on both nodes
> * improve error reporting for failures during attach
> * implement 'blockdev --setro' in DRBD
> * following upstream changes to DRBD up to Linux 5.10 and ensuring
>   compatibility with Linux 5.8, 5.9, and 5.10
--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
system or other use on top of DRBD that can not
tolerate a change in logical block size from one "mount" to the next,
then make sure to use IO backends with identical (or similar enough)
characteristics.
If you have a file system that can tolerate such a change,
you may get away wi
Well, obviously DRBD *does* still work just fine.
Though not for those that only upgrade the kernel
without upgrading the module, or vice versa. See below.
As we get more and more reports of people having problems with their
DRBD after a RHEL upgrade, let me quickly state the facts:
RHEL
Changes wrt RC2 (for RC2 announcement see below):
9.0.20-0rc3 (api:genl2/proto:86-115/transport:14)
* fix regression related to the quorum feature,
introduced by code deduplication; regression never released,
happened during this .20 development/release cycle
* completing aspects
-peer", NOT after-resync-target.
That was from the times when there was no unfence-peer handler,
and we overloaded/abused the after-resync-target handler
for this purpose.
> fencing resource-only;
>
> after-sb-0pri discard-least-changes;
You are automating data loss.
That
24:15 el8-a01n01.digimer.ca kernel: drbd test_server: Preparing
> cluster-wide state change 3514756670 (0->2 499/146)
> Jul 27 18:24:15 el8-a01n01.digimer.ca kernel: drbd test_server: State change
> 3514756670: primary_nodes=0, weak_nodes=0
> Jul 27 18:24:15 el8-a01n01.digimer.ca kern
d -v -z -o $(( ${size_gb} * 2**30 - 2**20 )) -l 1M /dev/$VG/$LV
Make your config file refer to disk /dev/$VG/$LV
dmesg -c > /dev/null   # clear dmesg before the test
drbdadm -v create-md   # on all nodes
drbdadm -v up all      # on all nodes
dmesg                  # if it failed
drbdsetu
te-same".
> Maybe this one can also be used :
> https://chris.hofstaedtler.name/blog/2016/10/kernel319plus-3par-incompat.html
> finding before ATTRS{rev} property of disks.
For your specific hardware, probably yes.
--
m, performance considerations,
using a single DRBD volume of that size is most likely not what you want.
If you really mean it, it will likely require a number of deviations
from "default" settings to work reasonably well.
Do you mind sharing with us what you actually want?
What d
o allow "local replication",
but to be used for certain "ssl socket forwarding solutions",
or for use with the DRBD proxy.
--
bd louie Centos63: Connection closed
>
>
> Is there any way of working around this ?
Well, now, did you even try what this message suggests?
Otherwise: maybe first 8.3 -> 8.4, then 8.4 -> 9
Or forget about the "rolling" upgrade,
just re-create the meta data as 9,
and
low-remote-read (disallow read from DR connections)
* some build changes
http://www.linbit.com/downloads/drbd/9.0/drbd-9.0.19-0rc1.tar.gz
https://github.com/LINBIT/drbd-9.0/tree/d1e16bdf2b71
--
once you actually try to use it, is ... not very friendly either.
> > [525370.955135] dm-16: error: dax access failed (-95)
--
ple
]
> [ 2643.473042] [] ? receive_Data+0x77e/0x18f0 [drbd]
Supposedly fixed with 9.0.18, more specifically with
7ce7cac6 drbd: fix potential spinlock deadlock on device->al_lock
--
On Fri, Mar 22, 2019 at 10:45:52AM +, Holger Kiehl wrote:
> Hello,
>
> I have megaraid controller with only SAS SSD's attached which always
> sets /sys/block/sdb/queue/rotational to 1. So, in /etc/rc.d/rc.local
> I just did a 'echo -n 0 > /sys/block/sdb/queue/rotational', that fixed
> it. But
s DRBD in normal operation).
You may need to adapt that filter
to allow lvm to see the backend device directly,
if you *mean* to bypass drbd in a recovery scenario.
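As an illustration, a hedged lvm.conf fragment of the kind meant here (device names are assumptions; filter entries are evaluated in order, first match wins):

```
# /etc/lvm/lvm.conf sketch: let LVM scan the DRBD device, not its backing disk.
# Drop or relax the reject rule if you *mean* to bypass DRBD for recovery.
devices {
    filter = [ "a|^/dev/drbd.*|", "r|^/dev/sdb1$|", "a|.*|" ]
}
```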
--
On Fri, Dec 14, 2018 at 02:13:50PM +0100, Harald Dunkel wrote:
> Hi Lars,
>
> On 12/14/18 1:27 PM, Lars Ellenberg wrote:
> >
> > There was nothing dirty (~ 7 MB; nothing worth to mention).
> > So nothing to sync.
> >
> > But it takes some time to in
On Fri, Dec 14, 2018 at 09:32:14AM +0100, Harald Dunkel wrote:
> Hi folks,
>
> On 12/13/18 11:49 PM, Igor Cicimov wrote:
> > On Fri, Dec 14, 2018 at 2:57 AM Lars Ellenberg wrote:
> >
> >
> > Unlikely to have anything to do with DRBD.
> >
> >
uce, monitor
grep -e Dirty -e Writeback /proc/meminfo
and slabtop before/during/after umount.
Also check sysctl settings
sysctl vm | grep dirty
--
lusterdb > /tmp/metadata
> >
> > Found meta data is "unclean", please apply-al first
Well, there it tells you what is wrong
(meta data is "unclean"),
and what you should do about it:
("apply-al").
So how about just doing what it tells you to do
On Fri, Nov 23, 2018 at 11:55:32AM +0100, Robert Altnoeder wrote:
> On 11/23/18 11:00 AM, Michael Hierweck wrote:
> > Linbit announces the complete decoupling of LINSTOR from DRBD (Q1 2019).
> > [...]
> > Does this mean Linbit will abandon DRBD?
>
> Not at all,
TL;DR:
Marketing:
I fix it?
Probably by upgrading your drbd-utils.
If that's not sufficient,
we'll have to look into it.
We could add a single line workaround into the plugin,
but that would likely just mask a bug elsewhere.
--
Daniel Hertanu wrote:
> Hello Yannis,
>
> I tried that, same result, won't switch to primary.
Well, it says:
> >> [root@server2-drbd ~]# drbdadm primary resource01
> >> resource01: State change failed: (-2) Need access to UpToDate data
Does it have "access to UpTo
ut for the record,
if this was not only about redundancy, but also hopes
to increase bandwidth while all links are operational,
LACP does not increase bandwidth for a single TCP flow.
"bonding round robin" is the only mode that does.
Just saying.
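As an illustration of the one mode that does stripe a single TCP flow, a Debian-style ifupdown sketch (interface names and addresses are assumptions):

```
# /etc/network/interfaces sketch: balance-rr can spread one TCP flow
# across both links; LACP (802.3ad) hashes each flow onto a single link.
auto bond0
iface bond0 inet static
    address 10.0.0.1/24
    bond-slaves eth0 eth1
    bond-mode balance-rr
    bond-miimon 100
```

Note that balance-rr can reorder packets within the flow, which is why it is rarely the default choice.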
--
On Fri, Oct 19, 2018 at 10:18:12AM +0200, VictorSanchez2 wrote:
> On 10/18/2018 09:51 PM, Lars Ellenberg wrote:
> > On Thu, Oct 11, 2018 at 02:06:11PM +0200, VictorSanchez2 wrote:
> > > On 10/11/2018 10:59 AM, Lars Ellenberg wrote:
> > > > On Wed, Oct 10, 201
e resync rate to minimize impact on
> applications using the storage. As it slows itself down to "stay out of
> the way", the resync time increases of course. You won't have redundancy
> until the resync completes.
>
> --
> Digimer
> Papers and Projects: https://a
On Thu, Oct 11, 2018 at 02:06:11PM +0200, VictorSanchez2 wrote:
> On 10/11/2018 10:59 AM, Lars Ellenberg wrote:
> > On Wed, Oct 10, 2018 at 11:52:34AM +, Garrido, Cristina wrote:
> > > Hello,
> > >
> > > I have two drbd devices configured on my cluster. O
"congestion" for the backing device.
Why it did that, and whether that was actually the case, and what
that actually means is very much dependent on that backing device,
and how it "felt" at the time of that status output.
--
good way to deal with this case, as whether some DRBD step is
> missing, which leaves the process or killing the process is the right way?
Again, that "process" has nothing to do with drbd being "held open",
but is a kernel thread that is part of the existence of that DRBD volum
of congestion.
But read about "timeout" and "ko-count" in the users guide.
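For reference, a hedged sketch of those two net options with their documented defaults (check drbd.conf(5) before copying; timeout is in units of 0.1 seconds):

```
net {
    timeout   60;   # 6.0 s: how long to wait for a reply from the peer
    ko-count   7;   # kick out the peer after 7 consecutive timeouts
}
```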
--
l that whatever is
> being used can handle having the storage ripped out from under it.
Yes.
Also, when using a SyncTarget, many reads are no longer local,
because there is no good local data to read,
which may or may not be a serious performance hit,
depending on your workload.
--
On Wed, Sep 19, 2018 at 04:57:08PM -0400, Daniel Ragle wrote:
> On 9/18/2018 10:51 AM, Lars Ellenberg wrote:
> > On Thu, Sep 13, 2018 at 04:36:54PM -0400, Daniel Ragle wrote:
> > > Anybody know where I need to start looking to figure this one out:
> > >
k=DRBD_NODE_ID_${DRBD_PEER_NODE_ID};
v=${!k};
[[ $v ]] && DRBD_PEER=$v;
fi
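Taken out of context, the fragment above relies on bash indirect expansion to look up the peer's name from the handler environment. A self-contained sketch with made-up stand-in values (DRBD exports DRBD_PEER_NODE_ID plus one DRBD_NODE_ID_<n> variable per node):

```shell
# Hypothetical handler environment (values are assumptions):
DRBD_PEER_NODE_ID=1
DRBD_NODE_ID_1=node-b

k=DRBD_NODE_ID_${DRBD_PEER_NODE_ID}   # name of the variable to read
v=${!k}                               # bash indirect expansion
[[ $v ]] && DRBD_PEER=$v
echo "$DRBD_PEER"                     # prints "node-b"
```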
--
vailable.
I'm not exactly sure,
but I sure hope we have dropped the "indexed" flavor in DRBD 9.
Depending on the number of (max-) peers,
DRBD 9 needs more room for metadata than a "two-node only" DRBD.
--
On Sat, Sep 08, 2018 at 09:34:32AM +0200, Valentin Vidic wrote:
> On Fri, Sep 07, 2018 at 07:14:59PM +0200, Valentin Vidic wrote:
> > In fact the first one is the original code path before I modified
> > blkback. The problem is it gets executed async from workqueue so
> > it might not always run
On Fri, Sep 07, 2018 at 02:13:48PM +0200, Valentin Vidic wrote:
> On Fri, Sep 07, 2018 at 02:03:37PM +0200, Lars Ellenberg wrote:
> > Very frequently it is *NOT* the "original user", that "still" holds it
> > open, but udev, or something triggered-by-udev.
On Wed, Sep 05, 2018 at 06:27:56PM +0200, Valentin Vidic wrote:
> On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monné wrote:
> > On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > > Switching to closed state earlier can cause the block-drbd
> > > script to fail with 'Device
On Wed, Aug 29, 2018 at 12:39:07PM -0400, David Bruzos wrote:
> Hi Lars,
> Thank you and the others for such a wonderful and useful system! Now, to
> your comment:
>
> >Um, well, while it may be "your proven method" as well, it actually
> >is the method documented in the drbdsetup man page and
ons,
then reconnects,
and syncs up.
> Second node:
>
> [Wed Aug 29 01:42:48 2018] drbd resource0: PingAck did not arrive in time.
Again, time stamps do not match up.
But there is your reason for this incident: "PingAck did not arrive in time".
Find out why, or simply increase t
ta
and an "unsatisfactory" replication link bandwidth,
you may want to look into the second typical use case of
"new-current-uuid", which we coined "truck based replication",
which is also documented in the drbdsetup man page.
(Or, do the initial sync with an &
gration with various virtualization solutions much easier.
Still, also in that case,
prepare to regularly upgrade both DRBD 9 and LINSTOR components.
There will be bugs, and bug fixes, and they will be relevant for your
environment.
--
?
Some strangeness with the new NIC drivers?
A bug in the "shipped with the debian kernel" DRBD version?
--
On Mon, Aug 27, 2018 at 06:15:09PM +0200, Julien Escario wrote:
> Le 27/08/2018 à 17:44, Lars Ellenberg a écrit :
> > On Mon, Aug 27, 2018 at 05:01:52PM +0200, Julien Escario wrote:
> >> Hello,
> >> We're stuck in a strange situation. One of our resources is mark
On Mon, Aug 27, 2018 at 05:01:52PM +0200, Julien Escario wrote:
> Hello,
> We're stuck in a strange situation. One of our resources is marked as:
> volume 0 (/dev/drbd155): UpToDate(normal disk state) Blocked: upper
>
> I used drbdtop to get this info because drbdadm hangs.
>
> I can also see
On Mon, Aug 27, 2018 at 08:21:35AM +, Jaco van Niekerk wrote:
> Hi
>
> cat /proc/drbd
> version: 8.4.11-1 (api:1/proto:86-101)
> GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@,
> 2018-04-26 12:10:42
> 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C
clone-max=2 clone-node-max=1 notify=true
>
> I receive the following on pcs status:
> * my_iscsidata_monitor_0 on node2.san.localhost 'not configured' (6):
> call=9, status=complete, exitreason='meta parameter misconfigured,
> expected clone-max -le 2, but found unset.',
Well,
log
> bottleneck problem.
One LV -> DRBD Volume -> Filesystem per DB instance.
If the DBs are "logically related", have all volumes in one DRBD
resource. If not, separate DRBD resources, one volume each.
But whether or not that would help in your setup depends very much on
the typic
part.
it's what most people think when doing that:
use a huge single DRBD as PV, and put loads of unrelated LVS
inside of that.
Which then all share the single DRBD "activity log" of the single DRBD
volume, which then becomes a bottleneck for IOPS.
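A minimal sketch of the alternative layout: one DRBD resource per unrelated consumer, so each gets its own activity log (names, minor numbers, and addresses are assumptions; DRBD 8.4-style syntax):

```
# drbd.conf sketch: unrelated LVs get separate resources (separate ALs);
# only logically related volumes should share one resource.
resource db1 {
    device    /dev/drbd10;
    disk      /dev/vg0/db1;
    meta-disk internal;
    on alpha { address 10.0.0.1:7789; }
    on beta  { address 10.0.0.2:7789; }
}
```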
--
resource create ...
pcs -f tmp_cfg resource master ...
pcs cluster push cib tmp_cfg
if you need to get things done,
don't take unknown short cuts, because, as they say,
the unknown short cut is the longest route to the destination.
though you may learn a lot along the way,
so if you are in the position wh
30)
> Library version: 1.02.137 (2016-11-30)
> Driver version: 4.37.0
> Is it bug or am I doing something wrong?
Thanks for the detailed and useful report,
definitely a serious and embarrassing bug,
now already fixed internally.
Fix will go into 9.0.15 final.
We are in the progress
nerate your
> distro's initrd/initramfs to reflect the changes directly at startup.
Yes, don't forget that step ^^^ that one is important as well.
But really, most of the time, you really want LVM *below* DRBD,
and NOT above it. Even though it may "appear" to be convenient,
it is usually not w
On Tue, Jun 19, 2018 at 09:19:04AM +0200, Artur Kaszuba wrote:
> Hi Lars, thx for answer
>
> W dniu 18.06.2018 o 17:10, Lars Ellenberg pisze:
> > On Wed, Jun 13, 2018 at 01:03:53PM +0200, Artur Kaszuba wrote:
> > > I know about 3 node solution and i have used it for some
f you try again, do those numbers change?
If they change, do they still show such a pattern in hex digits?
> [ma. juni 18 14:44:43 2018] drbd drbd1/0 drbd1: we had at least one MD IO
> ERROR during bitmap IO
> [ma. juni 18 14:44:47 2018] drbd drbd1/0 drbd1: recounting of set bits took
>
write this post because stacked configuration is
> still described in documentation and should work? Unfortunately for
> now it is not possible to create such configuration or i missed
> something :/
I know there are DRBD 9 users using "stacked" configurations out there.
Maybe yo
should *upgrade*.
> or upgrade the drbd version ?
Yes, that as well.
> Thanks in advance for your help.
Cheers,
:-)
--
on that can take a long time,
it is also (and especially) the wait_for_completion_io().
We could "make the warnings" go away by accepting only an (arbitrarily
small) number of discard requests at a time, and then blocking in
submit_bio(), until at least one of the pending ones completes.
But t
up to v4.15.x
> * new wire packet P_ZEROES, a cousin of P_DISCARD, following the kernel
>   as it introduced separate BIO ops for writing zeros and discarding
> * compat workaround for two RHEL 7.5 idiosyncrasies regarding refcount_t
>   and struct nla_policy
--
ile IO is frozen
* fix various corner cases when recovering from multiple failure cases
https://www.linbit.com/downloads/drbd/8.4/drbd-8.4.11-1.tar.gz
https://github.com/LINBIT/drbd-8.4/tree/drbd-8.4.11
--
ql
>
> my auto solve config:
>
> net {
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
If the "good data" (by whatever metric)
happens to be secondary during that handshake,
and the "bad data" happens to be prima
ynchronous" replication here.
Going online with the Secondary now will look just like a "single system
crash", but like that crash would have happened a few requests earlier.
It may miss the latest few updates.
But it will still be consistent.
--
ctively
used, as is the case with live migrating VMs. Which would not have to be
that way, it could do with single primary even, by switching roles "at
the right time"; but hypervisors do not implement it that way currently.
--
ike it used to be".
The peer-disk state of the DR node as seen by drbdsetup
may have some influence on the master-score calculations.
That's a feature, not a bug ;-)
(I think)
--
still connect to you ;-)
> Note: I down'ed the dr node (node 3) and repeated the test. This time,
> the fence-handler was invoked. So I assume that DRBD did route through
> the third node. Impressive!
Yes, "sort of".
> So, is the Protocol C between 1 <-> 2 maintained, wh
On Thu, Feb 08, 2018 at 02:52:10PM -0600, Ricky Gutierrez wrote:
> 2018-02-08 7:28 GMT-06:00 Lars Ellenberg <lars.ellenb...@linbit.com>:
> > And your config is?
>
> resource zimbradrbd {
> allow-two-primaries;
Why dual primary?
I doubt you really need that.
a loss.
So if you don't mean that, don't do it.
--
e on both server.
> Anyone an idea? We also have a ticket at HGST and they also tried a lot.
If you want, contact LINBIT,
we should be able to help you get this all set up in a sane way.
--
o tell it to try and be more or less aggressive wrt. the ongoing
"application" IO that is concurrently undergoing live replication,
because both obviously share the network bandwidth,
as well as bandwidth and IOPS of the storage backends.
These knobs, and their defaults, are documented in th
> Most importantly: once the trimtester (or *any* "corruption detecting"
> tool) claims that a certain corruption is found, you look at what
supposedly is
> corrupt, and double check if it in fact is.
>
> Before doing anything else.
>
I did that, but I don't know what a "good" file is supposed to
s found, you look at what
supposedly is corrupt, and double check if it in fact is.
Before doing anything else.
Double check if the tool would still claim corruption exists,
even if you cannot see that corruption with other tools.
If so, find out why that tool does that,
because that'd be clearly
I seriously doubt, but I am biased),
there may be something else going on still...
--
ster-is-broken
o=trimtester-is-broken/x1
echo X > $o
l=$o
for i in `seq 2 32`; do
    o=trimtester-is-broken/x$i
    cat $l $l > $o
    rm -f $l
    l=$o
done
./TrimTester trimtester-is-broken
Wahwahwa Corrupted file found: trimteste
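For anyone puzzled by the loop above: it repeatedly doubles a file by concatenating it with itself. A scaled-down, runnable sketch (5 steps instead of 32; the directory name is an assumption):

```shell
# Each iteration concatenates the previous file with itself,
# so x5 ends up 2^4 times the size of x1.
mkdir -p trimtester-demo
o=trimtester-demo/x1
echo X > $o            # 2 bytes: "X" plus newline
l=$o
for i in `seq 2 5`; do
    o=trimtester-demo/x$i
    cat $l $l > $o     # double the previous file
    rm -f $l
    l=$o
done
wc -c < trimtester-demo/x5   # 2 * 2^4 = 32 bytes
```

With the original 32 iterations the final file is 2 * 2^31 bytes, i.e. 4 GiB.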
evice see which request) may be changed.
Maximum request size may be changed.
Maximum *discard* request size *will* be changed,
which may result in differently split discard requests on the backend stack.
Also, we have additional memory allocations for DRBD meta data and housekeeping,
so possibly dif
the linux kernel high level
block device api used the term "back then" (BIO_RW_BARRIER), do no
longer exist in today's Linux kernels. That however does not mean
we could drop the config keyword, nor that we can drop the functionality
there yet, until all "old" kernels are really
oved wrong.
To gather a few more data points,
does the behavior on DRBD change, if you
disk { disable-write-same; } # introduced only with drbd 8.4.10
or if you set
disk { al-updates no; } # affects timing, among other things
Can you reproduce with other backend devices?
--
ix the problem?
Don't put a "primitive" DRBD definition live
without the corresponding "ms" definition.
If you need to, populate a "shadow" cib first,
and only commit that to "live" once it is fully populated.
--
u need to "convert" the 8.4
back to the 8.3 magic, using the 8.4 compatible drbdmeta tool,
because, well, unsurprisingly the 8.3 drbdmeta tool does not know
the 8.4 magics.
So if you intend to downgrade to 8.3 from 8.4,
while you still have the 8.4 tools installed,
do: "drbdadm down al
ach from it.
Now it is no longer there.
> but D1 has failed after a D2
> failure,
Too bad, now we have no data anymore.
> but before D2 has recovered. What is the behavior of DRBD in such
> a case? Are all future disk writes blocked until both D1 and D2 are
> available, and are co
te, c-max-rate,
possibly send and receive buffer sizes.
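The knobs alluded to, as a hedged drbd.conf sketch (values are illustrative, not recommendations; see drbd.conf(5) for units and defaults):

```
disk {
    c-plan-ahead   20;   # enable the dynamic resync controller (0.1 s units)
    c-fill-target  1M;
    c-min-rate    10M;
    c-max-rate   100M;
}
net {
    sndbuf-size    2M;
    rcvbuf-size    2M;
}
```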
--
t; directive. (Followed instructions from DRBD 9 manual).
>
> Any info that might help in fixing this is welcome.
With DRBD 9, you want to use
"fence-peer crm-fence-peer.9.sh" and
"unfence-peer crm-unfence-peer.9.sh"
(mind the .9.)
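Put together, the handlers fragment might look like this (paths assume the usual /usr/lib/drbd install location; a fencing policy such as `fencing resource-only;` must be configured as well):

```
handlers {
    fence-peer   "/usr/lib/drbd/crm-fence-peer.9.sh";
    unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
}
```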
--
there is anything relating to "out of memory" in there.
Can you still reproduce?
If so, can you capture the kernel logs during the "crash" somehow?
--
) them. But when using it "in spec",
they don't trigger (or we'd had a lot of angry customers).
That being said,
again,
> > If you want snapshot shipping,
> > use a system designed for snapshot shipping.
> > DRBD is not.
Cheers,
--
nsistent version of the data before becoming
inconsistent to mitigate that.
Still, "constantly" cycling between
Connected
While not idle for long enough
Ahead/Behind,
SyncSource/SyncTarget
is a bad idea.
If you want snapshot shipping,
use a system designed for snapshot
.4.6-5.
>
> call stack:
> <4>[66071017.155051] Modules linked in: softdog drbd(FN)
What did you need to force the module for?
Probably *that* is your problem right there.
--
ide any additional info that is needed if you let me know
> what is required.
IO stack (e.g. lsblk ; lsblk -t ; lsblk -D) may be interesting,
as well as the drbd configuration (drbdadm dump ),
the kernel logs around the "crash" if possible,
and your "best guess" as to what
nd,
which means you hit some other bottleneck much earlier
(the discard bandwidth of the backing storage...)
Note that DRBD 9.0.8 still has a problem with discards larger than 4 MB,
though (will hit a protocol error, disconnect, and reconnect).
That is already fixed in git; 9.0.9rc1 includes the fix.
(8.4.10 a
On Thu, Jul 27, 2017 at 10:11:48AM +0200, Gionatan Danti wrote:
> To clarify: the main reason I am asking about the feasibility of a
> dual-primary DRBD setup with LVs on top of it is about cache coherency. Let
> me do a step back: the given explanation for denying even read access on a
> secondary
an other fix (unrelated to your scenario,
related to the request size verification) even post 9.0.8.
So yes, it *will* help.
And no, you will not have any luck with 9.0.1. Not at all.
And not only for discards.
--
s
Or mount debugfs, and find some information there.
No, that is not supposed to exist,
and may change at any time, and will not be documented,
so don't rely on anything you may find there.
--
A cluster,
why not simply stay with 8.4.
In that scenario, there is currently nothing to gain from 9,
and since 8.4 can optimize some situations (it "knows" there can
only be one peer), it will even give better performance sometimes.
--
ically, systemd udev here) thinks that executing that
helper program took too long.
--
On Fri, Jun 09, 2017 at 11:39:05PM +0800, David Lee wrote:
> Hi,
>
> I am experimenting with DRBD dual-primary with OCFS 2, and DRBD client as
> well.
> With the hope that every node can access the storage in a unified way.
> But I got a
> kernel call trace and huge number of ASSERTION failure
ould drbdadm complain about?
--
am using.
> drbd.x86_64 8.2.7-3 installed
Not suggesting that this would have anything to do with what you are
seeing, but this is "ancient" (you knew that, of course).
--
en.
If it did attempt to do that and failed,
you will have to look into why, which, again, should be in the logs.
Double check constraints, and also double check if GFS2/DLM fencing is
properly integrated with pacemaker.
--