On Fri, Oct 19, 2018 at 10:18:12AM +0200, VictorSanchez2 wrote:
> On 10/18/2018 09:51 PM, Lars Ellenberg wrote:
> > On Thu, Oct 11, 2018 at 02:06:11PM +0200, VictorSanchez2 wrote:
> > > On 10/11/2018 10:59 AM, Lars Ellenberg wrote:
> > > > On Wed, Oct 10, 2018 at 11:52:34AM +0000, Garrido, Cristina wrote:
> > > > > Hello,
> > > > >
> > > > > I have two drbd devices configured on my cluster. On both nodes the
> > > > > status shows "blocked:lower", although everything seems to be fine. We
> > > > > have conducted IO tests on the physical devices and on the drbd
> > > > > devices with good results. Do you know why this message is shown and
> > > > > how to debug it?
> > > > >
> > > > > The output of the status command:
> > > > >
> > > > > xxxx:/dev/mapper # drbdsetup status --verbose --statistics
> > > > > ASCS node-id:1 role:Primary suspended:no
> > > > >     write-ordering:flush
> > > > >   volume:0 minor:0 disk:UpToDate
> > > > >       size:10452636 read:3247 written:8185665 al-writes:53
> > > > >       bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no
> > > > >       blocked:lower
> > > >
> > > > "blocked:lower" means that the in-kernel API for querying block
> > > > device congestion reported "congestion" for the backing device.
> > > > Why it did that, whether that was actually the case, and what
> > > > that actually means is very much dependent on that backing device,
> > > > and how it "felt" at the time of that status output.
> > >
> > > Thanks Lars,
> > >
> > > Do you know how DRBD asks the kernel for congestion information?
> > > Which system call does it make?
> >
> > DRBD is part of the kernel. No system call involved.
> > We call bdi_congested(), which is a wrapper around wb_congested(),
> > both defined in include/linux/backing-dev.h.
> >
> > > We want to know why it is marking it as "blocked:lower",
> >
> > Just ignore that wording. Don't panic just because it says "blocked"...
> >
> > > because we are running heavy performance tests and it seems that there
> > > is no problem at disk or network level.
> >
> > "Congestion" does not mean "no progress".
> > Just that you reached some kind of, well, congestion, and likely that,
> > if you were to increase the "IO load" even further, you'd probably just
> > make the latency tail longer, and not improve throughput or IOPS anymore.
> >
> > So you throw "heavy" IO against the IO stack. As a result, you drive
> > the IO stack into "congestion". And if you ask it for some status,
> > it reports that back.
> >
> > No surprise there.
> >
> > > We think that DRBD/kernel is not getting the correct information from
> > > the system.
> >
> > AFAICS, blk_set_congested() is called when a queue has more than
> > "nr_congestion_on" requests "in flight", and it is cleared once that
> > drops below "nr_congestion_off" again. Both hysteresis watermarks are
> > set in relation to the queue's "nr_requests", which again is a tunable.
>
> Thanks Lars,
>
> How can we tune nr_requests? By default it is 128, and we can't increase it:
It's not about DRBD, it's about the storage backend.

> # cat /sys/block/drbd1/queue/nr_requests
> 128
> # echo 129 > /sys/block/drbd1/queue/nr_requests
> -bash: echo: write error: Invalid argument

Sure. DRBD is a "virtual" device, which does not even have a queue.
nr_requests for DRBD has no actual meaning.

> In any case, I think that increasing nr_requests will not solve the
> problem.

Well, do you have any indication that there actually is a "problem"?
If your only "problem" is the string "blocked:lower" in the drbdsetup
status output, may I suggest that you just ignore it?

--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT

__
please don't Cc me, but send to list -- I'm subscribed

_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
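[Editorial note] The hysteresis watermarks discussed in the thread are derived from nr_requests inside the kernel. A minimal shell sketch of that arithmetic, assuming the formulas used by blk_queue_congestion_threshold() in block/blk-settings.c of kernels of that era (verify against your own kernel source), for the default nr_requests of 128:

```shell
# Assumed watermark formulas from blk_queue_congestion_threshold()
# (kernels of the 4.x era) -- treat as an illustration, not gospel.
nr=128                                # the queue's nr_requests tunable

on=$(( nr - nr / 8 + 1 ))             # congestion flagged above this
[ "$on" -gt "$nr" ] && on=$nr         # clamped to nr_requests

off=$(( nr - nr / 8 - nr / 16 - 1 ))  # congestion cleared below this
[ "$off" -lt 1 ] && off=1             # clamped to at least 1

echo "nr_congestion_on=$on nr_congestion_off=$off"
# prints: nr_congestion_on=113 nr_congestion_off=103
```

So with the default 128-request queue, "blocked:lower" would appear once roughly 113 requests are in flight and clear again below 103, which is exactly the behavior heavy benchmark load would trigger.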
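[Editorial note] Since drbd is a virtual device without a request queue, nr_requests would have to be tuned on the lower-level backing device's queue instead. A hedged sketch: the device name "sdb" and the value 256 are placeholders for your actual backing device and desired depth, and the demonstration below runs against a scratch file so it needs neither root nor real hardware.

```shell
# Write a new nr_requests value to a queue sysfs file and echo it back.
# $1 = path to the nr_requests file, $2 = new value.
set_nr_requests() {
    echo "$2" > "$1" && cat "$1"
}

# Real use (as root, on the BACKING device, never on drbdN) would be e.g.:
#   set_nr_requests /sys/block/sdb/queue/nr_requests 256
# Demonstration against a stand-in file:
tmp=$(mktemp)
echo 128 > "$tmp"            # 128 is the usual default
set_nr_requests "$tmp" 256   # prints: 256
rm -f "$tmp"
```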