On Fri, Oct 19, 2018 at 10:18:12AM +0200, VictorSanchez2 wrote:
> On 10/18/2018 09:51 PM, Lars Ellenberg wrote:
> > On Thu, Oct 11, 2018 at 02:06:11PM +0200, VictorSanchez2 wrote:
> > > On 10/11/2018 10:59 AM, Lars Ellenberg wrote:
> > > > On Wed, Oct 10, 2018 at 11:52:34AM +0000, Garrido, Cristina wrote:
> > > > > Hello,
> > > > > 
> > > > > I have two drbd devices configured on my cluster. On both nodes the 
> > > > > status shows "blocked:lower" although everything seems to be fine. We 
> > > > > have conducted IO tests on the physical devices and on the drbd 
> > > > > devices with good results. Do you know why this message is shown and 
> > > > > how to debug it?
> > > > > 
> > > > > The message from status command:
> > > > > 
> > > > > xxxx:/dev/mapper # drbdsetup status --verbose --statistics
> > > > > ASCS node-id:1 role:Primary suspended:no
> > > > >       write-ordering:flush
> > > > >     volume:0 minor:0 disk:UpToDate
> > > > >         size:10452636 read:3247 written:8185665 al-writes:53 bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:lower
> > > > "blocked:lower" means that the in-kernel API for querying block
> > > > device congestion info reported "congestion" for the backing device.
> > > > Why it did that, whether that was actually the case, and what
> > > > that actually means depends very much on that backing device,
> > > > and on how it "felt" at the time of that status output.
> > > > 
> > > Thanks Lars,
> > > 
> > > > Do you know how DRBD asks the kernel for congestion information? Which
> > > > system call does it make?
> > DRBD is part of the kernel. No system call involved.
> > > We call bdi_congested(), which is a wrapper around wb_congested();
> > > both are defined in include/linux/backing-dev.h
> > 
> > > > We want to know why it is being marked as "blocked:lower",
> > just ignore that wording. don't panic just because it says "blocked"...
> > 
> > > > because we are running heavy performance tests and there seems to be
> > > > no problem at the disk or network level.
> > "congestion" does not mean "no progress".
> > > Just that you reached some kind of, well, congestion, and that,
> > > if you were to increase the "IO load" even further, you'd probably just
> > > make the latency tail longer, and not improve throughput or IOPS anymore.
> > 
> > so you throw "heavy" IO against the IO stack.  as a result, you drive
> > the IO stack into "congestion".  and if you ask it for some status,
> > it reports that back.
> > 
> > no surprise there.
> > 
> > > We think that DRBD/kernel is not getting the correct information from
> > > the system.
> > afaics, blk_set_congested() is called when a queue has more than
> > "nr_congestion_on" requests "in flight", and it is cleared once that
> > drops below "nr_congestion_off" again.  both hysteresis watermarks are
> > set in relation to the queue "nr_requests", which again is a tunable.
> > 
> > 
> Thanks Lars,
> 
> How can we tune nr_requests? By default it is 128, and we can't increase it:

It's not about DRBD, it's about the storage backend.

> # cat /sys/block/drbd1/queue/nr_requests
> 128
> # echo 129 > /sys/block/drbd1/queue/nr_requests
> -bash: echo: write error: Invalid argument

sure. DRBD is a "virtual" device, which does not even have a queue.
nr_requests for DRBD has no actual meaning.
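
What does have a queue is the backing device (the "disk" of your DRBD
resource). A minimal sketch for looking at its queue depth, assuming the
backing disk shows up as a request-based device under /sys/block. "sdb"
is only a placeholder for your device, and queue_depth and SYSFS_ROOT
are hypothetical names used for illustration:

```shell
#!/bin/sh
# Sketch only: queue_depth is a hypothetical helper. SYSFS_ROOT exists so
# the function can be pointed at a test tree; it defaults to the real /sys.
queue_depth() {
    dev=$1
    file="${SYSFS_ROOT:-/sys}/block/$dev/queue/nr_requests"

    if [ -r "$file" ]; then
        # Print the current queue depth of that device
        cat "$file"
    else
        echo "no queue/nr_requests for $dev" >&2
        return 1
    fi
}

# Example (device name is an assumption, substitute your backing disk):
#   queue_depth sdb
#   echo 256 > /sys/block/sdb/queue/nr_requests    # as root
```

Whether a write to that file is accepted, and which range is allowed,
depends on the driver and IO scheduler of that device.
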

> In any case, I think that increasing nr_requests will not solve the
> problem.

Well, do you have any indication that there actually is a "problem"?

If your only "problem" is the string "blocked:lower"
in the drbdsetup status output, may I suggest to just ignore that?

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed