Hi Bob,

No, we haven't, but it wouldn't be hard for us to replace the patches in our 
internal patch queue with these and try them. We'll let you know what we find.

We have also seen what we think is an unrelated issue, where we get the 
following backtrace in kern.log and the system stalls:

Sep 21 21:19:09 cl15-05 kernel: [21389.462707] INFO: task python:15480 blocked 
for more than 120 seconds.
Sep 21 21:19:09 cl15-05 kernel: [21389.462749]       Tainted: G           O    
4.4.0+10 #1
Sep 21 21:19:09 cl15-05 kernel: [21389.462763] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 21 21:19:09 cl15-05 kernel: [21389.462783] python          D 
ffff88019628bc90     0 15480      1 0x00000000
Sep 21 21:19:09 cl15-05 kernel: [21389.462790]  ffff88019628bc90 
ffff880198f11c00 ffff88005a509c00 ffff88019628c000
Sep 21 21:19:09 cl15-05 kernel: [21389.462795]  ffffc90040226000 
ffff88019628bd80 fffffffffffffe58 ffff8801818da418
Sep 21 21:19:09 cl15-05 kernel: [21389.462799]  ffff88019628bca8 
ffffffff815a1cd4 ffff8801818da5c0 ffff88019628bd68
Sep 21 21:19:09 cl15-05 kernel: [21389.462803] Call Trace:
Sep 21 21:19:09 cl15-05 kernel: [21389.462815]  [<ffffffff815a1cd4>] 
schedule+0x64/0x80
Sep 21 21:19:09 cl15-05 kernel: [21389.462877]  [<ffffffffa0663624>] 
find_insert_glock+0x4a4/0x530 [gfs2]
Sep 21 21:19:09 cl15-05 kernel: [21389.462891]  [<ffffffffa0660c20>] ? 
gfs2_holder_wake+0x20/0x20 [gfs2]
Sep 21 21:19:09 cl15-05 kernel: [21389.462903]  [<ffffffffa06639ed>] 
gfs2_glock_get+0x3d/0x330 [gfs2]
Sep 21 21:19:09 cl15-05 kernel: [21389.462928]  [<ffffffffa066cff2>] 
do_flock+0xf2/0x210 [gfs2]
Sep 21 21:19:09 cl15-05 kernel: [21389.462933]  [<ffffffffa0671ad0>] ? 
gfs2_getattr+0xe0/0xf0 [gfs2]
Sep 21 21:19:09 cl15-05 kernel: [21389.462938]  [<ffffffff811ba2fb>] ? 
cp_new_stat+0x10b/0x120
Sep 21 21:19:09 cl15-05 kernel: [21389.462943]  [<ffffffffa066d188>] 
gfs2_flock+0x78/0xa0 [gfs2]
Sep 21 21:19:09 cl15-05 kernel: [21389.462946]  [<ffffffff812021e9>] 
SyS_flock+0x129/0x170
Sep 21 21:19:09 cl15-05 kernel: [21389.462948]  [<ffffffff815a57ee>] 
entry_SYSCALL_64_fastpath+0x12/0x71

We think there is a possibility, given that this code path is only entered 
when a glock is being destroyed, that there is a time-of-check/time-of-use 
issue here: by the time schedule() is called, the thing we expect to wake us 
up may have already finished dying and so will never issue the wakeup. We 
have only seen this a couple of times, in fairly intensive VM stress tests 
where a lot of flocks are taken on a small number of lock files (we use them 
to ensure consistent behaviour of disk activation/deactivation and to 
serialise access to the database holding the system state), but it's 
concerning nonetheless. We're looking at replacing the call to schedule() 
with schedule_timeout(), with a timeout of maybe HZ, so that we always come 
back out of the sleep and retry; a rough sketch is below. Is this something 
you think you may have seen, or do you have any ideas on it?
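
For illustration only, this is roughly the shape of the wait/retry loop we 
have in mind. It is a sketch, not our actual patch: lookup_glock() and 
glock_is_dying() are placeholders for the real hash lookup and lockref check 
in find_insert_glock(), and wq stands for the wait queue that the glock 
teardown path signals.

    DEFINE_WAIT(wait);          /* this task's wait queue entry */

again:
    prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
    gl = lookup_glock(name);                    /* placeholder lookup */
    if (gl && glock_is_dying(gl)) {             /* placeholder liveness check */
        /*
         * Today this is schedule(): we rely on the dying glock's
         * teardown path to wake us.  If teardown completed before we
         * got here, the wakeup is lost and the task sleeps forever
         * (hence the 120 second hung-task report above).  Bounding
         * the sleep means we always wake up and re-run the lookup.
         */
        schedule_timeout(HZ);
        goto again;
    }
    finish_wait(wq, &wait);

In the common case the wakeup still arrives and the timeout never fires; it 
only matters when the wakeup has already been lost.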

Thanks,

        Mark.

-----Original Message-----
From: Bob Peterson <[email protected]> 
Sent: 28 September 2018 13:24
To: Mark Syms <[email protected]>
Cc: [email protected]; Ross Lagerwall <[email protected]>; Tim 
Smith <[email protected]>
Subject: Re: [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance 
improvements

----- Original Message -----
> Thanks for that Bob, we've been watching with interest the changes 
> going in upstream but at the moment we're not really in a position to 
> take advantage of them.
> 
> Due to hardware vendor support certification requirements, XenServer 
> can only very occasionally make big kernel bumps that would change the 
> ABI seen by drivers, as that would require our hardware partners to 
> recertify.
> So, we're currently on a 4.4.52 base, but the gfs2 driver is somewhat 
> newer: as it is essentially self-contained, we can backport changes 
> more easily. We currently have most of the GFS2 and DLM changes that 
> are in 4.15 backported into the XenServer 7.6 kernel, but we can't 
> take the ones related to iomap as they are more invasive, and it looks 
> like a number of the more recent performance-focused changes are also 
> predicated on the iomap framework.
> 
> As I mentioned in the covering letter, the intra-host problem would 
> largely be a non-issue if EX glocks were actually a host-wide thing, 
> with local mutexes used to share them within the host. I don't know if 
> this is what your patch set is trying to achieve or not. It's not so 
> much that the selection of resource group is "random", just that 
> there is a random chance that we won't select the first RG that we 
> test; it probably works out much the same, though.
> 
> The inter-host problem addressed by the second patch seems to be less 
> amenable to avoidance, as the hosts don't seem to have a synchronous 
> view of the state of the resource group locks (for understandable 
> reasons, as I'd expect this to be very expensive to keep in sync). So 
> it seemed reasonable to try to make it "expensive" to request a 
> resource that someone else is using, and also to avoid immediately 
> grabbing it back if we've been asked to relinquish it. It does seem to 
> give a fairer balance to the usage without being massively invasive.
> 
> We thought we should share these with the community anyway, even if 
> they only serve as inspiration for more detailed changes, and also to 
> describe the scenarios where we're seeing issues now that we have 
> completed implementing the XenServer support for GFS2 that we 
> discussed back in Nuremberg last year. In our testing they certainly 
> make things better. They probably aren't fully optimal, as we can't 
> maintain 10G wire speed consistently across the full LUN, but we're 
> getting about 75% of it, which is certainly better than what we were 
> seeing before we started looking at this.
> 
> Thanks,
> 
>       Mark.

Hi Mark,

I'm really curious whether you guys have tried the two patches I posted here 
on 17 January 2018 in place of the two patches you posted. We see much better 
throughput with those than with stock.

I know Steve wants a different solution, and in the long run it will be a 
better one, but I've been trying to convince him we should use them as a 
stop-gap measure to mitigate this problem until we get a more proper solution 
in place (which is obviously taking some time, due to unforeseen circumstances).

Regards,

Bob Peterson
