On Thu, Feb 11, 2016 at 01:39:09PM -0500, Bob Peterson wrote:
> The problem is: While testing the dlm in multiple recovery situations,
> Nate and I discovered multiple problems. Until recently, no one has tried
> to run recovery tests on an upstream DLM,

(Let's distinguish tcp connection testing/recovery vs locking
testing/recovery.  I agree we've never looked at the tcp connections too
much since the node is typically dead anyway.)

> I agree that some of these patches might be unnecessary improvements.
> I'll try to pare them down to what is absolutely necessary and what
> is not. I'll also document exactly why the necessary ones are needed.

Improvements are fine; I was just confused about which were fixes vs
improvements.

> I'll also try to post them in order of highest priority and repost
> them as individual patches rather than a set.
> The recovery tests are somewhat slow, so this will take some time.
> BTW, Have you had a chance to look at the patch I posted on 18 January,
> titled "DLM: Replace nodeid_to_addr with kernel_getpeername"?
> That definitely fixes one bug in patch b3a5bbfd which you mentioned.

Great, thanks, that's the key one that I'd missed or forgotten.

> I assume you're not suggesting I combine that patch with other patches
> to stabilize b3a5bbfd, right? As you well know, this is very touchy
> code and it's easier to diagnose and debug a larger number of smaller
> patches.

No, I don't have any concerns with the other improvements/fixes you have
since the main issue was fixed in that nodeid_to_addr replacement.
