On Wed, Aug 05, 2020 at 04:26:37PM -0700, Jacob Keller wrote:
> The subject says "potential" leak, but in fact it looks like we leaked
> every single time we succeeded.
> 

[...]

> 
> Ok, so this is a bit confusing, and I think it is right but the commit
> message is wrong.
> 
> run_pmc appears to call msg_put whenever it fails, so we do want to
> avoid calling msg_put on a failure.
> 
> But either (a) we were always leaking the msg because we weren't
> cleaning it up before or (b) on success run_pmc_events shouldn't cleanup
> the message since it would be used by the caller.. but that doesn't make
> sense since the msg pointer is a local variable.
> 
> Based on the other callers, I think this is right, but the commit
> message is wrong.
> 

I reviewed this again, and it looks like there isn't any leak in fact.
It's just that the code is very confusing.

run_pmc_events() is special because it calls run_pmc() with a
managementId (ds_id argument) of -1 ("not a managementId").

My confusion stems from the fact that run_pmc() is in fact used for 2
things:
- send a management TLV that requests a GET on a data set, or initiates
  a subscription event. This is the normal code path where the caller of
  run_pmc() is the one who recycles the management PTP message to the
  pool.
- just poll in a nonblocking fashion for responses to management TLVs
  that were sent previously. In this case, when a management TLV is
  received, a completely different code path is executed in this
  function. It will also have a different return code (always zero, for
  timeout, see below) when used in this way.

In the latter case, run_pmc (and run_pmc_events) and recv_subscribed are
much more tightly coupled than I thought.

The reason why recv_subscribed has this check:

        mgt_id = get_mgt_id(msg);
        if (mgt_id == excluded)
                return 0;

is precisely because it is intended to be the callback of
run_pmc_events, which calls with a ds_id of -1 (that doesn't satisfy the
equality check), as opposed to all other callers which return the
managementId of the PTP management message.

When recv_subscribed processes a message, it returns 1.  When it ignores
it, it returns 0.  This is confusing, because it doesn't actually
matter, in both cases the effect is the same: the management message is
freed in run_pmc() itself:

                if (res <= 0 || recv_subscribed(node, *msg, ds_id) ||
                    get_mgt_id(*msg) != ds_id) {
                        msg_put(*msg);
                        *msg = NULL;
                        continue;
                }

_Only_ if called with ds_id == -1.

Now the patch that I added doesn't actually ever run. It's dead code.
Look again closer at the above snippet, it "continues" the polling loop,
instead of returning 1 right below. So in the end, this will return back
to the caller (run_pmc_events) as 0 (timeout), even if a PTP message was
processed.

So in short, the code is pretty convoluted.

I'll try to:
- pass recv_subscribed as an argument to run_pmc(), called with NULL
  from everywhere except from run_pmc_events(). This will get rid of the
  special value of ds_id == -1 since I could just check for the presence
  of a handler function pointer as an alternative.
- simply not call recv_subscribed unless given as function pointer,
  which is on on one hand intuitively natural, and on another hand,
  equivalent to the current logic of recv_subscribed filtering out
  "excluded" managementId's.
- remove the "continue" instruction altogether, and just return 1 if a
  message was received. I don't see why not. It's confusing to return a
  timeout.
- don't free the message if it's a non-management response or an error
  response, leave that up to the caller (now that we're not continuing
  the loop any longer). This will improve consistency with the other
  call paths.

Thanks,
-Vladimir


_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

Reply via email to