On 02/03/18 15:58, Simon Gaiser wrote:
> Juergen Gross:
>> On 20/02/18 05:56, Simon Gaiser wrote:
>>> Juergen Gross:
>>>> On 07/02/18 23:22, Simon Gaiser wrote:
>>>>> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
>>>>> concurrent xenstore accesses") made a subtle change to the semantic of
>>>>> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>>>>> Before on an error response to XS_TRANSACTION_END
>>>>> xenbus_dev_request_and_reply() would not decrement the active
>>>>> transaction counter. But xenbus_transaction_end() has always counted the
>>>>> transaction as finished regardless of the response.
>>>> Which is correct now. Xenstore will free all transaction related
>>>> data regardless of the response. A once failed transaction can't
>>>> be repaired, it has to be repeated completely.
>>> So if xenstore frees the transaction why should we keep it in the list
>>> with pending transaction in xenbus_dev_frontend? That's exactly what
>>> this patch fixes by always removing it from the list, not only on a
>>> successful response (See below for the EINVAL case).
>> Aah, sorry, I seem to have misread my own coding. :-(
>> Yes, you are right. Sorry for not seeing it before.
>>>>> But xenbus_dev_frontend tries to end a transaction on closing of the
>>>>> device if the XS_TRANSACTION_END failed before. Trying to close the
>>>>> transaction twice corrupts the reference count. So fix this by also
>>>>> considering a transaction closed if we have sent XS_TRANSACTION_END once
>>>>> regardless of the return code.
>>>> A transaction in the list of transactions should not considered to be
>>>> finished. Either it is not on the list or it is still pending.
>>> With "considering a transaction closed" I mean "take the code path which
>>> removes the transaction from the list with pending transactions".
>>> From the follow-up mail:
>>>>>> The new behavior is that xenbus_dev_request_and_reply() and
>>>>>> xenbus_transaction_end() will always count the transaction as finished
>>>>>> regardless the response code (handled in xs_request_exit()).
>>>>> ENOENT should not decrement the transaction counter, while all
>>>>> other responses to XS_TRANSACTION_END should still do so.
>>>> Sorry, I stand corrected: the ENOENT case should never happen, as this
>>>> case is tested in xenbus_write_transaction(). It doesn't hurt to test
>>>> for ENOENT, though.
>>>> What should be handled is EINVAL: this would happen if a user specified
>>>> a string different from "T" and "F".
>>> Ok, I will handle those cases in xs_request_exit(). Although I don't
>>> like that this depends on the internals of xenstore (At least to me it's
>>> not obvious why it should only return ENOENT or EINVAL in this case).
>>> In the xenbus_write_transaction() case checking the string before
>>> sending the transaction (like the transaction itself is verified) would
>>> avoid this problem.
>> Right. I'd prefer this solution.
>> Remains the only problem you tried to tackle with your second patch: a
>> kernel driver going crazy and ending transactions it never started (or
>> ending them multiple times). The EINVAL case can't happen here, but
>> ENOENT can. Either ENOENT has to be handled in xs_request_exit() or you
>> need to keep track of the transactions like in the user interface and
>> refuse ending an unknown transaction. Or you trust the kernel users.
>> Trying to fix the usage counter seems to be the wrong approach IMO.
> The point of the second patch was to detect such bugs. This would have
> saved quite some time to find this bug. I added the "fix" of the counter
> I just because it was trivial after having the if there.
> Adding tracking seems to be a quite complex solution for a _potential_
> So I would go with checking ENOENT in xs_request_exit(). Should this be
> WARN_ON_ONCE()? Since this normally should not happen I would say yes.
Yes, having a WARN_ON_ONCE here will help.
> Should I keep the reference counter sanity check? And if yes, with the
> "fix" to the counter?
I'd drop it. This really should not happen and blowing up kernel size
with checks of impossible situations isn't the way to go.
In case you really want to do something here you can add something like
ASSERT(xs_state_users) before decrementing the counter.