On 08/16/2015 12:24 PM, Doug Ledford wrote:
> On 08/16/2015 04:19 AM, Matan Barak wrote:
>>
>>
>> On 8/15/2015 5:14 PM, Doug Ledford wrote:
>>> On 08/06/2015 01:14 PM, Matan Barak wrote:
>>>> Separate the cleanup and release IB cache functions.
>>>> The cleanup function delete all GIDs and unregister the event channel,
>>>> while the release function frees all memory. The cleanup function is
>>>> called after all clients were removed (in the device un-registration
>>>> process).
>>>>
>>>> The release function is called after the device was put.
>>>> This is in order to avoid use-after-free errors if ther vendor
>>>> driver's teardown code uses IB cache.
>>>>
>>>> Squash-into: 'IB/core: Add RoCE GID table management'
>>>> Signed-off-by: Matan Barak <[email protected]>
>>>
>>> I had originally used the v1 of this patch instead of v2.  In order to
>>> make sure I didn't miss anything, I hand compared the final results of
>>> the v1 patch to what this patch would have created if I had used it.  I
>>> found a few things worth nothing, comments inline below.
>>>
>>>> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>>>>                         (rdma_end_port(device) -
>>>>                          rdma_start_port(device) + 1),
>>>>                         GFP_KERNEL);
>>>> -    err = gid_table_setup_one(device);
>>>> -
>>>> -    if (!device->cache.pkey_cache || !device->cache.gid_cache ||
>>>> +    if (!device->cache.pkey_cache ||
>>>>           !device->cache.lmc_cache) {
>>>>           printk(KERN_WARNING "Couldn't allocate cache "
>>>>                  "for %s\n", device->name);
>>>> -        err = -ENOMEM;
>>>> -        goto err;
>>>> +        /* pkey_cache is freed here because we later access it's
>>>> +         * elements.
>>>> +         */
>>>> +        kfree(device->cache.pkey_cache);
>>>> +        device->cache.pkey_cache = NULL;
>>>> +        return -ENOMEM;
>>>
>>> This function is unnecessarily convoluted IMO.  A simple change to using
>>> kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
>>> result, I fixed this function up to be different than either patch.
>>> Please look over my results and make sure you agree with my changes.
>>>
>>>>       }
>>>>
>>>> -    for (p = 0; p <= rdma_end_port(device) -
>>>> rdma_start_port(device); ++p) {
>>>> +    for (p = 0; p <= rdma_end_port(device) -
>>>> rdma_start_port(device); ++p)
>>>>           device->cache.pkey_cache[p] = NULL;
>>>
>>> For instance this goes away entirely by using kzalloc().
>>>
>>> The rest looked ok.  I have to fixup the ordering changes between sysfs
>>> and cache to make the v1 patch match the v2 patch.
>>>
>>>
>>
>> Thanks for the squashing and re-ordering these patches.
>> I think you're missing if (device->cache.pkey_cache) over the for pkey
>> free loop in ib_cache_release_one.
> 
> I think you're right, but I think that particular issue exists in both
> versions of the code (mine and yours).  In particular, a failure during
> ib_register_device should result in a call to ib_dealloc_device which
> will result in a call to ib_cache_release_one, which will happen whether
> ib_cache_setup_one returns an error or not.  So, ib_cache_release_one
> must check pkey_cache before releasing the ports because it can't assume
> the original setup succeeded.
> 
>> Otherwise, the allocation of
>> device->cache.pkey_cache might has failed and you dereference an invalid
>> address device->cache.pkey_cache[p].
>>
>> In addition, ib_device_release calls ib_cache_release_one with device
>> instead of dev (which has the wrong type).
>> ib_register_device calls ib_cache_clean_one that I can't really find :)
> 
> Both of these turned up in compile tests.  I've since resolved them.
> Hand merge issues ;-)
> 

I've pushed what I think is a fixed version to github.  Please feel free
to review.

-- 
Doug Ledford <[email protected]>
              GPG KeyID: 0E572FDD


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to