Dear Aleksandr,
Am 15.10.24 um 13:05 schrieb Aleksandr Loktionov:
-----Original Message-----
From: Paul Menzel <[email protected]>
Sent: Tuesday, October 15, 2024 12:33 PM
[…]
Am 15.10.24 um 11:45 schrieb Loktionov, Aleksandr:
-----Original Message-----
From: Paul Menzel <[email protected]>
Sent: Tuesday, October 15, 2024 10:24 AM
Thank you for your patch. For the summary I’d make it more about the
action of the patch like “Add intermediate filter to …”.
Sorry, I don't get your point. This patch fixes bug that can stop
vfs to receive any traffic making them useless. The first and most
visible effect of the bug is a lot of "Error I40E_AQ_RC_ENOSPC
adding RX filters on VF XX,..." errors from F/W complaining to add
MAC/VLAN filter. So I've mentioned it in the title to be easy found.
I don't add any filter in the driver, we have to add one more
intermediate state of the filter to avoid the race condition.
In my opinion, having the log in the body is good enough for search
engines and the summary should be optimized for `git log --oneline`
consumption. I am sorry about the confusion with my example. Maybe:
Add intermediate sync state to fix race condition
I just wonder why do you insist on "ADD" which usually means
implementing a new feature. When this patch FIXes the bug?
Some projects use this interpretation/structure, but to my knowledge not
the Linux kernel.
If I'd want to add a new feature I'd rather send my patch to net-next,
isn't it?
*Add*, how I used it, does not imply a new feature addition.
For me 'fix race condition by adding filter's intermediate sync
state'
Can you explain your strong argument for the Add?
I am fine with your suggestion, and do not oppose to start with *Fix*.
Also, thank you for your elaborate answer, so I could understand the
point of view you came from.
Am 15.10.24 um 09:04 schrieb Aleksandr Loktionov:
Fix a race condition in the i40e driver that leads to MAC/VLAN filters
becoming corrupted and leaking. Address the issue that occurs under
heavy load when multiple threads are concurrently modifying MAC/VLAN
filters by setting mac and port VLAN.
1. Thread T0 allocates a filter in i40e_add_filter() within
i40e_ndo_set_vf_port_vlan().
2. Thread T1 concurrently frees the filter in __i40e_del_filter() within
i40e_ndo_set_vf_mac().
3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which
refers to the already freed filter memory, causing corruption.
Reproduction steps:
1. Spawn multiple VFs.
2. Apply a concurrent heavy load by running parallel operations to change
MAC addresses on the VFs and change port VLANs on the host.
It’d be great if you shared your commands.
Sorry, I'm pretty sure it's quite impossible to reproduce the issue
with bash scripts /*I've tried really hard*/. Reproduction is
related to user-space/kernel code which might be not open-sourced.
So as I've explained in the commit title the race condition
possibility that was introduced from the very beginning.
Could you please ask to get clearance to publish it. My naive view
wonders, why legal(?) should oppose publication.
Simply the defect can come from development tools or from external
customer. And being tract as a commercial secret, which even doesn't
belong to Intel sometimes.
I know, that these are general problems. But in this specific case you
should know the circumstances, and be able to document it in the commit
message, why a test case cannot be published.
Kind regards,
Paul