Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2021/2/1 18:45, Hannes Reinecke wrote: On 2/1/21 10:40 AM, Chao Leng wrote: On 2021/2/1 16:57, Hannes Reinecke wrote: On 2/1/21 9:47 AM, Chao Leng wrote: On 2021/2/1 15:29, Hannes Reinecke wrote: [ .. ] Urgh. Please, no. That is well-nigh impossible to debug. Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is? I'm still not clear where the problem is once we applied both patches. For example, assume the list has three paths, and none of them is NVME_ANA_OPTIMIZED: head->next = ns1; ns1->next = ns2; ns2->next = head; old->next = ns2; And this is where I have issues. Where does 'old' come from? Clearly it was part of the list at one point; so what happened to it? I explained this earlier. In nvme_ns_remove, there is a window between list_del_rcu and nvme_mpath_clear_current_path. If head->current_path is the "old", and the "old" is being removed, then the "old" has already been removed from the list by list_del_rcu, but head->current_path has not yet been cleared to NULL by nvme_mpath_clear_current_path. Path lookup races with nvme_ns_remove, so the stale "old" is passed to nvme_round_robin_path to find a path. Ah. So this should be better: @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, struct nvme_ns *ns) { - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns, - siblings); - if (ns) - return ns; + if (ns && !test_bit(NVME_NS_REMOVING, &ns->flags)) { + ns = list_next_or_null_rcu(&head->list, &ns->siblings, + struct nvme_ns, siblings); + if (ns) + return ns; + } return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings); } The 'NVME_NS_REMOVING' bit is set before list_del_rcu() is called, so it should guard against the issue you mentioned. That looks insufficient; it is still an infinite loop. You can check the workflow for the scenario I mentioned. Cheers, Hannes
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2/1/21 10:40 AM, Chao Leng wrote: On 2021/2/1 16:57, Hannes Reinecke wrote: On 2/1/21 9:47 AM, Chao Leng wrote: On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ] Urgh. Please, no. That is well impossible to debug. Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is? I'm still not clear where the problem is once we applied both patches. For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED: head->next = ns1; ns1->next = ns2; ns2->next = head; old->next = ns2; And this is where I have issues with. Where does 'old' come from? Clearly it was part of the list at one point; so what happened to it? I explained this earlier. In nvme_ns_remove, there is a hole between list_del_rcu and nvme_mpath_clear_current_path. If head->current_path is the "old", and the "old" is removing. The "old" is already removed from the list by list_del_rcu, but head->current_path is not clear to NULL by nvme_mpath_clear_current_path. Find path is race with nvme_ns_remove, use the "old" pass to nvme_round_robin_path to find path. Ah. So this should be better: @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, struct nvme_ns *ns) { - ns = list_next_or_null_rcu(>list, >siblings, struct nvme_ns, - siblings); - if (ns) - return ns; + if (ns && !test_bit(NVME_NS_REMOVING, >flags)) { + ns = list_next_or_null_rcu(>list, >siblings, + struct nvme_ns, siblings); + if (ns) + return ns; + } return list_first_or_null_rcu(>list, struct nvme_ns, siblings); } The 'NVME_NS_REMOVING' bit is set before list_del_rcu() is called, so it should guard against the issue you mentioned. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2021/2/1 16:57, Hannes Reinecke wrote: On 2/1/21 9:47 AM, Chao Leng wrote: On 2021/2/1 15:29, Hannes Reinecke wrote: [ .. ] Urgh. Please, no. That is well-nigh impossible to debug. Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is? I'm still not clear where the problem is once we applied both patches. For example, assume the list has three paths, and none of them is NVME_ANA_OPTIMIZED: head->next = ns1; ns1->next = ns2; ns2->next = head; old->next = ns2; And this is where I have issues. Where does 'old' come from? Clearly it was part of the list at one point; so what happened to it? I explained this earlier. In nvme_ns_remove, there is a window between list_del_rcu and nvme_mpath_clear_current_path. If head->current_path is the "old", and the "old" is being removed, then the "old" has already been removed from the list by list_del_rcu, but head->current_path has not yet been cleared to NULL by nvme_mpath_clear_current_path. Path lookup races with nvme_ns_remove, so the stale "old" is passed to nvme_round_robin_path to find a path. Cheers, Hannes
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2/1/21 9:47 AM, Chao Leng wrote: On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ] Urgh. Please, no. That is well impossible to debug. Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is? I'm still not clear where the problem is once we applied both patches. For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED: head->next = ns1; ns1->next = ns2; ns2->next = head; old->next = ns2; And this is where I have issues with. Where does 'old' come from? Clearly it was part of the list at one point; so what happened to it? Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2021/2/1 15:29, Hannes Reinecke wrote: On 2/1/21 3:16 AM, Chao Leng wrote: On 2021/1/29 17:20, Hannes Reinecke wrote: On 1/29/21 9:46 AM, Chao Leng wrote: On 2021/1/29 16:33, Hannes Reinecke wrote: On 1/29/21 8:45 AM, Chao Leng wrote: On 2021/1/29 15:06, Hannes Reinecke wrote: On 1/29/21 4:07 AM, Chao Leng wrote: On 2021/1/29 9:42, Sagi Grimberg wrote: You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()). So there is other bug cause nvme_next_ns abormal. I review the code about head->list and head->current_path, I find 2 bugs may cause the bug: First, I already send the patch. see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ Second, in nvme_ns_remove, list_del_rcu is before nvme_mpath_clear_current_path. This may cause "old" is deleted from the "head", but still use "old". I'm not sure there's any other consideration here, I will check it and try to fix it. The reason why we first remove from head->list and only then clear current_path is because the other way around there is no way to guarantee that that the ns won't be assigned as current_path again (because it is in head->list). ok, I see. nvme_ns_remove fences continue of deletion of the ns by synchronizing the srcu such that for sure the current_path clearance is visible. The list will be like this: head->next = ns1; ns1->next = head; old->next = ns1; Where does 'old' pointing to? This may cause infinite loop in nvme_round_robin_path. for (ns = nvme_next_ns(head, old); ns != old; ns = nvme_next_ns(head, ns)) The ns will always be ns1, and then infinite loop. No. nvme_next_ns() will return NULL. If there is just one path(the "old") and the "old" is deleted, nvme_next_ns() will return NULL. 
The list looks like this: head->next = head; old->next = head; If there are two or more paths and the "old" is deleted, the "for" loop will be infinite, because nvme_next_ns() will return some path in the list other than the "old", so the loop condition stays true forever. But that will be caught by the statement above: if (list_is_singular(&head->list)) no? Two paths is just a simple example. If there are exactly two paths, we will enter that branch and may report no usable path even though one path actually remains; it falsely assumes that the "old" has not been deleted. If there are more than two paths, it will cause an infinite loop. So you mean we'll need something like this? diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 71696819c228..8ffccaf9c19a 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, struct nvme_ns *ns) { - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns, - siblings); - if (ns) - return ns; + if (ns) { + ns = list_next_or_null_rcu(&head->list, &ns->siblings, + struct nvme_ns, siblings); + if (ns) + return ns; + } No, in this scenario, ns should not be NULL. Why not? 'ns == NULL' is precisely the corner case this is trying to fix... 
Maybe we can do it like this: diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 282b7a4ea9a9..b895011a2cbd 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) return found; } -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, - struct nvme_ns *ns) -{ - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns, - siblings); - if (ns) - return ns; - return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings); -} +#define nvme_next_ns_condition(head, current, condition) \ +({ \ + struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \ + &(current)->siblings, struct nvme_ns, siblings); \ + __ptr ? __ptr : (condition) ? ((condition) = false, \ + list_first_or_null_rcu(&(head)->list, struct nvme_ns, \ + siblings)) : NULL; \ +}) Urgh. Please, no. That is well-nigh impossible to debug. Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is? I'm still not clear where the problem is once we applied both patches. For example, assume the list has three paths, and none of them is NVME_ANA_OPTIMIZED: head->next = ns1; ns1->next = ns2; ns2->next = head; old->next = ns2; My patch's workflow: nvme_next_ns_condition(head, old, true) returns ns2;
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2/1/21 3:16 AM, Chao Leng wrote: On 2021/1/29 17:20, Hannes Reinecke wrote: On 1/29/21 9:46 AM, Chao Leng wrote: On 2021/1/29 16:33, Hannes Reinecke wrote: On 1/29/21 8:45 AM, Chao Leng wrote: On 2021/1/29 15:06, Hannes Reinecke wrote: On 1/29/21 4:07 AM, Chao Leng wrote: On 2021/1/29 9:42, Sagi Grimberg wrote: You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()). So there is other bug cause nvme_next_ns abormal. I review the code about head->list and head->current_path, I find 2 bugs may cause the bug: First, I already send the patch. see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ Second, in nvme_ns_remove, list_del_rcu is before nvme_mpath_clear_current_path. This may cause "old" is deleted from the "head", but still use "old". I'm not sure there's any other consideration here, I will check it and try to fix it. The reason why we first remove from head->list and only then clear current_path is because the other way around there is no way to guarantee that that the ns won't be assigned as current_path again (because it is in head->list). ok, I see. nvme_ns_remove fences continue of deletion of the ns by synchronizing the srcu such that for sure the current_path clearance is visible. The list will be like this: head->next = ns1; ns1->next = head; old->next = ns1; Where does 'old' pointing to? This may cause infinite loop in nvme_round_robin_path. for (ns = nvme_next_ns(head, old); ns != old; ns = nvme_next_ns(head, ns)) The ns will always be ns1, and then infinite loop. No. nvme_next_ns() will return NULL. If there is just one path(the "old") and the "old" is deleted, nvme_next_ns() will return NULL. The list like this: head->next = head; old->next = head; If there is two or more path and the "old" is deleted, "for" will be infinite loop. 
because nvme_next_ns() will return the path which in the list except the "old", check condition will be true for ever. But that will be caught by the statement above: if (list_is_singular(>list)) no? Two path just a sample example. If there is just two path, will enter it, may cause no path but there is actually one path. It is falsely assumed that the "old" must be not deleted. If there is more than two path, will cause infinite loop. So you mean we'll need something like this? diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 71696819c228..8ffccaf9c19a 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, struct nvme_ns *ns) { - ns = list_next_or_null_rcu(>list, >siblings, struct nvme_ns, - siblings); - if (ns) - return ns; + if (ns) { + ns = list_next_or_null_rcu(>list, >siblings, + struct nvme_ns, siblings); + if (ns) + return ns; + } No, in the scenario, ns should not be NULL. Why not? 'ns == NULL' is precisely the corner-case this is trying to fix... May be we can do like this: diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 282b7a4ea9a9..b895011a2cbd 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) return found; } -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, - struct nvme_ns *ns) -{ - ns = list_next_or_null_rcu(>list, >siblings, struct nvme_ns, - siblings); - if (ns) - return ns; - return list_first_or_null_rcu(>list, struct nvme_ns, siblings); -} +#define nvme_next_ns_condition(head, current, condition) \ +({ \ + struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \ + &(current)->siblings, struct nvme_ns, siblings); \ + __ptr ? __ptr : (condition) ? 
(condition) = false, \ + list_first_or_null_rcu(&(head)->list, struct nvme_ns, \ + siblings) : NULL; \ +}) Urgh. Please, no. That is well impossible to debug. Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is? I'm still not clear where the problem is once we applied both patches. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2021/1/29 17:20, Hannes Reinecke wrote: On 1/29/21 9:46 AM, Chao Leng wrote: On 2021/1/29 16:33, Hannes Reinecke wrote: On 1/29/21 8:45 AM, Chao Leng wrote: On 2021/1/29 15:06, Hannes Reinecke wrote: On 1/29/21 4:07 AM, Chao Leng wrote: On 2021/1/29 9:42, Sagi Grimberg wrote: You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()). So there is other bug cause nvme_next_ns abormal. I review the code about head->list and head->current_path, I find 2 bugs may cause the bug: First, I already send the patch. see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ Second, in nvme_ns_remove, list_del_rcu is before nvme_mpath_clear_current_path. This may cause "old" is deleted from the "head", but still use "old". I'm not sure there's any other consideration here, I will check it and try to fix it. The reason why we first remove from head->list and only then clear current_path is because the other way around there is no way to guarantee that that the ns won't be assigned as current_path again (because it is in head->list). ok, I see. nvme_ns_remove fences continue of deletion of the ns by synchronizing the srcu such that for sure the current_path clearance is visible. The list will be like this: head->next = ns1; ns1->next = head; old->next = ns1; Where does 'old' pointing to? This may cause infinite loop in nvme_round_robin_path. for (ns = nvme_next_ns(head, old); ns != old; ns = nvme_next_ns(head, ns)) The ns will always be ns1, and then infinite loop. No. nvme_next_ns() will return NULL. If there is just one path(the "old") and the "old" is deleted, nvme_next_ns() will return NULL. The list like this: head->next = head; old->next = head; If there is two or more path and the "old" is deleted, "for" will be infinite loop. 
because nvme_next_ns() will return the path which in the list except the "old", check condition will be true for ever. But that will be caught by the statement above: if (list_is_singular(>list)) no? Two path just a sample example. If there is just two path, will enter it, may cause no path but there is actually one path. It is falsely assumed that the "old" must be not deleted. If there is more than two path, will cause infinite loop. So you mean we'll need something like this? diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 71696819c228..8ffccaf9c19a 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, struct nvme_ns *ns) { - ns = list_next_or_null_rcu(>list, >siblings, struct nvme_ns, - siblings); - if (ns) - return ns; + if (ns) { + ns = list_next_or_null_rcu(>list, >siblings, + struct nvme_ns, siblings); + if (ns) + return ns; + } No, in the scenario, ns should not be NULL. May be we can do like this: diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 282b7a4ea9a9..b895011a2cbd 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) return found; } -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, - struct nvme_ns *ns) -{ - ns = list_next_or_null_rcu(>list, >siblings, struct nvme_ns, - siblings); - if (ns) - return ns; - return list_first_or_null_rcu(>list, struct nvme_ns, siblings); -} +#define nvme_next_ns_condition(head, current, condition) \ +({ \ + struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \ + &(current)->siblings, struct nvme_ns, siblings); \ + __ptr ? __ptr : (condition) ? 
((condition) = false, \ + list_first_or_null_rcu(&(head)->list, struct nvme_ns, \ + siblings)) : NULL; \ +}) static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, int node, struct nvme_ns *old) { struct nvme_ns *ns, *found = NULL; + bool first_half = true; - if (list_is_singular(&head->list)) { - if (nvme_path_is_disabled(old)) - return NULL; - return old; - } - - for (ns = nvme_next_ns(head, old); + for (ns = nvme_next_ns_condition(head, old, first_half); ns && ns != old; - ns = nvme_next_ns(head, ns)) { + ns = nvme_next_ns_condition(head, ns, first_half)) { if
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 1/29/21 4:07 AM, Chao Leng wrote: On 2021/1/29 9:42, Sagi Grimberg wrote: You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()). So there is other bug cause nvme_next_ns abormal. I review the code about head->list and head->current_path, I find 2 bugs may cause the bug: First, I already send the patch. see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ Second, in nvme_ns_remove, list_del_rcu is before nvme_mpath_clear_current_path. This may cause "old" is deleted from the "head", but still use "old". I'm not sure there's any other consideration here, I will check it and try to fix it. The reason why we first remove from head->list and only then clear current_path is because the other way around there is no way to guarantee that that the ns won't be assigned as current_path again (because it is in head->list). ok, I see. nvme_ns_remove fences continue of deletion of the ns by synchronizing the srcu such that for sure the current_path clearance is visible. The list will be like this: head->next = ns1; ns1->next = head; old->next = ns1; Where does 'old' pointing to? This may cause infinite loop in nvme_round_robin_path. for (ns = nvme_next_ns(head, old); ns != old; ns = nvme_next_ns(head, ns)) The ns will always be ns1, and then infinite loop. No. nvme_next_ns() will return NULL. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2021/1/29 11:30, Sagi Grimberg wrote: You can't see exactly where it dies, but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL, but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()). So there is another bug that makes nvme_next_ns behave abnormally. I reviewed the code around head->list and head->current_path, and I found 2 bugs that may cause this: First, I already sent the patch, see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ Second, in nvme_ns_remove, list_del_rcu runs before nvme_mpath_clear_current_path. This may cause "old" to be deleted from the "head" list while "old" is still in use. I'm not sure whether there's any other consideration here; I will check it and try to fix it. The reason why we first remove from head->list and only then clear current_path is because the other way around there is no way to guarantee that the ns won't be assigned as current_path again (because it is still in head->list). OK, I see. nvme_ns_remove fences the continuation of the deletion of the ns by synchronizing the srcu, so that the current_path clearance is certainly visible. The list will be like this: head->next = ns1; ns1->next = head; old->next = ns1; This may cause an infinite loop in nvme_round_robin_path: for (ns = nvme_next_ns(head, old); ns != old; ns = nvme_next_ns(head, ns)) The ns will always be ns1, hence an infinite loop. Who is being removed? I'm not following. The "old" is the path being removed. Daniel Wagner reported a crash like this: head->next = head; old->next = head; Here nvme_next_ns(head, old) will return NULL, and then crash. Although checking ns can avoid the crash, it cannot avoid the infinite loop. For a similar reason, the list will be like this: head->next = ns1; ns1->next = head; old->next = ns1; where ns1 is another path.
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()). So there is other bug cause nvme_next_ns abormal. I review the code about head->list and head->current_path, I find 2 bugs may cause the bug: First, I already send the patch. see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ Second, in nvme_ns_remove, list_del_rcu is before nvme_mpath_clear_current_path. This may cause "old" is deleted from the "head", but still use "old". I'm not sure there's any other consideration here, I will check it and try to fix it. The reason why we first remove from head->list and only then clear current_path is because the other way around there is no way to guarantee that that the ns won't be assigned as current_path again (because it is in head->list). ok, I see. nvme_ns_remove fences continue of deletion of the ns by synchronizing the srcu such that for sure the current_path clearance is visible. The list will be like this: head->next = ns1; ns1->next = head; old->next = ns1; This may cause infinite loop in nvme_round_robin_path. for (ns = nvme_next_ns(head, old); ns != old; ns = nvme_next_ns(head, ns)) The ns will always be ns1, and then infinite loop. Who is being removed? I'm not following
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2021/1/29 9:42, Sagi Grimberg wrote: You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()). So there is other bug cause nvme_next_ns abormal. I review the code about head->list and head->current_path, I find 2 bugs may cause the bug: First, I already send the patch. see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ Second, in nvme_ns_remove, list_del_rcu is before nvme_mpath_clear_current_path. This may cause "old" is deleted from the "head", but still use "old". I'm not sure there's any other consideration here, I will check it and try to fix it. The reason why we first remove from head->list and only then clear current_path is because the other way around there is no way to guarantee that that the ns won't be assigned as current_path again (because it is in head->list). ok, I see. nvme_ns_remove fences continue of deletion of the ns by synchronizing the srcu such that for sure the current_path clearance is visible. The list will be like this: head->next = ns1; ns1->next = head; old->next = ns1; This may cause infinite loop in nvme_round_robin_path. for (ns = nvme_next_ns(head, old); ns != old; ns = nvme_next_ns(head, ns)) The ns will always be ns1, and then infinite loop. .
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()). So there is other bug cause nvme_next_ns abormal. I review the code about head->list and head->current_path, I find 2 bugs may cause the bug: First, I already send the patch. see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ Second, in nvme_ns_remove, list_del_rcu is before nvme_mpath_clear_current_path. This may cause "old" is deleted from the "head", but still use "old". I'm not sure there's any other consideration here, I will check it and try to fix it. The reason why we first remove from head->list and only then clear current_path is because the other way around there is no way to guarantee that that the ns won't be assigned as current_path again (because it is in head->list). nvme_ns_remove fences continue of deletion of the ns by synchronizing the srcu such that for sure the current_path clearance is visible.
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2021/1/28 17:40, Daniel Wagner wrote: On Thu, Jan 28, 2021 at 05:18:56PM +0800, Chao Leng wrote: It is impossible to return NULL for nvme_next_ns(head, old). block nvme0n1: no usable path - requeuing I/O block nvme0n1: no usable path - requeuing I/O block nvme0n1: no usable path - requeuing I/O block nvme0n1: no usable path - requeuing I/O block nvme0n1: no usable path - requeuing I/O block nvme0n1: no usable path - requeuing I/O BUG: kernel NULL pointer dereference, address: 0068 #PF: supervisor read access in kernel mode #PF: error_code(0x) - not-present page PGD 800ff67bc067 P4D 800ff67bc067 PUD ff9ac9067 PMD 0 Oops: [#1] SMP PTI CPU: 23 PID: 15759 Comm: dt.21.15 Kdump: loaded Tainted: GE 5.3.18-0.gc9fe679-default #1 SLE15-SP2 (unreleased) Hardware name: FUJITSU PRIMERGY RX2540 M2/D3289-B1, BIOS V5.0.0.11 R1.18.0 for D3289-B1x02/06/2018 RIP: 0010:nvme_ns_head_make_request+0x1d1/0x430 [nvme_core] Code: 54 24 10 0f 84 c9 01 00 00 48 8b 54 24 10 48 83 ea 30 0f 84 ba 01 00 00 48 39 d0 0f 84 01 02 00 00 31 ff eb 05 48 39 d0 74 67 <48> 8b 72 68 83 e6 04 75 13 48 8b 72 68 83 e6 01 75 0a 48 8b 72 10 RSP: 0018:a69d08017af8 EFLAGS: 00010246 RAX: 92f261d87800 RBX: 92fa555b0010 RCX: 92fa555bc570 RDX: RSI: RDI: RBP: 0001 R08: 1000 R09: 1000 R10: a69d080179a8 R11: 92f264f0c1c0 R12: 92f264f7f000 R13: 92fa555b R14: 0001 R15: FS: 7f3962bae700() GS:92f29ffc() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0068 CR3: 000fd69a2002 CR4: 003606e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: generic_make_request+0x121/0x300 ? submit_bio+0x42/0x1c0 submit_bio+0x42/0x1c0 ext4_io_submit+0x49/0x60 [ext4] ext4_writepages+0x625/0xe90 [ext4] ? do_writepages+0x4b/0xe0 ? ext4_mark_inode_dirty+0x1d0/0x1d0 [ext4] do_writepages+0x4b/0xe0 ? __generic_file_write_iter+0x192/0x1c0 ? __filemap_fdatawrite_range+0xcb/0x100 __filemap_fdatawrite_range+0xcb/0x100 ? 
ext4_file_write_iter+0x128/0x3c0 [ext4] file_write_and_wait_range+0x5e/0xb0 __generic_file_fsync+0x22/0xb0 ext4_sync_file+0x1f7/0x3c0 [ext4] do_fsync+0x38/0x60 __x64_sys_fsync+0x10/0x20 do_syscall_64+0x5b/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 You can't see exactly where it dies, but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL, but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()). So there is another bug that makes nvme_next_ns behave abnormally. I reviewed the code around head->list and head->current_path, and I found 2 bugs that may cause this: First, I already sent the patch, see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ Second, in nvme_ns_remove, list_del_rcu runs before nvme_mpath_clear_current_path. This may cause "old" to be deleted from the "head" list while "old" is still in use. I'm not sure whether there's any other consideration here; I will check it and try to fix it. And I have positive feedback: this patch fixes the above problem. Although adding the check on ns can fix the crash, there may still be other problems, such as in __nvme_find_path, which uses list_for_each_entry_rcu.
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2021/1/28 17:23, Hannes Reinecke wrote: On 1/28/21 10:18 AM, Chao Leng wrote: On 2021/1/28 15:58, Daniel Wagner wrote: On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote: --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, } for (ns = nvme_next_ns(head, old); - ns != old; + ns && ns != old; nvme_round_robin_path is only called when "old" is non-NULL, and nvme_next_ns should not return NULL when "old" is non-NULL. It seems unnecessary to add a check on "ns". The problem is when we enter nvme_round_robin_path() and there is no path available. In this case the initialization ns = nvme_next_ns(head, old) could return a NULL pointer. "old" should not be NULL, so there is at least one path, namely "old". It should be impossible for nvme_next_ns(head, old) to return NULL. No. list_next_or_null_rcu()/list_first_or_null_rcu() will return NULL when the end of the list is reached. Although list_next_or_null_rcu()/list_first_or_null_rcu() may return NULL, nvme_next_ns(head, old) assumes that the "old" is in the "head" list, so nvme_next_ns(head, old) should not return NULL. If the "old" is not in the "head" list, nvme_next_ns(head, old) will behave abnormally. So there is another bug which makes nvme_next_ns(head, old) abnormal. I reviewed the code around head->list and head->current_path, and I found 2 bugs that may cause nvme_next_ns(head, old) to behave abnormally: First, I already sent the patch, see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ Second, in nvme_ns_remove, list_del_rcu runs before nvme_mpath_clear_current_path. This may cause "old" to be deleted from the "head" list while "old" is still in use. I'm not sure whether there's any other consideration here; I will check it and try to fix it. Cheers, Hannes
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On Thu, Jan 28, 2021 at 05:18:56PM +0800, Chao Leng wrote:
> It is impossible to return NULL for nvme_next_ns(head, old).

block nvme0n1: no usable path - requeuing I/O
block nvme0n1: no usable path - requeuing I/O
block nvme0n1: no usable path - requeuing I/O
block nvme0n1: no usable path - requeuing I/O
block nvme0n1: no usable path - requeuing I/O
block nvme0n1: no usable path - requeuing I/O
BUG: kernel NULL pointer dereference, address: 0068
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 800ff67bc067 P4D 800ff67bc067 PUD ff9ac9067 PMD 0
Oops: [#1] SMP PTI
CPU: 23 PID: 15759 Comm: dt.21.15 Kdump: loaded Tainted: G E 5.3.18-0.gc9fe679-default #1 SLE15-SP2 (unreleased)
Hardware name: FUJITSU PRIMERGY RX2540 M2/D3289-B1, BIOS V5.0.0.11 R1.18.0 for D3289-B1x02/06/2018
RIP: 0010:nvme_ns_head_make_request+0x1d1/0x430 [nvme_core]
Code: 54 24 10 0f 84 c9 01 00 00 48 8b 54 24 10 48 83 ea 30 0f 84 ba 01 00 00 48 39 d0 0f 84 01 02 00 00 31 ff eb 05 48 39 d0 74 67 <48> 8b 72 68 83 e6 04 75 13 48 8b 72 68 83 e6 01 75 0a 48 8b 72 10
RSP: 0018:a69d08017af8 EFLAGS: 00010246
RAX: 92f261d87800 RBX: 92fa555b0010 RCX: 92fa555bc570
RDX: RSI: RDI:
RBP: 0001 R08: 1000 R09: 1000
R10: a69d080179a8 R11: 92f264f0c1c0 R12: 92f264f7f000
R13: 92fa555b R14: 0001 R15:
FS: 7f3962bae700() GS:92f29ffc() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0068 CR3: 000fd69a2002 CR4: 003606e0
DR0: DR1: DR2: DR3:
DR6: fffe0ff0 DR7: 0400
Call Trace:
 generic_make_request+0x121/0x300
 ? submit_bio+0x42/0x1c0
 submit_bio+0x42/0x1c0
 ext4_io_submit+0x49/0x60 [ext4]
 ext4_writepages+0x625/0xe90 [ext4]
 ? do_writepages+0x4b/0xe0
 ? ext4_mark_inode_dirty+0x1d0/0x1d0 [ext4]
 do_writepages+0x4b/0xe0
 ? __generic_file_write_iter+0x192/0x1c0
 ? __filemap_fdatawrite_range+0xcb/0x100
 __filemap_fdatawrite_range+0xcb/0x100
 ?
ext4_file_write_iter+0x128/0x3c0 [ext4]
file_write_and_wait_range+0x5e/0xb0
__generic_file_fsync+0x22/0xb0
ext4_sync_file+0x1f7/0x3c0 [ext4]
do_fsync+0x38/0x60
__x64_sys_fsync+0x10/0x20
do_syscall_64+0x5b/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()).

And I have positive feedback, this patch fixes the above problem.
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 1/28/21 10:18 AM, Chao Leng wrote:
On 2021/1/28 15:58, Daniel Wagner wrote:
On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:

--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
 	}

 	for (ns = nvme_next_ns(head, old);
-	     ns != old;
+	     ns && ns != old;

nvme_round_robin_path is only called when "old" is not NULL, and nvme_next_ns should not return NULL when "old" is not NULL. It seems unnecessary to add the check on "ns".

The problem is when we enter nvme_round_robin_path() and there is no path available. In this case the initialization ns = nvme_next_ns(head, old) could return a NULL pointer.

"old" should not be NULL, so there is at least one path, namely "old". It is impossible for nvme_next_ns(head, old) to return NULL.

No. list_next_or_null_rcu()/list_first_or_null_rcu() will return NULL when the end of the list is reached.

Cheers,
Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
h...@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2021/1/28 15:58, Daniel Wagner wrote:
On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:

--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
 	}

 	for (ns = nvme_next_ns(head, old);
-	     ns != old;
+	     ns && ns != old;

nvme_round_robin_path is only called when "old" is not NULL, and nvme_next_ns should not return NULL when "old" is not NULL. It seems unnecessary to add the check on "ns".

The problem is when we enter nvme_round_robin_path() and there is no path available. In this case the initialization ns = nvme_next_ns(head, old) could return a NULL pointer.

"old" should not be NULL, so there is at least one path, namely "old". It is impossible for nvme_next_ns(head, old) to return NULL.
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:
> > --- a/drivers/nvme/host/multipath.c
> > +++ b/drivers/nvme/host/multipath.c
> > @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> >  	}
> >  	for (ns = nvme_next_ns(head, old);
> > -	     ns != old;
> > +	     ns && ns != old;
> nvme_round_robin_path just be called when !"old".
> nvme_next_ns should not return NULL when !"old".
> It seems unnecessary to add checking "ns".

The problem is when we enter nvme_round_robin_path() and there is no path available. In this case the initialization ns = nvme_next_ns(head, old) could return a NULL pointer.
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 2021/1/27 18:30, Daniel Wagner wrote:

nvme_round_robin_path() should test if the returned ns pointer is valid. nvme_next_ns() will return a NULL pointer if there is no path left.

Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
Cc: Hannes Reinecke
Signed-off-by: Daniel Wagner
---
v2:
- moved NULL test into the if conditional statement
- added Fixes tag

 drivers/nvme/host/multipath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9ac762b28811..282b7a4ea9a9 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
 	}

 	for (ns = nvme_next_ns(head, old);
-	     ns != old;
+	     ns && ns != old;

nvme_round_robin_path is only called when "old" is not NULL, and nvme_next_ns should not return NULL when "old" is not NULL. It seems unnecessary to add the check on "ns". Is there a bug where "old" is not in the "head" list? If yes, we should fix it.

 	     ns = nvme_next_ns(head, ns)) {
 		if (nvme_path_is_disabled(ns))
 			continue;
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
Thanks, applied to nvme-5.11.
Re: [PATCH v2] nvme-multipath: Early exit if no path is available
On 1/27/21 11:30 AM, Daniel Wagner wrote:

nvme_round_robin_path() should test if the returned ns pointer is valid. nvme_next_ns() will return a NULL pointer if there is no path left.

Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
Cc: Hannes Reinecke
Signed-off-by: Daniel Wagner
---
v2:
- moved NULL test into the if conditional statement
- added Fixes tag

 drivers/nvme/host/multipath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9ac762b28811..282b7a4ea9a9 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
 	}

 	for (ns = nvme_next_ns(head, old);
-	     ns != old;
+	     ns && ns != old;
 	     ns = nvme_next_ns(head, ns)) {
 		if (nvme_path_is_disabled(ns))
 			continue;

Reviewed-by: Hannes Reinecke

Cheers,
Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
h...@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
[PATCH v2] nvme-multipath: Early exit if no path is available
nvme_round_robin_path() should test if the returned ns pointer is valid. nvme_next_ns() will return a NULL pointer if there is no path left.

Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
Cc: Hannes Reinecke
Signed-off-by: Daniel Wagner
---
v2:
- moved NULL test into the if conditional statement
- added Fixes tag

 drivers/nvme/host/multipath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9ac762b28811..282b7a4ea9a9 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
 	}

 	for (ns = nvme_next_ns(head, old);
-	     ns != old;
+	     ns && ns != old;
 	     ns = nvme_next_ns(head, ns)) {
 		if (nvme_path_is_disabled(ns))
 			continue;
--
2.29.2