On 27.01.2021 12:33, Paul Durrant wrote: >> -----Original Message----- >> From: Jan Beulich <[email protected]> >> Sent: 27 January 2021 11:21 >> To: [email protected] >> Cc: 'Paul Durrant' <[email protected]>; 'Konrad Rzeszutek Wilk' >> <[email protected]>; 'Roger Pau >> Monné' <[email protected]>; 'Jens Axboe' <[email protected]>; 'Dongli Zhang' >> <[email protected]>; [email protected]; >> [email protected]; xen- >> [email protected] >> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page >> rings >> >> On 27.01.2021 12:09, Paul Durrant wrote: >>>> -----Original Message----- >>>> From: Jan Beulich <[email protected]> >>>> Sent: 27 January 2021 10:57 >>>> To: Paul Durrant <[email protected]> >>>> Cc: Paul Durrant <[email protected]>; Konrad Rzeszutek Wilk >>>> <[email protected]>; Roger Pau >>>> Monné <[email protected]>; Jens Axboe <[email protected]>; Dongli Zhang >>>> <[email protected]>; >>>> [email protected]; [email protected]; >>>> [email protected] >>>> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page >>>> rings >>>> >>>> On 27.01.2021 11:30, Paul Durrant wrote: >>>>> From: Paul Durrant <[email protected]> >>>>> >>>>> Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid >>>>> inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the >>>>> behaviour of xen-blkback when connecting to a frontend was: >>>>> >>>>> - read 'ring-page-order' >>>>> - if not present then expect a single page ring specified by 'ring-ref' >>>>> - else expect a ring specified by 'ring-refX' where X is between 0 and >>>>> 1 << ring-page-order >>>>> >>>>> This was correct behaviour, but was broken by the afforementioned commit >>>>> to >>>>> become: >>>>> >>>>> - read 'ring-page-order' >>>>> - if not present then expect a single page ring >>>>> - expect a ring specified by 'ring-refX' where X is between 0 and >>>>> 1 << ring-page-order >>>>> - if that didn't work then see if there's a single page ring specified by >>>>> 'ring-ref' >>>>> >>>>> This incorrect behaviour works most of the time but fails when a frontend >>>>> that sets 'ring-page-order' is unloaded and replaced by one that does not >>>>> because, instead of reading 'ring-ref', xen-blkback will read the stale >>>>> 'ring-ref0' left around by the previous frontend will try to map the wrong >>>>> grant reference. >>>>> >>>>> This patch restores the original behaviour. >>>> >>>> Isn't this only the 2nd of a pair of fixes that's needed, the >>>> first being the drivers, upon being unloaded, to fully clean up >>>> after itself? Any stale key left may lead to confusion upon >>>> re-use of the containing directory. >>> >>> In a backend we shouldn't be relying on, nor really expect IMO, a frontend >>> to clean up after itself. >> Any backend should know *exactly* what xenstore nodes it’s looking for from >> a frontend. >> >> But the backend can't know whether a node exists because the present >> frontend has written it, or because an earlier instance forgot to >> delete it. It can only honor what's there. (In fact the other day I >> was wondering whether some of the writes of boolean "false" nodes >> wouldn't better be xenbus_rm() instead.) > > In the particular case this patch is fixing for me, the frontends are the > Windows XENVBD driver and the Windows crash version of the same driver > (actually built from different code). The 'normal' instance is multi-page > aware and the crash instance is not quite, i.e. it uses the old ring-ref but > knows to clean up 'ring-page-order'. > Clearly, in a crash situation, we cannot rely on frontend to clean up
Ah, I see (and agree). > so what you say does highlight that there indeed needs to be a second patch > to xen-blkback to make sure it removes 'ring-page-order' itself as 'state' > cycles through Closed and back to InitWait. And not just this one node then, I suppose? > I think this patch does still stand on its own though. Perhaps, yes. Jan

