Daniel P. Berrangé <berra...@redhat.com> wrote:
> On Mon, Apr 04, 2022 at 12:20:14PM +0100, Dr. David Alan Gilbert wrote:
>> * Ilya Leoshkevich (i...@linux.ibm.com) wrote:
>> > zlib_send_prepare() compresses pages of a running VM. zlib does not
>> > make any thread-safety guarantees with respect to changing deflate()
>> > input concurrently with deflate() [1].
>> >
>> > One can observe problems due to this with the IBM zEnterprise Data
>> > Compression accelerator capable zlib [2]. When the hardware
>> > acceleration is enabled, the migration/multifd/tcp/zlib test fails
>> > intermittently [3] due to sliding window corruption.
>> >
>> > At the moment this problem occurs only with this accelerator, since
>> > its architecture explicitly discourages concurrent accesses [4]:
>> >
>> >     Page 26-57, "Other Conditions":
>> >
>> >     As observed by this CPU, other CPUs, and channel
>> >     programs, references to the parameter block, first,
>> >     second, and third operands may be multiple-access
>> >     references, accesses to these storage locations are
>> >     not necessarily block-concurrent, and the sequence
>> >     of these accesses or references is undefined.
>> >
>> > Still, it might affect other platforms due to a future zlib update.
>> > Therefore, copy the page being compressed into a private buffer before
>> > passing it to zlib.
>>
>> While this might work around the problem, your explanation doesn't quite
>> fit the symptoms; or if it does, then you have a separate problem.
>>
>> The live migration code relies on the fact that the source is running
>> and changing its memory as the data is transmitted; however, it also
>> relies on the fact that if this happens, the 'dirty' flag is set _after_
>> those changes, causing another round of migration and retransmission of
>> the (now stable) data.
>>
>> We don't expect the load of the data for the first page write to be
>> correct, consistent, etc. - we just rely on the retransmission being
>> correct once the page is stable.
>>
>> If your compressor hardware is doing something undefined during the
>> first case, that's fine, as long as it works correctly in the stable
>> case where the data isn't changing.
>>
>> Adding the extra copy is going to slow everyone else down; and since
>> there's plenty of pthread locking in those multifd threads, I'm
>> expecting them to get reasonably defined ordering and thus be safe from
>> multi-threading problems (please correct us if we've actually done
>> something wrong in the locking there).
>>
>> IMHO your accelerator, when called from a zlib call, needs to behave
>> the same as if it were the software implementation; i.e. if we've got
>> pthread calls in there that are enforcing ordering, then that should be
>> fine; your accelerator implementation needs to add a barrier of some
>> kind or an internal copy, not penalise everyone else.
>
> It is reasonable to argue that QEMU is relying on undefined behaviour
> when invoking zlib in this case, so it isn't clear that the accelerator
> impl should be changed, rather than QEMU be changed to follow the zlib
> API requirements.
It works in all the other cases. My vote, if we need that, is that we add
a zlib-sync or similar method. Using zlib already means doing a copy;
doing an extra copy will cost too much, in my opinion.

While we are here, is there such a requirement for zstd? In my testing,
zstd was basically always better than zlib (no, I don't remember the
details).

Later, Juan.