Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)
On Friday 07 of June 2019, Wols Lists wrote: > On 07/06/19 10:07, Luboš Luňák wrote: > >> For what its worth those sample documents are not "realworld" user > >> documents, but the output of fuzzing engines so any non-catastrophic > >> outcome is acceptable IMO > > > > I have avoided the assert with https://gerrit.libreoffice.org/#/c/73646/ > > . Given that it's (hopefully) very unlikely to find real documents with > > broken zip internals, I find that good enough. > > Bear in mind I don't know the background to this ... > > My immediate reaction was "we can't refuse to let the user save their > document, so could we disable 'save' and do a 'save as'?". That doesn't make a difference here. The code can't save such a broken document, period. Regardless of where it is being saved to. > As for unlikely to find broken documents, it's too long ago for me to > remember the details, but I remember salvaging a broken calc document by > unzipping it and recovering the data portion. So real-world broken > documents do happen (although I think in this case it was broken such > that LO refused to open it ...) Manually unzipping and zipping back properly would work here too. Or you can improve the saving code to cope with such problems somehow, feel free to. -- Luboš Luňák l.lu...@collabora.com ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)
On 07/06/19 10:07, Luboš Luňák wrote: > On Thursday 06 of June 2019, Caolán McNamara wrote: >> On Mon, 2019-06-03 at 21:52 +0200, Luboš Luňák wrote: >>> Any idea what to do about that? Is it really ok that we just refuse >>> to save it? Or should we save it even though the contents may be >>> broken? >> >> For what its worth those sample documents are not "realworld" user >> documents, but the output of fuzzing engines so any non-catastrophic >> outcome is acceptable IMO > > I have avoided the assert with https://gerrit.libreoffice.org/#/c/73646/ . > Given that it's (hopefully) very unlikely to find real documents with broken > zip internals, I find that good enough. > Bear in mind I don't know the background to this ... My immediate reaction was "we can't refuse to let the user save their document, so could we disable 'save' and do a 'save as'?". As for unlikely to find broken documents, it's too long ago for me to remember the details, but I remember salvaging a broken calc document by unzipping it and recovering the data portion. So real-world broken documents do happen (although I think in this case it was broken such that LO refused to open it ...) Cheers, Wol ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)
On Thursday 06 of June 2019, Caolán McNamara wrote: > On Mon, 2019-06-03 at 21:52 +0200, Luboš Luňák wrote: > > Any idea what to do about that? Is it really ok that we just refuse > > to save it? Or should we save it even though the contents may be > > broken? > > For what its worth those sample documents are not "realworld" user > documents, but the output of fuzzing engines so any non-catastrophic > outcome is acceptable IMO I have avoided the assert with https://gerrit.libreoffice.org/#/c/73646/ . Given that it's (hopefully) very unlikely to find real documents with broken zip internals, I find that good enough. -- Luboš Luňák l.lu...@collabora.com ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)
On Mon, 2019-06-03 at 21:52 +0200, Luboš Luňák wrote: > Any idea what to do about that? Is it really ok that we just refuse > to save it? Or should we save it even though the contents may be > broken? For what its worth those sample documents are not "realworld" user documents, but the output of fuzzing engines so any non-catastrophic outcome is acceptable IMO ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)
On Monday 03 of June 2019, Jan-Marek Glogowski wrote: > Am 03.06.19 um 21:52 schrieb Luboš Luňák: > > Ok, so it's not a problem with my code, my changes just happened to show > > the problem, and the problem is that those documents are broken. If you > > try to unzip the documents, it will complain about incorrect CRC ... > > Any idea what to do about that? Is it really ok that we just refuse to > > save it? Or should we save it even though the contents may be broken? > > IMHO the only sane solution would be to detect the broken CRCs on read and > report a broken file to the user. That's not so easy. We do not detect broken CRCs on load, because we load on demand. And removing that seems like a bad trade-off. Finding that a CRC stream has a broken CRCs means uncompressing everything and checking, even things we otherwise do not care about. I think we do not want to make loading of everything possibly slower just to detect that virtually all documents are correct. -- Luboš Luňák l.lu...@collabora.com ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)
Am 03.06.19 um 21:52 schrieb Luboš Luňák: > On Monday 03 of June 2019, Caolán McNamara wrote: >> On Mon, 2019-06-03 at 12:23 +0200, Luboš Luňák wrote: >>> On Thursday 30 of May 2019, Michael Meeks wrote: + some in the zip area - assuming they are threading related. >>> >>> Is this about those documents such >>> as /srv/crashtestdata/files/caolan/opendocument_stack_overflow_2.odt >>> ? How >>> can I reproduce that problem? If I try to fetch >>> buildsl...@vm138.documentfoundation.org: >>> an/opendocument_stack_overflow_2.odt , >>> it doesn't exist. >> >> I attach two of the examples here. The input name was foo.sample, the >> output to odt name appears higher up in the bt during the export. >> >> ./instdir/program/soffice.bin --headless --convert-to odt >> opendocument_stack_overflow.sample > > > Ok, so it's not a problem with my code, my changes just happened to show the > problem, and the problem is that those documents are broken. If you try to > unzip the documents, it will complain about incorrect CRC (although it still > will uncompress them). And what happens is that when we try to save the file, > apparently only by that point we'll read those zip streams, there will be a > ZipException about that, and the code in package/ is not exception-safe. So > ZipOutputStream::writeLOC() gets called but not the matching > ZipOutputStream::rawCloseEntry(). > > But this is actually broken on several levels. If I make the code to catch > the exception better, I'll need to make it somehow handle the fact that > writeLOC() prepared for writing en entry, but then there's nothing to write. > But that's actually not important, since ZipPackageStream::saveChild() will > still return failure, so ZipPackageFolder::saveContents() will throw an > exception, making the whole document saving fail. Which in turn means this > whole save business is irrelevant, as there's just no way to save the > document, even though we can load it and we can edit it. Which seems rather > lame. > > Any idea what to do about that? Is it really ok that we just refuse to save > it? Or should we save it even though the contents may be broken? IMHO the only sane solution would be to detect the broken CRCs on read and report a broken file to the user. Eventually we could offer some recovery option: mark the broken CRCs to be recalculated and keep the stuff or drop the broken ZIP entries. I guess most users can't make this decision, so I would opt for optional recovery of the entries, ignoring the CRCs. Easier solution: we just deny / abort loading the file and tell the user babout the broken file. It really strange that the broken CRCs are just detected on write and the document loads without a problem. Or do we ignore all ZIP entries, which we don't know, which would be strange too? ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice
Saving documents with broken zip streams (Re: minutes of ESC call ...)
On Monday 03 of June 2019, Caolán McNamara wrote: > On Mon, 2019-06-03 at 12:23 +0200, Luboš Luňák wrote: > > On Thursday 30 of May 2019, Michael Meeks wrote: > > > + some in the zip area - assuming they are threading related. > > > > Is this about those documents such > > as /srv/crashtestdata/files/caolan/opendocument_stack_overflow_2.odt > > ? How > > can I reproduce that problem? If I try to fetch > > buildsl...@vm138.documentfoundation.org: > > an/opendocument_stack_overflow_2.odt , > > it doesn't exist. > > I attach two of the examples here. The input name was foo.sample, the > output to odt name appears higher up in the bt during the export. > > ./instdir/program/soffice.bin --headless --convert-to odt > opendocument_stack_overflow.sample Ok, so it's not a problem with my code, my changes just happened to show the problem, and the problem is that those documents are broken. If you try to unzip the documents, it will complain about incorrect CRC (although it still will uncompress them). And what happens is that when we try to save the file, apparently only by that point we'll read those zip streams, there will be a ZipException about that, and the code in package/ is not exception-safe. So ZipOutputStream::writeLOC() gets called but not the matching ZipOutputStream::rawCloseEntry(). But this is actually broken on several levels. If I make the code to catch the exception better, I'll need to make it somehow handle the fact that writeLOC() prepared for writing en entry, but then there's nothing to write. But that's actually not important, since ZipPackageStream::saveChild() will still return failure, so ZipPackageFolder::saveContents() will throw an exception, making the whole document saving fail. Which in turn means this whole save business is irrelevant, as there's just no way to save the document, even though we can load it and we can edit it. Which seems rather lame. Any idea what to do about that? Is it really ok that we just refuse to save it? Or should we save it even though the contents may be broken? -- Luboš Luňák l.lu...@collabora.com ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice