Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

2019-06-10 Thread Luboš Luňák
On Friday 07 of June 2019, Wols Lists wrote:
> On 07/06/19 10:07, Luboš Luňák wrote:
> >> For what it's worth those sample documents are not "realworld" user
> >> documents, but the output of fuzzing engines so any non-catastrophic
> >> outcome is acceptable IMO
> >
> >  I have avoided the assert with https://gerrit.libreoffice.org/#/c/73646/
> > . Given that it's (hopefully) very unlikely to find real documents with
> > broken zip internals, I find that good enough.
>
> Bear in mind I don't know the background to this ...
>
> My immediate reaction was "we can't refuse to let the user save their
> document, so could we disable 'save' and do a 'save as'?".

 That doesn't make a difference here. The code can't save such a broken 
document, period. Regardless of where it is being saved to.

> As for unlikely to find broken documents, it's too long ago for me to
> remember the details, but I remember salvaging a broken calc document by
> unzipping it and recovering the data portion. So real-world broken
> documents do happen (although I think in this case it was broken such
> that LO refused to open it ...)

 Manually unzipping and zipping back properly would work here too. Or you can 
improve the saving code to cope with such problems somehow, feel free to.

-- 
 Luboš Luňák
 l.lu...@collabora.com
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice

Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

2019-06-07 Thread Wols Lists
On 07/06/19 10:07, Luboš Luňák wrote:
> On Thursday 06 of June 2019, Caolán McNamara wrote:
>> On Mon, 2019-06-03 at 21:52 +0200, Luboš Luňák wrote:
>>> Any idea what to do about that? Is it really ok that we just refuse
>>> to save it? Or should we save it even though the contents may be
>>> broken?
>>
>> For what it's worth those sample documents are not "realworld" user
>> documents, but the output of fuzzing engines so any non-catastrophic
>> outcome is acceptable IMO
> 
>  I have avoided the assert with https://gerrit.libreoffice.org/#/c/73646/ . 
> Given that it's (hopefully) very unlikely to find real documents with broken 
> zip internals, I find that good enough.
> 
Bear in mind I don't know the background to this ...

My immediate reaction was "we can't refuse to let the user save their
document, so could we disable 'save' and do a 'save as'?".

As for unlikely to find broken documents, it's too long ago for me to
remember the details, but I remember salvaging a broken calc document by
unzipping it and recovering the data portion. So real-world broken
documents do happen (although I think in this case it was broken such
that LO refused to open it ...)

Cheers,
Wol

Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

2019-06-07 Thread Luboš Luňák
On Thursday 06 of June 2019, Caolán McNamara wrote:
> On Mon, 2019-06-03 at 21:52 +0200, Luboš Luňák wrote:
> > Any idea what to do about that? Is it really ok that we just refuse
> > to save it? Or should we save it even though the contents may be
> > broken?
>
> For what it's worth those sample documents are not "realworld" user
> documents, but the output of fuzzing engines so any non-catastrophic
> outcome is acceptable IMO

 I have avoided the assert with https://gerrit.libreoffice.org/#/c/73646/ . 
Given that it's (hopefully) very unlikely to find real documents with broken 
zip internals, I find that good enough.

-- 
 Luboš Luňák
 l.lu...@collabora.com

Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

2019-06-06 Thread Caolán McNamara
On Mon, 2019-06-03 at 21:52 +0200, Luboš Luňák wrote:
> Any idea what to do about that? Is it really ok that we just refuse
> to save it? Or should we save it even though the contents may be
> broken?

For what it's worth those sample documents are not "realworld" user
documents, but the output of fuzzing engines so any non-catastrophic
outcome is acceptable IMO


Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

2019-06-04 Thread Luboš Luňák
On Monday 03 of June 2019, Jan-Marek Glogowski wrote:
> On 03.06.19 at 21:52, Luboš Luňák wrote:
> >  Ok, so it's not a problem with my code, my changes just happened to show
> > the problem, and the problem is that those documents are broken. If you
> > try to unzip the documents, it will complain about incorrect CRC
...
> >  Any idea what to do about that? Is it really ok that we just refuse to
> > save it? Or should we save it even though the contents may be broken?
>
> IMHO the only sane solution would be to detect the broken CRCs on read and
> report a broken file to the user.

 That's not so easy. We do not detect broken CRCs on load, because we load on 
demand. And removing that seems like a bad trade-off. Finding that a zip 
stream has a broken CRC means uncompressing everything and checking, even 
things we otherwise do not care about. I think we do not want to make loading 
of everything possibly slower just to detect that virtually all documents are 
correct.
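
To illustrate the cost described above, here is a minimal sketch using Python's standard zipfile module rather than the LibreOffice package/ code. It builds a small archive in memory, corrupts one stored payload so that its recorded CRC no longer matches, and shows that listing the members succeeds while a full check has to read every member. The member names and the helper names are assumptions made for the example.

```python
import io
import zipfile


def make_zip_with_bad_crc():
    """Build a tiny archive, then alter one payload byte without updating its CRC."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the payload uncompressed, so the corruption below
    # yields a CRC mismatch rather than a decompression error.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", "<doc>hello</doc>")
    data = bytearray(buf.getvalue())
    pos = data.find(b"hello")
    data[pos:pos + 5] = b"jello"
    return bytes(data)


def first_bad_member(raw):
    """Read every member in full; return the first name failing its check, or None."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return zf.testzip()


broken = make_zip_with_bad_crc()

# Opening the archive and listing members only parses the central directory,
# so the damage goes unnoticed at this point.
with zipfile.ZipFile(io.BytesIO(broken)) as zf:
    names = zf.namelist()

# Only a pass over all payloads reveals the mismatch.
bad = first_bad_member(broken)
```

The full pass is what an eager check at load time would have to do for every document, including the overwhelming majority that are fine.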

-- 
 Luboš Luňák
 l.lu...@collabora.com

Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

2019-06-03 Thread Jan-Marek Glogowski
On 03.06.19 at 21:52, Luboš Luňák wrote:
> On Monday 03 of June 2019, Caolán McNamara wrote:
>> On Mon, 2019-06-03 at 12:23 +0200, Luboš Luňák wrote:
>>> On Thursday 30 of May 2019, Michael Meeks wrote:
>>>>   + some in the zip area - assuming they are threading related.
>>>
>>>  Is this about those documents such
>>> as /srv/crashtestdata/files/caolan/opendocument_stack_overflow_2.odt
>>> ? How
>>> can I reproduce that problem? If I try to fetch
>>> buildsl...@vm138.documentfoundation.org:
>>> an/opendocument_stack_overflow_2.odt ,
>>> it doesn't exist.
>>
>> I attach two of the examples here. The input name was foo.sample, the
>> output to odt name appears higher up in the bt during the export.
>>
>> ./instdir/program/soffice.bin --headless --convert-to odt
>> opendocument_stack_overflow.sample
> 
> 
>  Ok, so it's not a problem with my code, my changes just happened to show the 
> problem, and the problem is that those documents are broken. If you try to 
> unzip the documents, it will complain about incorrect CRC (although it still 
> will uncompress them). And what happens is that when we try to save the file, 
> apparently only by that point we'll read those zip streams, there will be a 
> ZipException about that, and the code in package/ is not exception-safe. So 
> ZipOutputStream::writeLOC() gets called but not the matching 
> ZipOutputStream::rawCloseEntry().
> 
>  But this is actually broken on several levels. If I make the code catch 
> the exception better, I'll need to make it somehow handle the fact that 
> writeLOC() prepared for writing an entry, but then there's nothing to write. 
> But that's actually not important, since ZipPackageStream::saveChild() will 
> still return failure, so ZipPackageFolder::saveContents() will throw an 
> exception, making the whole document saving fail. Which in turn means this 
> whole save business is irrelevant, as there's just no way to save the 
> document, even though we can load it and we can edit it. Which seems rather 
> lame.
> 
>  Any idea what to do about that? Is it really ok that we just refuse to save 
> it? Or should we save it even though the contents may be broken?

IMHO the only sane solution would be to detect the broken CRCs on read and
report a broken file to the user. Eventually we could offer some recovery
option: mark the broken CRCs to be recalculated and keep the stuff or drop the
broken ZIP entries. I guess most users can't make this decision, so I would opt
for optional recovery of the entries, ignoring the CRCs.
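
The "drop the broken ZIP entries" option could look roughly like the following sketch. It uses Python's standard zipfile module, not the LibreOffice package/ code, and the function name repack_dropping_bad is hypothetical. It copies every member that reads back cleanly into a new archive and reports the ones it had to leave out. Member order and compression type are carried over, which matters for ODF packages where the mimetype entry is expected first and stored uncompressed.

```python
import io
import zipfile
import zlib


def repack_dropping_bad(raw):
    """Copy readable members into a new archive; return (new bytes, dropped names)."""
    dropped = []
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(raw)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            try:
                payload = src.read(info)  # raises BadZipFile on a CRC mismatch
            except (zipfile.BadZipFile, zlib.error):
                dropped.append(info.filename)
                continue
            dst.writestr(info.filename, payload, compress_type=info.compress_type)
    return out.getvalue(), dropped


# Demo input (made up for this example): one member with a stale CRC.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    zf.writestr("content.xml", "<doc>hello</doc>")
damaged = bytearray(buf.getvalue())
pos = damaged.find(b"hello")
damaged[pos:pos + 5] = b"jello"

repacked, dropped = repack_dropping_bad(bytes(damaged))
```

Dropping an entry such as content.xml obviously leaves a document that is not usable, which is why this choice would have to be presented to the user rather than made silently.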

Easier solution: we just deny / abort loading the file and tell the user about
the broken file.

It's really strange that the broken CRCs are only detected on write and the
document loads without a problem. Or do we ignore all ZIP entries that we
don't know, which would be strange too?

Saving documents with broken zip streams (Re: minutes of ESC call ...)

2019-06-03 Thread Luboš Luňák
On Monday 03 of June 2019, Caolán McNamara wrote:
> On Mon, 2019-06-03 at 12:23 +0200, Luboš Luňák wrote:
> > On Thursday 30 of May 2019, Michael Meeks wrote:
> > >   + some in the zip area - assuming they are threading related.
> >
> >  Is this about those documents such
> > as /srv/crashtestdata/files/caolan/opendocument_stack_overflow_2.odt
> > ? How
> > can I reproduce that problem? If I try to fetch
> > buildsl...@vm138.documentfoundation.org:
> > an/opendocument_stack_overflow_2.odt ,
> > it doesn't exist.
>
> I attach two of the examples here. The input name was foo.sample, the
> output to odt name appears higher up in the bt during the export.
>
> ./instdir/program/soffice.bin --headless --convert-to odt
> opendocument_stack_overflow.sample


 Ok, so it's not a problem with my code, my changes just happened to show the 
problem, and the problem is that those documents are broken. If you try to 
unzip the documents, it will complain about incorrect CRC (although it still 
will uncompress them). And what happens is that when we try to save the file, 
apparently only by that point we'll read those zip streams, there will be a 
ZipException about that, and the code in package/ is not exception-safe. So 
ZipOutputStream::writeLOC() gets called but not the matching 
ZipOutputStream::rawCloseEntry().
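
The missing pairing described above can be sketched in a language-neutral way. The following Python example is not the LibreOffice code: FakeZipOutput, open_entry and save_child are hypothetical stand-ins, and ValueError stands in for the ZipException. It only shows the pattern of guaranteeing the closing call once the opening call has been made, which in C++ would typically be done with a scope guard.

```python
from contextlib import contextmanager


class FakeZipOutput:
    """Hypothetical stand-in for the output stream; it only records calls."""

    def __init__(self):
        self.calls = []

    def write_loc(self, name):
        self.calls.append(("write_loc", name))

    def raw_close_entry(self, name):
        self.calls.append(("raw_close_entry", name))


@contextmanager
def open_entry(out, name):
    """Pair write_loc() with raw_close_entry(), even if the body raises."""
    out.write_loc(name)
    try:
        yield
    finally:
        out.raw_close_entry(name)


def save_child(out, name, read_payload):
    """Return False instead of leaking a half-open entry when reading fails."""
    try:
        with open_entry(out, name):
            read_payload()
        return True
    except ValueError:  # stands in for the ZipException
        return False


def broken_stream():
    raise ValueError("bad CRC")


out = FakeZipOutput()
ok = save_child(out, "content.xml", broken_stream)
```

In this sketch the entry is closed with nothing written to it, so the caller still has to decide what an empty entry means.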

 But this is actually broken on several levels. If I make the code catch 
the exception better, I'll need to make it somehow handle the fact that 
writeLOC() prepared for writing an entry, but then there's nothing to write. 
But that's actually not important, since ZipPackageStream::saveChild() will 
still return failure, so ZipPackageFolder::saveContents() will throw an 
exception, making the whole document saving fail. Which in turn means this 
whole save business is irrelevant, as there's just no way to save the 
document, even though we can load it and we can edit it. Which seems rather 
lame.

 Any idea what to do about that? Is it really ok that we just refuse to save 
it? Or should we save it even though the contents may be broken?

-- 
 Luboš Luňák
 l.lu...@collabora.com