+1 for the new option.

Failing fast on corrupted archives could help a lot.
Lee
On Jun 5, 2021, at 4:32, Gary Gregory <garydgreg...@gmail.com> wrote:
> In general, I think fail fast is ok with a clear exception message.
>
> Gary
> On Fri, Jun 4, 2021, 15:44 Stefan Bodewig <bode...@apache.org> wrote:
> > Hi all
> >
> > 7z archives provide CRCs for the metadata section so you can quickly
> > identify a wide range of broken archives - which is far better than what
> > you get for ZIP for example.
> >
> > It is possible to recover from a certain type of broken archive: one
> > where the archive has been written almost completely and just the CRC
> > and the locator of the metadata are missing. The docs talk about
> > disks/drives being removed prematurely.
> >
> > The basic idea is to search backwards from the end of the file for the
> > metadata and try to parse it. This is what SevenZFile does and has
> > always done. This is the root cause of
> > https://issues.apache.org/jira/browse/COMPRESS-542 - the file ends with
> > something that looks like metadata of an archive with lots and lots of
> > files in it and the allocation of arrays leads to an OOM.
> >
> > Current master will detect corrupt archives more quickly - in particular
> > without excessive allocations - but still it may take quite some time to
> > reject thousands of candidates of "this could be the first byte of
> > proper metadata". We are scanning the last megabyte of the file and
> > there is ample chance this last megabyte may contain random noise that
> > looks promising.
> >
> > Personally I believe that almost nobody actually needs this mode of
> > recovery.
> >
> > Therefore I've thought we might want to introduce an option that enables
> > the recovery mode. If it was disabled and we found the CRC was missing
> > we'd throw a new specific exception that says "you may want to try with
> > recovery enabled instead".
> >
> > Making this new option default to disabling recovery would break
> > backwards compatibility but it is tempting to think this could be
> > fine. I'm a bit torn here. What do you think?
> >
> >
> > Stefan
>
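
For illustration, here is a minimal Java sketch of the option discussed above: fail fast with a specific exception when the CRC is missing, unless recovery is explicitly enabled. It is not the SevenZFile implementation. The class, method, and exception names are hypothetical, and the two marker bytes (0x01 and 0x17, the 7z header and encoded-header property ids) and the "first candidate wins" shortcut are assumptions made to keep the example short. A real reader would try to parse each candidate and keep searching if parsing fails.

```java
import java.io.IOException;

class RecoverySketch {

    /** Hypothetical exception for "CRC missing and recovery disabled". */
    public static class RecoveryDisabledException extends IOException {
        public RecoveryDisabledException(String message) {
            super(message);
        }
    }

    // Assumed 7z property ids that can start a metadata block.
    static final int K_HEADER = 0x01;
    static final int K_ENCODED_HEADER = 0x17;

    // Only the last megabyte of the file is searched, as described in the thread.
    static final int SEARCH_LIMIT = 1024 * 1024;

    /**
     * Returns the offset at which metadata parsing should start.
     *
     * @param archive      the whole archive (a byte array keeps the sketch simple)
     * @param storedOffset offset recorded by the locator at the start of the file
     * @param crcPresent   whether the CRC and locator were actually written
     * @param recover      the proposed opt-in recovery flag
     */
    public static int locateMetadata(byte[] archive, int storedOffset,
                                     boolean crcPresent, boolean recover)
            throws IOException {
        if (crcPresent) {
            // Normal case: trust the locator, no searching needed.
            return storedOffset;
        }
        if (!recover) {
            // Fail fast with a message that points at the opt-in mode.
            throw new RecoveryDisabledException(
                "CRC of the metadata is missing; the archive looks broken. "
                + "You may want to try with recovery enabled instead.");
        }
        // Recovery mode: scan backwards from the end of the file for a byte
        // that could be the first byte of proper metadata.
        int lowest = Math.max(0, archive.length - SEARCH_LIMIT);
        for (int pos = archive.length - 1; pos >= lowest; pos--) {
            int b = archive[pos] & 0xFF;
            if (b == K_HEADER || b == K_ENCODED_HEADER) {
                return pos; // candidate only, not validated in this sketch
            }
        }
        throw new IOException("No metadata candidate found near the end of the file");
    }
}
```

With recovery off by default, a caller sees the specific exception right away and can decide whether to retry with the flag set, so the potentially slow backward scan only runs for users who ask for it.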
