Re: [COMPRESS] tar files and missing bytes?

2020-06-12 Thread Stefan Bodewig
On 2020-06-11, Tim Allison wrote:

>   We recently made TikaInputStream's skip() inherently strict so that it
> throws an EOF if a parser tries to skip past the end of a file.  We didn't
> notice any problems in our regression tests (aside from some likely
> truncated mp4s), but we recently got an issue [1] from a user where this is
> a problem for a tar file created by 7z [2].

>   Is this a valid tar, or are we right to throw an EOF?

Yes, it is, unfortunately, although that somewhat depends on what you
consider "valid".

I saw the mail about the TIKA issue before I found this one; see
https://issues.apache.org/jira/browse/TIKA-3110?focusedCommentId=17134328&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17134328

Stefan




[COMPRESS] tar files and missing bytes?

2020-06-11 Thread Tim Allison
All,
  We recently made TikaInputStream's skip() inherently strict so that it
throws an EOF if a parser tries to skip past the end of a file.  We didn't
notice any problems in our regression tests (aside from some likely
truncated mp4s), but we recently got an issue [1] from a user where this is
a problem for a tar file created by 7z [2].
  Is this a valid tar, or are we right to throw an EOF?

 Thank you.

   Best,

   Tim

[1] https://issues.apache.org/jira/browse/TIKA-3110
[2]
https://github.com/AlexOkayJ/apache-tika-tar-issue/blob/master/src/main/resources/7ztar.tar
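
For readers unfamiliar with the behaviour under discussion, here is a minimal sketch of what a "strict" skip() could look like. It is not Tika's actual implementation: the class StrictSkipInputStream and the helper skipFrom are hypothetical names made up for illustration, and the only assumption is the one stated in the message above, namely that skipping past the end of the data raises an EOF instead of quietly skipping fewer bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StrictSkipDemo {

    /** Hypothetical wrapper for illustration; not Tika's TikaInputStream. */
    static class StrictSkipInputStream extends FilterInputStream {
        StrictSkipInputStream(InputStream in) {
            super(in);
        }

        @Override
        public long skip(long n) throws IOException {
            long remaining = n;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped > 0) {
                    remaining -= skipped;
                    continue;
                }
                // skip() may legally return 0 before EOF, so confirm with read().
                if (in.read() == -1) {
                    throw new EOFException("tried to skip " + n
                            + " bytes, but only " + (n - remaining) + " were available");
                }
                remaining--; // the read() above consumed one byte
            }
            return Math.max(n, 0);
        }
    }

    /** Hypothetical helper: wraps the given bytes and skips n of them strictly. */
    static long skipFrom(byte[] data, long n) throws IOException {
        try (InputStream s = new StrictSkipInputStream(new ByteArrayInputStream(data))) {
            return s.skip(n);
        }
    }

    public static void main(String[] args) throws IOException {
        // Skipping within the available data succeeds.
        System.out.println("skipped " + skipFrom(new byte[10], 4));
        // Skipping past the end is reported instead of being silently shortened.
        try {
            skipFrom(new byte[10], 100);
        } catch (EOFException e) {
            System.out.println("EOF: " + e.getMessage());
        }
    }
}
```

With a plain java.io.InputStream the second call would simply return a smaller count, and it would be up to the caller to notice. A parser that asks to skip more bytes than the file contains only fails once the stream is made strict in this way.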