[ https://issues.apache.org/jira/browse/TIKA-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132728#comment-17132728 ]
Tim Allison commented on TIKA-3110:
-----------------------------------
{{noformat}}
Caused by: java.io.IOException: tried to skip 7168 but actually skipped: 0
    at org.apache.tika.io.TikaInputStream.skip(TikaInputStream.java:717)
    at org.apache.commons.io.input.ProxyInputStream.skip(ProxyInputStream.java:117)
    at org.apache.commons.compress.utils.IOUtils.skip(IOUtils.java:113)
    at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.consumeRemainderOfLastBlock(TarArchiveInputStream.java:987)
    at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getRecord(TarArchiveInputStream.java:487)
    at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:360)
    at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextEntry(TarArchiveInputStream.java:799)
{{noformat}}
This is a regression (or new feature?) going from 1.24 -> 1.24.1.
For the sake of security, I changed TikaInputStream's skip() to require that
the given number of bytes actually be skipped. This prevents infinite loops in
parsers that forget to check the return value of skip(), or that trust
FileInputStream.skip(), which no one ever, ever should.
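For illustration only, here is a minimal sketch of what a "strict" skip can look like in plain JDK Java. The class and method names (StrictSkip, skipFully) are hypothetical, and this is not the actual TikaInputStream code; it only shows the idea of failing loudly when the stream cannot deliver the requested number of bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of Tika or commons-compress.
public class StrictSkip {

    /**
     * Skips exactly n bytes from the stream, or throws an IOException
     * if the stream ends before n bytes could be skipped.
     */
    public static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF, so confirm
                // with a single-byte read before giving up.
                if (in.read() == -1) {
                    throw new IOException("tried to skip " + n
                            + " but actually skipped: " + (n - remaining));
                }
                skipped = 1; // the read() above consumed one byte
            }
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        skipFully(in, 4);          // succeeds, 6 bytes remain
        try {
            skipFully(in, 7);      // only 6 bytes remain, so this throws
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this kind of check, a caller that asks for more bytes than the file contains gets an exception instead of silently looping or continuing with a short skip, which is the trade-off under discussion here.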
My sense was that there may be some MP4s out there that will cause problems
(e.g. they can sometimes end mid-frame), and I'm now thinking we hit this
earlier with .tar files.
[~bodewig] would you or a colleague on commons-compress know whether we should
expect this behavior from tar files, where they allege they have more data than
they actually do?
In short, is this something we should throw an exception for, or should we
happily let the tar file allege it has more bytes than it does?
> cannot extract metadata from 7z .tar archive
> --------------------------------------------
>
> Key: TIKA-3110
> URL: https://issues.apache.org/jira/browse/TIKA-3110
> Project: Tika
> Issue Type: Bug
> Components: mime, parser
> Affects Versions: 1.24.1
> Reporter: Alex
> Priority: Major
>
> When I extract metadata from a .tar archive that was created in Linux bash,
> it works as I expect, but if the .tar archive was created by 7z I get an error:
> TikaException: TIKA-198: Illegal IOException from
> org.apache.tika.parser.pkg.PackageParser@4d0f2471
> I created a project on GitHub for your convenience. It includes 2 files and
> code to play around with: [https://github.com/AlexOkayJ/apache-tika-tar-issue.git]
>