[https://issues.apache.org/jira/browse/TIKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Tilman Hausherr updated TIKA-4206:
----------------------------------
Description:
I see that TIKA-216 aims to prevent zip bombs, but I'm seeing what looks like
one on 3.0.0-BETA. The zip bomb is a MIME-encoded attachment to an email,
which may be why Tika isn't throwing an error.
On my machine, attempting to extract text (-J) never finishes: the process was
still running after 10 hours, which is when I stopped it.
The actual file is embedded in a .gz file inside an ARC file. However,
extracting text from the attached .txt file shows the same behavior.
The original ARC file is at:
[https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-004/warc/NARA-PEOT-2004-20041111065521-04317-crawling-fast-c_NARA-PEOT-2004-20041111101148-00173-crawling008.archive.org.arc.gz]
was:
I see Tika-216 which aims to prevent Zip bombs, but I'm seeing what looks like
a bomb on 3.0.0 Beta. The zip bomb is a mime encoded attachment to an email,
which may be why it isn't throwing an error.
On my machine attempting to extract text (-J) the process continues infinitely
(or at least 10 hours, which is when I stopped it).
The actual file is embedded in a .gz file inside of an ARC file. However,
extracting the attached .txt file produces the same error.
The original ARC file is at:
https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-004/warc/NARA-PEOT-2004-20041111065521-04317-crawling-fast-c_NARA-PEOT-2004-20041111101148-00173-crawling008.archive.org.arc.gz
> Variation on Zip Bomb
> ---------------------
>
> Key: TIKA-4206
> URL: https://issues.apache.org/jira/browse/TIKA-4206
> Project: Tika
> Issue Type: Bug
> Affects Versions: 3.0.0-BETA
> Reporter: Gregory Lepore
> Priority: Major
> Attachments: sample-42-mail-bomb.txt
>
>
> I see that TIKA-216 aims to prevent zip bombs, but I'm seeing what looks
> like one on 3.0.0-BETA. The zip bomb is a MIME-encoded attachment to an
> email, which may be why Tika isn't throwing an error.
> On my machine, attempting to extract text (-J) never finishes: the process
> was still running after 10 hours, which is when I stopped it.
> The actual file is embedded in a .gz file inside an ARC file. However,
> extracting text from the attached .txt file shows the same behavior.
>
> The original ARC file is at:
> [https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-004/warc/NARA-PEOT-2004-20041111065521-04317-crawling-fast-c_NARA-PEOT-2004-20041111101148-00173-crawling008.archive.org.arc.gz]
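For anyone who wants to reproduce the hang without tying up a machine for hours, here is a minimal sketch of running the extraction under a wall-clock limit. It is not part of the report and does nothing to fix the parser; it only bounds how long the reproduction runs. The helper names (tika_json_command, run_with_timeout) and the jar path you pass in are assumptions for illustration; the only Tika detail relied on is the -J option mentioned above.

```python
import subprocess
import sys


def tika_json_command(jar_path, input_path):
    """Build the argv for a tika-app run with -J (jar_path is whatever jar you have locally)."""
    return ["java", "-jar", jar_path, "-J", input_path]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, stdout_bytes).

    finished is False when the wall-clock limit was reached before the
    process exited; in that case stdout_bytes is empty.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so no runaway process is left behind.
        return False, b""
    return True, result.stdout
```

Calling run_with_timeout(tika_json_command("tika-app.jar", "sample-42-mail-bomb.txt"), 600) would give up after ten minutes instead of running indefinitely; the jar file name here is a placeholder, and the call raises FileNotFoundError if java is not on the PATH.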