Gregory Lepore created TIKA-4206:
------------------------------------

             Summary: Variation on Zip Bomb
                 Key: TIKA-4206
                 URL: https://issues.apache.org/jira/browse/TIKA-4206
             Project: Tika
          Issue Type: Bug
    Affects Versions: 3.0.0-BETA
            Reporter: Gregory Lepore
         Attachments: sample-42-mail-bomb.txt

I see TIKA-216, which aims to prevent zip bombs, but I'm seeing what looks like 
a zip bomb on 3.0.0-BETA. The zip bomb is a MIME-encoded attachment to an email, 
which may be why Tika isn't throwing an error.
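
For triage, the attachment can be inspected without extracting it. The sketch 
below is not part of Tika. It is a standalone helper, assuming the attachment is 
a ZIP archive, and zip_expansion_ratios is a hypothetical name. It decodes each 
MIME part and compares the compressed size against the uncompressed size the 
archive declares. Declared sizes can be forged and nested archives are not 
opened, so this is a rough indicator only.

```python
import email
import io
import zipfile
from email import policy


def zip_expansion_ratios(raw_message: bytes):
    """Yield (filename, compressed_bytes, declared_uncompressed_bytes) for
    each ZIP attachment in a MIME message, without extracting any entry."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue
        # decode=True undoes the base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True)
        if not payload:
            continue
        buf = io.BytesIO(payload)
        if not zipfile.is_zipfile(buf):
            continue
        with zipfile.ZipFile(buf) as zf:
            # file_size is the size recorded in the archive's directory,
            # not a measured value; a hostile archive may understate it.
            declared = sum(info.file_size for info in zf.infolist())
        yield part.get_filename(), len(payload), declared
```

A very large declared-to-compressed ratio on any part would support the zip bomb 
reading; a small ratio does not rule it out.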

On my machine, attempting to extract text (-J) never finishes: the process ran 
for at least 10 hours before I stopped it.
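
To reproduce this without tying up a machine, the run can be wrapped in a 
wall-clock limit. The following is a minimal sketch using only the Python 
standard library. run_with_timeout is a hypothetical helper, and the jar path in 
the usage comment is a placeholder, not a path taken from this report.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, returncode).

    finished is False when the wall-clock limit was reached; in that case
    subprocess.run kills the child and returncode is None.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard output; it may be very large
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, None
    return True, proc.returncode


# Assumed usage (jar location is a placeholder):
# finished, code = run_with_timeout(
#     ["java", "-jar", "tika-app.jar", "-J", "sample-42-mail-bomb.txt"],
#     timeout_s=600,
# )
```

A result of (False, None) means the parse did not complete within the limit, 
which matches the hang described above.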

The actual file is embedded in a .gz file inside an ARC file. However, 
extracting the attached .txt file produces the same behavior.

The original ARC file is at: 
https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-004/warc/NARA-PEOT-2004-20041111065521-04317-crawling-fast-c_NARA-PEOT-2004-20041111101148-00173-crawling008.archive.org.arc.gz



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
