Hi,

Thanks for testing. I think this actually warrants a release re-roll, because SC would be broken if metadata detection is enabled.
Can you open an issue for it? We should postpone the release until we have fixed that.

I remember that exception from another project. In the end, we should not use Tika's Detector but a TikaInputStream instead, like this:

    try (TikaInputStream tis = TikaInputStream.get(data)) {
        final Metadata metadata = new Metadata();
        metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, file.getFileName());
        final MediaType mediaType = MimeTypes.getDefaultMimeTypes().detect(tis, metadata);
    }

Regards,
Richard

On 7 September 2025 at 15:50:06 CEST, Markos Volikas <mvoli...@apache.org> wrote:
>Hi everyone,
>
>Hash and building from source are ok.
>
>However, when running a crawl with the single seed "https://apache.org/", I'm
>getting the following error from the JsoupParserBolt:
>
>"Exception while guessing mimetype on https://apache.org/:
>org.apache.commons.compress.archivers.ArchiveException: No Archiver found for
>the stream signature"
>
>This was not the case for stormcrawler-3.4.0. It seems to be caused by Tika's
>detector when we do MediaType mt = detector.detect(stream, metadata);
>
>Markos
>
>On 9/6/25 11:52, Richard Zowalla wrote:
>> Hi folks,
>>
>> I have posted a first release candidate for the Apache StormCrawler 3.5.0
>> release and it is ready for testing.
>>
>> Apache StormCrawler 3.5.0 decouples Selenium from the core module, improving
>> modularity and reducing unnecessary dependencies.
>> The release also introduces an advanced metadata filtering system that
>> supports complex logical operations like key=>val OR (key2=>val2 AND
>> key3=>val3).
>> Additionally, multiple dependencies were upgraded, core tests improved, and
>> deprecated code cleaned up, enhancing overall stability and maintainability.
>>
>> Thank you to everyone who contributed to this release, including all of our
>> users and the people who submitted bug reports,
>> contributed code or documentation enhancements.
>>
>> The release was made using the Apache StormCrawler release process,
>> documented here:
>> https://github.com/apache/stormcrawler/blob/main/RELEASING.md
>>
>> Source:
>>
>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC1
>> <https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.-RC1>
>>
>> Tag:
>>
>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>
>> Commit Hash:
>>
>> 8d517ad6c6da32fc307106f8b0b9de4b6df48585
>>
>> Maven Repo:
>>
>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1009
>>
>> <repositories>
>>   <repository>
>>     <id>stormcrawler-3.5.0-rc1</id>
>>     <name>Testing StormCrawler 3.5.0 release candidate</name>
>>     <url>
>>       https://repository.apache.org/content/repositories/orgapachestormcrawler-1009
>>     </url>
>>   </repository>
>> </repositories>
>>
>> Release notes:
>>
>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>
>> Reminder: The up-2-date KEYS file for signature verification can be
>> found here: https://downloads.apache.org/stormcrawler/KEYS
>>
>> Please vote on releasing these packages as Apache StormCrawler 3.5.0
>> The vote is open for at least the next 72 hours.
>>
>> Only votes from the StormCrawler PMC are binding, but everyone is welcome to
>> check the release candidate and vote.
>> The vote passes if at least three binding +1 votes are cast.
>>
>> Please VOTE
>>
>> [+1] go ship it
>> [+0] meh, don't care
>> [-1] stop, there is a ${showstopper}
>>
>> Thanks!
>> Richard