Hi everyone, the hash checks out and building from source works.
However, when running a crawl with the single seed "https://apache.org/", I'm getting the following error from the JsoupParserBolt:
"Exception while guessing mimetype on https://apache.org/: org.apache.commons.compress.archivers.ArchiveException: No Archiver found for the stream signature"
This was not the case with stormcrawler-3.4.0. It seems to come from Tika's detector, at the call:

    MediaType mt = detector.detect(stream, metadata);
Markos

On 9/6/25 11:52, Richard Zowalla wrote:
Hi folks,

I have posted a first release candidate for the Apache StormCrawler 3.5.0 release and it is ready for testing.

Apache StormCrawler 3.5.0 decouples Selenium from the core module, improving modularity and reducing unnecessary dependencies. The release also introduces an advanced metadata filtering system that supports complex logical operations like key=>val OR (key2=>val2 AND key3=>val3). Additionally, multiple dependencies were upgraded, core tests improved, and deprecated code cleaned up, enhancing overall stability and maintainability.

Thank you to everyone who contributed to this release, including all of our users and the people who submitted bug reports or contributed code or documentation enhancements.

The release was made using the Apache StormCrawler release process, documented here:
https://github.com/apache/stormcrawler/blob/main/RELEASING.md

Source:
https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC1

Tag:
https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0

Commit Hash:
8d517ad6c6da32fc307106f8b0b9de4b6df48585

Maven Repo:
https://repository.apache.org/content/repositories/orgapachestormcrawler-1009

<repositories>
  <repository>
    <id>stormcrawler-3.5.0-rc1</id>
    <name>Testing StormCrawler 3.5.0 release candidate</name>
    <url>https://repository.apache.org/content/repositories/orgapachestormcrawler-1009</url>
  </repository>
</repositories>

Release notes:
https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0

Reminder: The up-2-date KEYS file for signature verification can be found here:
https://downloads.apache.org/stormcrawler/KEYS

Please vote on releasing these packages as Apache StormCrawler 3.5.0.

The vote is open for at least the next 72 hours. Only votes from the StormCrawler PMC are binding, but everyone is welcome to check the release candidate and vote. The vote passes if at least three binding +1 votes are cast.

Please VOTE

[+1] go ship it
[+0] meh, don't care
[-1] stop, there is a ${showstopper}

Thanks!
Richard