Hi everyone,

The hash checks out and building from source works.

However, when running a crawl with the single seed "https://apache.org/", I'm getting the following error from the JsoupParserBolt:

"Exception while guessing mimetype on https://apache.org/: org.apache.commons.compress.archivers.ArchiveException: No Archiver found for the stream signature"

This was not the case with stormcrawler-3.4.0. It seems to be caused by Tika's detector when we call MediaType mt = detector.detect(stream, metadata);

Markos

On 9/6/25 11:52, Richard Zowalla wrote:
Hi folks,

I have posted a first release candidate for the Apache StormCrawler 3.5.0 
release and it is ready for testing.

Apache StormCrawler 3.5.0 decouples Selenium from the core module, improving 
modularity and reducing unnecessary dependencies.
The release also introduces an advanced metadata filtering system that supports complex 
logical operations like key=>val OR (key2=>val2 AND key3=>val3).
Additionally, multiple dependencies were upgraded, core tests improved, and 
deprecated code cleaned up, enhancing overall stability and maintainability.
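
To make the semantics of an expression like key=>val OR (key2=>val2 AND key3=>val3) concrete, here is a minimal Java sketch that models it with plain java.util.function.Predicate composition. It is only an illustration: the class MetadataFilterSketch, the helper has() and the Map-based metadata are hypothetical stand-ins, not StormCrawler's actual filter API or configuration syntax.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class MetadataFilterSketch {

    // Hypothetical helper (not a StormCrawler class): matches when the
    // metadata holds `value` among the values stored under `key`.
    static Predicate<Map<String, List<String>>> has(String key, String value) {
        return md -> md.getOrDefault(key, List.of()).contains(value);
    }

    // Models: key=>val OR (key2=>val2 AND key3=>val3)
    static final Predicate<Map<String, List<String>>> FILTER =
            has("key", "val").or(has("key2", "val2").and(has("key3", "val3")));

    public static void main(String[] args) {
        // Satisfies the left-hand side of the OR.
        Map<String, List<String>> first = Map.of("key", List.of("val"));
        // Satisfies both parts of the AND group.
        Map<String, List<String>> second =
                Map.of("key2", List.of("val2"), "key3", List.of("val3"));
        // Satisfies only half of the AND group, so nothing matches.
        Map<String, List<String>> third = Map.of("key2", List.of("val2"));

        System.out.println(FILTER.test(first));  // true
        System.out.println(FILTER.test(second)); // true
        System.out.println(FILTER.test(third));  // false
    }
}
```

The parentheses matter here: the AND group has to match as a whole, so metadata carrying only key2=>val2 is rejected unless key=>val is also present.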

Thank you to everyone who contributed to this release, including all of our 
users and the people who submitted bug reports,
contributed code or documentation enhancements.

The release was made using the Apache StormCrawler release process, documented 
here:
https://github.com/apache/stormcrawler/blob/main/RELEASING.md

Source:

https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC1 

Tag:

https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0

Commit Hash:

8d517ad6c6da32fc307106f8b0b9de4b6df48585

Maven Repo:

https://repository.apache.org/content/repositories/orgapachestormcrawler-1009

<repositories>
  <repository>
    <id>stormcrawler-3.5.0-rc1</id>
    <name>Testing StormCrawler 3.5.0 release candidate</name>
    <url>https://repository.apache.org/content/repositories/orgapachestormcrawler-1009</url>
  </repository>
</repositories>

Release notes:

https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0

Reminder: The up-2-date KEYS file for signature verification can be
found here: https://downloads.apache.org/stormcrawler/KEYS

Please vote on releasing these packages as Apache StormCrawler 3.5.0.
The vote is open for at least the next 72 hours.

Only votes from the StormCrawler PMC are binding, but everyone is welcome to 
check the release candidate and vote.
The vote passes if at least three binding +1 votes are cast.

Please VOTE

[+1] go ship it
[+0] meh, don't care
[-1] stop, there is a ${showstopper}

Thanks!
Richard
