Did you clean your local Maven repository before building the uber jar? Could you check which Commons Compress version ends up in it?
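One way to check the Compress version is a small probe run with the same classpath as the uber jar. What follows is a minimal Java sketch, not part of StormCrawler. It assumes the shaded jar still contains Commons Compress's `META-INF/maven/org.apache.commons/commons-compress/pom.properties`, which the Maven Shade plugin usually keeps unless a filter removes it. The class name `CompressVersionCheck` is hypothetical. The probe prints the version recorded in that file and the jar that `ArchiveStreamFactory` was actually loaded from, so a stale or duplicate Compress on the classpath becomes visible.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.CodeSource;
import java.util.Properties;

// Hypothetical diagnostic class; not part of StormCrawler or Commons Compress.
public class CompressVersionCheck {

    // Metadata file Maven writes into the Commons Compress jar (assumed to survive shading).
    static final String POM_PROPERTIES =
            "META-INF/maven/org.apache.commons/commons-compress/pom.properties";

    // A long-standing Commons Compress class used to find the jar that actually wins.
    static final String PROBE_CLASS =
            "org.apache.commons.compress.archivers.ArchiveStreamFactory";

    /** Version recorded in pom.properties, or null if the file is missing or unreadable. */
    static String versionFromPomProperties(ClassLoader cl) {
        try (InputStream in = cl.getResourceAsStream(POM_PROPERTIES)) {
            if (in == null) {
                return null; // file stripped by the shade config, or Compress absent
            }
            Properties props = new Properties();
            props.load(in);
            return props.getProperty("version");
        } catch (IOException e) {
            return null;
        }
    }

    /** Location the probe class was loaded from, or null if Compress is not on the classpath. */
    static String jarLocation(ClassLoader cl) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class<?> probe = Class.forName(PROBE_CLASS, false, cl);
            CodeSource source = probe.getProtectionDomain().getCodeSource();
            return source == null ? "(unknown location)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    /** One-line summary combining both checks. */
    static String describe(ClassLoader cl) {
        String location = jarLocation(cl);
        if (location == null) {
            return "commons-compress: not on classpath";
        }
        String version = versionFromPomProperties(cl);
        return "commons-compress " + (version == null ? "(version unknown)" : version)
                + " loaded from " + location;
    }

    public static void main(String[] args) {
        System.out.println(describe(CompressVersionCheck.class.getClassLoader()));
    }
}
```

Run it against the same jar you submit to Storm, for example `java -cp target/your-crawler.jar:. CompressVersionCheck` (the jar path is a placeholder). Before shading, `mvn dependency:tree -Dincludes=org.apache.commons:commons-compress` shows which Compress version Maven resolves and which dependency pulls it in.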
Regards,
Richard

On 11 September 2025 15:38:38 CEST, Markos Volikas wrote:
>Hi all,
>
>I'm afraid I'm still getting:
>
>16:25:13.829 [Thread-46-parse-executor[6, 6]] INFO o.a.s.b.JSoupParserBolt - Parsing : starting https://apache.org/
>16:25:13.848 [Thread-46-parse-executor[6, 6]] ERROR o.a.s.b.JSoupParserBolt - Exception while guessing mimetype on https://apache.org/: org.apache.commons.compress.archivers.ArchiveException: No Archiver found for the stream signature
>
>I'm running in local mode with Storm 2.8.2 on Ubuntu 24.04 (OpenJDK
>17.0.16 2025-07-15). The database is Solr running in Docker, although this
>should be irrelevant. Maybe I'm doing something wrong? I have attached the
>config I'm using in case you have any ideas. Sorry for the delay, but I only
>just found time to look into this again :-(
>
>Markos
>
>On 9/8/25 20:46, Richard Zowalla wrote:
>> Hi folks,
>>
>> I have posted a 2nd release candidate for the Apache StormCrawler 3.5.0
>> release and it is ready for testing. The regression with Tika / Compress has
>> been fixed.
>>
>> Apache StormCrawler 3.5.0 decouples Selenium from the core module, improving
>> modularity and reducing unnecessary dependencies.
>> The release also introduces an advanced metadata filtering system that
>> supports complex logical operations like key=>val OR (key2=>val2 AND
>> key3=>val3).
>> Additionally, multiple dependencies were upgraded, core tests improved, and
>> deprecated code cleaned up, enhancing overall stability and maintainability.
>>
>> Thank you to everyone who contributed to this release, including all of our
>> users and the people who submitted bug reports,
>> contributed code or documentation enhancements.
>>
>> The release was made using the Apache StormCrawler release process,
>> documented here:
>> https://github.com/apache/stormcrawler/blob/main/RELEASING.md
>>
>> Source:
>>
>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2
>>
>> Tag:
>>
>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>
>> Commit Hash:
>>
>> 1947ad4c56ff5c5c90e093900a163e0ac3144bb6
>>
>> Maven Repo:
>>
>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1011
>>
>> <repositories>
>>   <repository>
>>     <id>stormcrawler-3.5.0-rc2</id>
>>     <name>Testing StormCrawler 3.5.0 release candidate 2</name>
>>     <url>https://repository.apache.org/content/repositories/orgapachestormcrawler-1011</url>
>>   </repository>
>> </repositories>
>>
>> Release notes:
>>
>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>
>> Reminder: The up-to-date KEYS file for signature verification can be
>> found here: https://downloads.apache.org/stormcrawler/KEYS
>>
>> Please vote on releasing these packages as Apache StormCrawler 3.5.0.
>> The vote is open for at least the next 72 hours.
>>
>> Only votes from the StormCrawler PMC are binding, but everyone is welcome to
>> check the release candidate and vote.
>> The vote passes if at least three binding +1 votes are cast.
>>
>> Please VOTE
>>
>> [+1] go ship it
>> [+0] meh, don't care
>> [-1] stop, there is a ${showstopper}
>>
>> Thanks!
>> Richard
