Hi,

Thanks for testing. I think that this actually warrants a release re-roll,
because SC would be broken if metadata detection is enabled.

Can you open an issue for it? We should postpone the release until we have
fixed that.

I remember that exception from another project. In the end, we should not use
Tika's Detector but a TikaInputStream instead, like this:

try (TikaInputStream tis = TikaInputStream.get(data)) {
    final Metadata metadata = new Metadata();
    metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, file.getFileName());
    final MediaType mediaType = MimeTypes.getDefaultMimeTypes().detect(tis, metadata);
}


Regards
Richard 

On 7 September 2025 at 15:50:06 CEST, Markos Volikas <mvoli...@apache.org> wrote:
>Hi everyone,
>
>Hash and building from source are ok.
>
>However, when running a crawl with the single seed "https://apache.org/", I'm 
>getting the following error from the JsoupParserBolt:
>
>"Exception while guessing mimetype on https://apache.org/: 
>org.apache.commons.compress.archivers.ArchiveException: No Archiver found for 
>the stream signature"
>
>This was not the case for stormcrawler-3.4.0. It seems to be caused by Tika's 
>detector when we do MediaType mt = detector.detect(stream, metadata);
>
>Markos
>
>On 9/6/25 11:52, Richard Zowalla wrote:
>> Hi folks,
>> 
>> I have posted a first release candidate for the Apache StormCrawler 3.5.0 
>> release and it is ready for testing.
>> 
>> Apache StormCrawler 3.5.0 decouples Selenium from the core module, improving 
>> modularity and reducing unnecessary dependencies.
>> The release also introduces an advanced metadata filtering system that 
>> supports complex logical operations like key=>val OR (key2=>val2 AND 
>> key3=>val3).
>> Additionally, multiple dependencies were upgraded, core tests improved, and 
>> deprecated code cleaned up, enhancing overall stability and maintainability.
>> 
>> Thank you to everyone who contributed to this release, including all of our 
>> users and the people who submitted bug reports,
>> contributed code or documentation enhancements.
>> 
>> The release was made using the Apache StormCrawler release process, 
>> documented here:
>> https://github.com/apache/stormcrawler/blob/main/RELEASING.md
>> 
>> Source:
>> 
>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC1 
>> <https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.-RC1>
>> 
>> Tag:
>> 
>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>> 
>> Commit Hash:
>> 
>> 8d517ad6c6da32fc307106f8b0b9de4b6df48585
>> 
>> Maven Repo:
>> 
>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1009
>> 
>> <repositories>
>> <repository>
>> <id>stormcrawler-3.5.0-rc1</id>
>> <name>Testing StormCrawler 3.5.0 release candidate</name>
>> <url>
>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1009
>> </url>
>> </repository>
>> </repositories>
>> 
>> Release notes:
>> 
>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>> 
>> Reminder: The up-to-date KEYS file for signature verification can be
>> found here: https://downloads.apache.org/stormcrawler/KEYS
>> 
>> Please vote on releasing these packages as Apache StormCrawler 3.5.0
>> The vote is open for at least the next 72 hours.
>> 
>> Only votes from the StormCrawler PMC are binding, but everyone is welcome to 
>> check the release candidate and vote.
>> The vote passes if at least three binding +1 votes are cast.
>> 
>> Please VOTE
>> 
>> [+1] go ship it
>> [+0] meh, don't care
>> [-1] stop, there is a ${showstopper}
>> 
>> Thanks!
>> Richard
