GitHub user sebastian-nagel added a comment to the discussion: WARCHdfsBolt forwarding WARC file path to StatusUpdaterBolt
You could simply check the filesystem for new files from time to time. This seems reasonable since WARC files usually hold tens of thousands of records and, consequently, are not finished very often.

GitHub link: https://github.com/apache/stormcrawler/discussions/1566#discussioncomment-13495259
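
The polling idea from the comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not StormCrawler code: the class `WarcDirectoryPoller` and its method `pollOnce` are hypothetical names, the `*.warc.gz` file name pattern is assumed, and a local directory read via `java.nio.file` stands in for HDFS. The sketch also does not distinguish a finished WARC file from one that is still being written; that check would have to be added separately.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of StormCrawler): reports WARC files not seen in earlier polls. */
class WarcDirectoryPoller {
    private final Path dir;
    // Paths already reported by a previous poll.
    private final Set<Path> seen = new HashSet<>();

    WarcDirectoryPoller(Path dir) {
        this.dir = dir;
    }

    /** Lists the directory once and returns the files that were not present in earlier polls. */
    List<Path> pollOnce() throws IOException {
        List<Path> fresh = new ArrayList<>();
        // Assumption: WARC files are named "*.warc.gz"; adjust the glob to the actual naming.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.warc.gz")) {
            for (Path p : stream) {
                if (seen.add(p)) { // add() returns true only for paths not seen before
                    fresh.add(p);
                }
            }
        }
        Collections.sort(fresh); // stable order for callers
        return fresh;
    }

    public static void main(String[] args) throws IOException {
        // Single poll of the directory given as first argument (default: current directory).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : new WarcDirectoryPoller(dir).pollOnce()) {
            System.out.println("new WARC file: " + p);
        }
    }
}
```

Because new WARC files appear rarely, `pollOnce()` could be called at a long interval (for example once per minute from a scheduled task), and each returned path handed on to whatever component records the WARC location.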