This is an automated email from the ASF dual-hosted git repository.

rzo1 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-stormcrawler.git
commit 54b769734a629d3094753f79ee00a96d3b61ede3
Author: Markos Volikas <[email protected]>
AuthorDate: Wed Nov 13 17:12:50 2024 +0200

    #1401 update readmes
---
 README.md                                                  |  2 +-
 archetype/src/main/resources/archetype-resources/README.md | 11 ++---------
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 1cf2345c..c8699555 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ This will not only create a fully formed project containing a POM with the depen
 
 Alternatively if you can't or don't want to use the Maven archetype above, you can simply copy the files from [archetype-resources](https://github.com/apache/incubator-stormcrawler/tree/master/archetype/src/main/resources/archetype-resources).
 
-Have a look at the code of the [CrawlTopology class](https://github.com/apache/incubator-stormcrawler/blob/master/archetype/src/main/resources/archetype-resources/src/main/java/CrawlTopology.java), the [crawler-conf.yaml](https://github.com/apache/incubator-stormcrawler/blob/master/archetype/src/main/resources/archetype-resources/crawler-conf.yaml) file as well as the files in [src/main/resources/](https://github.com/apache/incubator-stormcrawler/tree/master/archetype/src/main/resources/ [...]
+Have a look at [crawler.flux](https://github.com/apache/incubator-stormcrawler/blob/master/archetype/src/main/resources/archetype-resources/crawler.flux), the [crawler-conf.yaml](https://github.com/apache/incubator-stormcrawler/blob/master/archetype/src/main/resources/archetype-resources/crawler-conf.yaml) file as well as the files in [src/main/resources/](https://github.com/apache/incubator-stormcrawler/tree/master/archetype/src/main/resources/archetype-resources/src/main/resources), th [...]
 
 ## Getting help
 
diff --git a/archetype/src/main/resources/archetype-resources/README.md b/archetype/src/main/resources/archetype-resources/README.md
index 72a4cfcf..e973f08f 100644
--- a/archetype/src/main/resources/archetype-resources/README.md
+++ b/archetype/src/main/resources/archetype-resources/README.md
@@ -34,15 +34,7 @@ where _seeds.txt_ is a file containing URLs to inject, with one URL per line.
 
 # Running the crawl
 
-You can now submit the topology using the storm command:
-
-``` sh
-storm local target/${artifactId}-${version}.jar --local-ttl 60 ${package}.CrawlTopology -- -conf crawler-conf.yaml
-```
-
-This will run the topology in local mode for 60 seconds. Simply use the 'storm jar' to start the topology in distributed mode, where it will run indefinitely.
-
-You can also use Flux to do the same:
+You can now submit the flux topology using the storm command:
 
 ``` sh
 storm local target/${artifactId}-${version}.jar org.apache.storm.flux.Flux crawler.flux --local-ttl 3600
@@ -50,4 +42,5 @@ storm local target/${artifactId}-${version}.jar org.apache.storm.flux.Flux craw
 
 Note that in local mode, Flux uses a default TTL for the topology of 20 secs. The command above runs the topology for 1 hour.
 
+Alternatively, you can use `storm jar` to start the topology in distributed mode, where it will run indefinitely.
 It is best to run the topology with `storm jar` to benefit from the Storm UI and logging. In that case, the topology runs continuously, as intended.
