This is an automated email from the ASF dual-hosted git repository.
rzo1 pushed a commit to branch asf-site
in repository
https://gitbox.apache.org/repos/asf/incubator-stormcrawler-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 491e135 Merge latest changes from "main"
491e135 is described below
commit 491e1354f4a73e69200021f62ff8b24a17efb226
Author: Richard Zowalla <[email protected]>
AuthorDate: Mon Apr 22 11:49:24 2024 +0200
Merge latest changes from "main"
---
faq/index.html | 47 +++++++++++------------------
feed.xml | 8 ++---
getting-started/index.html | 41 +++++++++----------------
img/incubator_feather_egg_logo_bw_crop.png | Bin 0 -> 56218 bytes
index.html | 37 ++++++++---------------
support/index.html | 39 ++++++++----------------
6 files changed, 60 insertions(+), 112 deletions(-)
diff --git a/faq/index.html b/faq/index.html
index 867d70c..85984df 100644
--- a/faq/index.html
+++ b/faq/index.html
@@ -6,13 +6,13 @@
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
- <title>Apache StormCrawler</title>
- <meta name="description" content="Apache StormCrawler is collection of
resources for building low-latency, scalable web crawlers on Apache Storm
+ <title>Apache StormCrawler (Incubating)</title>
+ <meta name="description" content="Apache StormCrawler (Incubating) is a
collection of resources for building low-latency, scalable web crawlers on
Apache Storm
">
<link rel="stylesheet" href="/css/main.css">
<link rel="canonical" href="https://stormcrawler.apache.org/faq/">
- <link rel="alternate" type="application/rss+xml" title="Apache StormCrawler"
href="https://stormcrawler.apache.org/feed.xml">
+ <link rel="alternate" type="application/rss+xml" title="Apache StormCrawler
(Incubating)" href="https://stormcrawler.apache.org/feed.xml">
<link rel="icon" type="image/png" href="/img/favicon.png" />
<script src="//fast.eager.io/lVxgbfnBHm.js"></script>
@@ -24,7 +24,7 @@
<header class="site-header">
<div class="site-header__wrap">
<div class="site-header__logo">
- <a href="/"><img src="/img/logo.png" alt="Apache StormCrawler"></a>
+ <a href="/"><img src="/img/logo.png" alt="Apache StormCrawler
(Incubating)"></a>
</div>
</div>
</header>
@@ -49,7 +49,7 @@
<p>A: It is probably worth having a look at <a
href="http://storm.apache.org/">Apache Storm®</a> first. The <a
href="http://storm.apache.org/releases/current/Tutorial.html">tutorial</a> and
<a href="http://storm.apache.org/documentation/Concepts.html">concept</a> pages
are good starting points.</p>
- <p><strong>Q: Do I need an Apache Storm® cluster to run
StormCrawler?</strong></p>
+ <p><strong>Q: Do I need an Apache Storm® cluster to run Apache StormCrawler
(Incubating)?</strong></p>
<p>A: No. It can run in local mode and will just use the Storm libraries as
dependencies. It makes sense to install Storm in pseudo-distributed mode though
so that you can use its UI to monitor the topologies.</p>
@@ -57,7 +57,7 @@
<p>A: Apache Storm® is an elegant framework, with simple concepts, which
provides a solid platform for distributed stream processing. It gives us fault
tolerance and guaranteed data processing out of the box. The project is also
very dynamic and backed by a thriving community. Last but not least it is under
ASF 2.0 license.</p>
- <p id="howfast"><strong>Q: How fast is StormCrawler?</strong></p>
+ <p id="howfast"><strong>Q: How fast is Apache StormCrawler
(Incubating)?</strong></p>
<p>A: This depends mainly on the diversity of hostnames as well as your
politeness settings. For instance, if you have 1M URLs from the same host and
have set a delay of 1 sec between requests, then the best you'll be able to do
is 86400 pages per day. In practice it would be less than that, because of the
time needed for fetching the content (which itself depends on your network and
how large the documents are), parsing it, indexing it, etc. This is true of any
crawler, not just StormCrawler.</p>
@@ -66,16 +66,16 @@
<p>A: This <a
href="http://digitalpebble.blogspot.co.uk/2015/09/index-web-with-aws-cloudsearch.html">tutorial</a>
on using Apache Nutch® and SC for indexing with Cloudsearch gives you some idea
of how they compare in their methodology and performance.
We also ran a comparative <a
href="http://digitalpebble.blogspot.co.uk/2017/01/the-battle-of-crawlers-apache-nutch-vs.html">benchmark</a>
on a larger crawl.</p>
<p>In a nutshell (pardon the pun), Nutch proceeds by batch steps where it
selects the URLs to fetch, fetches them, parses them, then updates its database
with the new info about the URLs it just processed and adds the newly
discovered URLs. The generate and update steps take longer and longer as the
crawl grows and the resources are used unevenly: when fetching there is little
CPU or disk used whereas when doing all the other activities, you are not
fetching anything at all, which is a w [...]
- <p>StormCrawler proceeds differently and does everything at the same time,
hence optimising the physical resources of the cluster, but can potentially
accomodate more use cases, e.g. when URLs naturally come as streams or when low
latency is a must. URLs also get indexed as they are fetched and not as a
batch. On a more subjective note and apart from being potentially more
efficient, StormCrawler is more modern, easier to understand and build, nicer
to use, more versatile and more acti [...]
- <p>Apache Nutch® is a great tool though, which we used for years with many
of our customers at DigitalPebble, and it can also do things that StormCrawler
cannot currently do out of the box like deduplicating or advanced scoring like
PageRank.</p>
+ <p>Apache StormCrawler (Incubating) proceeds differently and does everything
at the same time, hence optimising the physical resources of the cluster, and
can potentially accommodate more use cases, e.g. when URLs naturally come as
streams or when low latency is a must. URLs also get indexed as they are
fetched and not as a batch. On a more subjective note and apart from being
potentially more efficient, Apache StormCrawler (Incubating) is more modern,
easier to understand and build, ni [...]
+ <p>Apache Nutch® is a great tool though, which we used for years with many
of our customers at DigitalPebble, and it can also do things that Apache
StormCrawler (Incubating) cannot currently do out of the box like deduplicating
or advanced scoring like PageRank.</p>
<p><strong>Q: Do I need some sort of external storage? And if so, then
what?</strong></p>
<p>A: Yes, you'll need to store the URLs to fetch somewhere. The type of the
storage to use depends on the nature of your crawl. If your crawl is not
recursive i.e. you just want to process specific pages and/or won't discover
new pages through more than one path, then you could use messaging queues like
<a href="https://www.rabbitmq.com/">RabbitMQ</a>, <a
href="https://aws.amazon.com/sqs/">AWS SQS</a> or <a
href="http://kafka.apache.org">Apache Kafka®</a>. All you'll need is a Spout i
[...]
- <p>If your crawl is recursive and there is a possibility that URLs which are
already known are discovered multiple times, then a queue won't help as it
would add the same URL to the queue every time it is discovered. This would be
very inefficient. Instead you should use a storage where the keys are unique,
like for instance a relational database. StormCrawler has several resources you
can use in the <a
href="https://github.com/DigitalPebble/storm-crawler/tree/master/external">external
[...]
- <p>The advantage of using StormCrawler is that is it both modular and
flexible. You can plug it to pretty much any storage you want.</p>
+ <p>If your crawl is recursive and there is a possibility that URLs which are
already known are discovered multiple times, then a queue won't help as it
would add the same URL to the queue every time it is discovered. This would be
very inefficient. Instead you should use a storage where the keys are unique,
like for instance a relational database. Apache StormCrawler (Incubating) has
several resources you can use in the <a
href="https://github.com/DigitalPebble/storm-crawler/tree/maste [...]
+ <p>The advantage of using Apache StormCrawler (Incubating) is that it is
both modular and flexible. You can plug it into pretty much any storage you
want.</p>
- <p><strong>Q: Is StormCrawler polite?</strong></p>
+ <p><strong>Q: Is Apache StormCrawler (Incubating) polite?</strong></p>
<p>A: The <a href="http://www.robotstxt.org/">robots.txt</a> protocol is
supported and the fetchers are configured to have a <a
href="https://github.com/DigitalPebble/storm-crawler/blob/master/core/src/main/resources/crawler-default.yaml#L6">delay</a>
between calls to the same hostname or domain. However, as with every tool, it
is down to how people use it.</p>
<p><strong>Q: How do I know when a crawl is finished?</strong></p>
@@ -85,29 +85,16 @@
</main>
- <div class="github-info">
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=star&count=true"
frameborder="0" scrolling="0" width="105px" height="20px"></iframe>
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=watch&count=true&v=2"
frameborder="0" scrolling="0" width="110px" height="20px"></iframe>
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=fork&count=true"
frameborder="0" scrolling="0" width="101px" height="20px"></iframe>
-</div>
-
-<footer class="site-footer">
- © 2024 <a href="https://stormcrawler.apache.org/">The Apache
Software Foundation</a>
-<p>Licensed under the Apache License, Version 2.0. Apache StormCrawler,
StormCrawler, the Apache feather logo are trademarks of The Apache Software
Foundation. All other marks mentioned may be trademarks or registered
trademarks of their respective owners.</p>
+ <footer class="site-footer">
+ <img src="img/incubator_feather_egg_logo_bw_crop.png" alt="Apache
Incubator Logo" width="500"><br/>
+ Apache StormCrawler is an effort undergoing incubation at The Apache
Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is
required of all newly accepted projects until a further review indicates that
the infrastructure, communications, and decision making process have stabilized
in a manner consistent with other successful ASF projects. While incubation
status is not necessarily a reflection of the completeness or stability of the
code, it does indicate that the p [...]
+<br/> <br/>
+ © 2024 <a href="https://stormcrawler.apache.org/">The Apache
Software Foundation</a><br/><br/>
+Licensed under the Apache License, Version 2.0. <br/> Apache StormCrawler,
StormCrawler, the Apache feather logo are trademarks of The Apache Software
Foundation. <br/> All other marks mentioned may be trademarks or registered
trademarks of their respective owners.
</footer>
- <script>
-
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
- (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new
Date();a=s.createElement(o),
-
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
- ga('create', 'UA-71137732-1', 'auto');
- ga('send', 'pageview');
- </script>
-
</body>
</html>
diff --git a/feed.xml b/feed.xml
index ddc1229..f9615a6 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,13 +1,13 @@
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
- <title>Apache StormCrawler</title>
- <description>Apache StormCrawler is collection of resources for building
low-latency, scalable web crawlers on Apache Storm
+ <title>Apache StormCrawler (Incubating)</title>
+ <description>Apache StormCrawler (Incubating) is a collection of resources
for building low-latency, scalable web crawlers on Apache Storm
</description>
<link>https://stormcrawler.apache.org/</link>
<atom:link href="https://stormcrawler.apache.org/feed.xml" rel="self"
type="application/rss+xml"/>
- <pubDate>Thu, 18 Apr 2024 04:12:59 -0500</pubDate>
- <lastBuildDate>Thu, 18 Apr 2024 04:12:59 -0500</lastBuildDate>
+ <pubDate>Mon, 22 Apr 2024 04:48:15 -0500</pubDate>
+ <lastBuildDate>Mon, 22 Apr 2024 04:48:15 -0500</lastBuildDate>
<generator>Jekyll v3.9.5</generator>
</channel>
diff --git a/getting-started/index.html b/getting-started/index.html
index 291be78..9926e97 100644
--- a/getting-started/index.html
+++ b/getting-started/index.html
@@ -6,13 +6,13 @@
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
- <title>Getting started with StormCrawler</title>
- <meta name="description" content="Apache StormCrawler is collection of
resources for building low-latency, scalable web crawlers on Apache Storm
+ <title>Getting started with Apache StormCrawler (Incubating)</title>
+ <meta name="description" content="Apache StormCrawler (Incubating) is a
collection of resources for building low-latency, scalable web crawlers on
Apache Storm
">
<link rel="stylesheet" href="/css/main.css">
<link rel="canonical"
href="https://stormcrawler.apache.org/getting-started/">
- <link rel="alternate" type="application/rss+xml" title="Apache StormCrawler"
href="https://stormcrawler.apache.org/feed.xml">
+ <link rel="alternate" type="application/rss+xml" title="Apache StormCrawler
(Incubating)" href="https://stormcrawler.apache.org/feed.xml">
<link rel="icon" type="image/png" href="/img/favicon.png" />
<script src="//fast.eager.io/lVxgbfnBHm.js"></script>
@@ -24,7 +24,7 @@
<header class="site-header">
<div class="site-header__wrap">
<div class="site-header__logo">
- <a href="/"><img src="/img/logo.png" alt="Apache StormCrawler"></a>
+ <a href="/"><img src="/img/logo.png" alt="Apache StormCrawler
(Incubating)"></a>
</div>
</div>
</header>
@@ -46,9 +46,9 @@
<h1>Quickstart</h1>
<br>
<p>NOTE: These instructions assume that you have <a
href="https://maven.apache.org/install.html">Apache Maven®</a> installed.
- You will also need to install <a href="https://storm.apache.org/">Apache
Storm®</a> to run the crawler. The version of Storm to use must match the one
defined in the pom.xml file of your topology. The major version of StormCrawler
mirrors the one from Apache Storm®, i.e whereas StormCrawler 1.x used Storm
1.2.3, the current version now requires Storm 2.6.0. Our <a
href="https://github.com/DigitalPebble/ansible-storm">Ansible-Storm</a>
repository contains resources to install Apache Sto [...]
+ You will also need to install <a href="https://storm.apache.org/">Apache
Storm®</a> to run the crawler. The version of Storm to use must match the one
defined in the pom.xml file of your topology. The major version of Apache
StormCrawler (Incubating) mirrors the one from Apache Storm®, i.e. whereas
StormCrawler 1.x used Storm 1.2.3, the current version now requires Storm
2.6.0. Our <a
href="https://github.com/DigitalPebble/ansible-storm">Ansible-Storm</a>
repository contains resources t [...]
- <p>Once Apache Storm® is installed, the easiest way to get started is to
generate a brand new StormCrawler project using :</p>
+ <p>Once Apache Storm® is installed, the easiest way to get started is to
generate a brand new Apache StormCrawler (Incubating) project using:</p>
<p><i>mvn archetype:generate
-DarchetypeGroupId=com.digitalpebble.stormcrawler
-DarchetypeArtifactId=storm-crawler-archetype -DarchetypeVersion=2.11</i></p>
@@ -62,7 +62,7 @@
<p>What this CrawlTopology does is very simple: it gets URLs to crawl from a
<a href="https://urlfrontier.net">URLFrontier</a> instance and emits them into
the topology. These URLs are then partitioned by hostname to enforce
politeness and then fetched. The next bolt (SiteMapParserBolt) checks whether
they are sitemap files and if not passes them on to an HTML parser. The parser
extracts the text from the document and passes it to a dummy indexer which
simply prints a representation of [...]
- <p>Of course this topology is very primitive and its purpose is merely to
give you an idea of how StormCrawler works. In reality you'd use a different
spout and index the documents to a proper backend. Look at the <a
href="https://github.com/DigitalPebble/storm-crawler/tree/master/external">external
modules</a> to see what's already available. Another limitation of this
topology is that it will work in local mode only or on a single worker.</p>
+ <p>Of course this topology is very primitive and its purpose is merely to
give you an idea of how Apache StormCrawler (Incubating) works. In reality,
you'd use a different spout and index the documents to a proper backend. Look
at the <a
href="https://github.com/DigitalPebble/storm-crawler/tree/master/external">external
modules</a> to see what's already available. Another limitation of this
topology is that it will work in local mode only or on a single worker.</p>
<p>You can run the topology in local mode with:</p>
@@ -74,7 +74,7 @@
<br>
- <p>If you want to use StormCrawler with Elasticsearch, the tutorial below
should be a good starting point.</p>
+ <p>If you want to use Apache StormCrawler (Incubating) with Elasticsearch,
the tutorial below should be a good starting point.</p>
<iframe width="840" height="472"
src="https://www.youtube.com/embed/8kpJLPdhvLw" frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen></iframe>
<br>
@@ -87,29 +87,16 @@
</main>
- <div class="github-info">
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=star&count=true"
frameborder="0" scrolling="0" width="105px" height="20px"></iframe>
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=watch&count=true&v=2"
frameborder="0" scrolling="0" width="110px" height="20px"></iframe>
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=fork&count=true"
frameborder="0" scrolling="0" width="101px" height="20px"></iframe>
-</div>
-
-<footer class="site-footer">
- © 2024 <a href="https://stormcrawler.apache.org/">The Apache
Software Foundation</a>
-<p>Licensed under the Apache License, Version 2.0. Apache StormCrawler,
StormCrawler, the Apache feather logo are trademarks of The Apache Software
Foundation. All other marks mentioned may be trademarks or registered
trademarks of their respective owners.</p>
+ <footer class="site-footer">
+ <img src="img/incubator_feather_egg_logo_bw_crop.png" alt="Apache
Incubator Logo" width="500"><br/>
+ Apache StormCrawler is an effort undergoing incubation at The Apache
Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is
required of all newly accepted projects until a further review indicates that
the infrastructure, communications, and decision making process have stabilized
in a manner consistent with other successful ASF projects. While incubation
status is not necessarily a reflection of the completeness or stability of the
code, it does indicate that the p [...]
+<br/> <br/>
+ © 2024 <a href="https://stormcrawler.apache.org/">The Apache
Software Foundation</a><br/><br/>
+Licensed under the Apache License, Version 2.0. <br/> Apache StormCrawler,
StormCrawler, the Apache feather logo are trademarks of The Apache Software
Foundation. <br/> All other marks mentioned may be trademarks or registered
trademarks of their respective owners.
</footer>
- <script>
-
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
- (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new
Date();a=s.createElement(o),
-
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
- ga('create', 'UA-71137732-1', 'auto');
- ga('send', 'pageview');
- </script>
-
</body>
</html>
diff --git a/img/incubator_feather_egg_logo_bw_crop.png
b/img/incubator_feather_egg_logo_bw_crop.png
new file mode 100644
index 0000000..377e4e3
Binary files /dev/null and b/img/incubator_feather_egg_logo_bw_crop.png differ
diff --git a/index.html b/index.html
index 117d416..2443ba5 100644
--- a/index.html
+++ b/index.html
@@ -6,13 +6,13 @@
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
- <title>Apache StormCrawler</title>
- <meta name="description" content="Apache StormCrawler is collection of
resources for building low-latency, scalable web crawlers on Apache Storm
+ <title>Apache StormCrawler (Incubating)</title>
+ <meta name="description" content="Apache StormCrawler (Incubating) is a
collection of resources for building low-latency, scalable web crawlers on
Apache Storm
">
<link rel="stylesheet" href="/css/main.css">
<link rel="canonical" href="https://stormcrawler.apache.org/">
- <link rel="alternate" type="application/rss+xml" title="Apache StormCrawler"
href="https://stormcrawler.apache.org/feed.xml">
+ <link rel="alternate" type="application/rss+xml" title="Apache StormCrawler
(Incubating)" href="https://stormcrawler.apache.org/feed.xml">
<link rel="icon" type="image/png" href="/img/favicon.png" />
<script src="//fast.eager.io/lVxgbfnBHm.js"></script>
@@ -24,7 +24,7 @@
<header class="site-header">
<div class="site-header__wrap">
<div class="site-header__logo">
- <a href="/"><img src="/img/logo.png" alt="Apache StormCrawler"></a>
+ <a href="/"><img src="/img/logo.png" alt="Apache StormCrawler
(Incubating)"></a>
</div>
</div>
</header>
@@ -48,7 +48,7 @@
</div>
<div class="row row-col">
<p><strong>Apache StormCrawler (Incubating)</strong> is an open source SDK
for building distributed web crawlers based on <a
href="http://storm.apache.org">Apache Storm®</a>. The project is under Apache
license v2 and consists of a collection of reusable resources and components,
written mostly in Java.</p>
- <p>The aim of StormCrawler is to help build web crawlers that are :</p>
+ <p>The aim of Apache StormCrawler (Incubating) is to help build web crawlers
that are:</p>
<ul>
<li>scalable</li>
<li>resilient</li>
@@ -58,7 +58,7 @@
</ul>
<p><strong>Apache StormCrawler (Incubating)</strong> is a library and
collection of resources that developers can leverage to build their own
crawlers. The good news is that doing so can be pretty straightforward! Have a
look at the <a href="getting-started/">Getting Started</a> section for more
details.</p>
<p>Apart from the core components, we provide some <a
href="https://github.com/apache/incubator-stormcrawler/tree/main/external">external
resources</a> that you can reuse in your project, like for instance our spout
and bolts for <a href="https://opensearch.org/">OpenSearch®</a> or a ParserBolt
which uses <a href="http://tika.apache.org">Apache Tika®</a> to parse various
document formats.</p>
- <p><strong>Apache StormCrawler</strong> is perfectly suited to use cases
where the URL to fetch and parse come as streams but is also an appropriate
solution for large scale recursive crawls, particularly where low latency is
required. The project is used in production by <a
href="https://github.com/apache/incubator-stormcrawler/wiki/Powered-By">many
organisations</a> and is actively developed and maintained.</p>
+ <p><strong>Apache StormCrawler (Incubating)</strong> is perfectly suited to
use cases where the URLs to fetch and parse come as streams but is also an
appropriate solution for large scale recursive crawls, particularly where low
latency is required. The project is used in production by <a
href="https://github.com/apache/incubator-stormcrawler/wiki/Powered-By">many
organisations</a> and is actively developed and maintained.</p>
<p>The <a
href="https://github.com/apache/incubator-stormcrawler/wiki/Presentations">Presentations</a>
page contains links to some recent presentations made about this project.</p>
</div>
@@ -84,29 +84,16 @@
</main>
- <div class="github-info">
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=star&count=true"
frameborder="0" scrolling="0" width="105px" height="20px"></iframe>
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=watch&count=true&v=2"
frameborder="0" scrolling="0" width="110px" height="20px"></iframe>
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=fork&count=true"
frameborder="0" scrolling="0" width="101px" height="20px"></iframe>
-</div>
-
-<footer class="site-footer">
- © 2024 <a href="https://stormcrawler.apache.org/">The Apache
Software Foundation</a>
-<p>Licensed under the Apache License, Version 2.0. Apache StormCrawler,
StormCrawler, the Apache feather logo are trademarks of The Apache Software
Foundation. All other marks mentioned may be trademarks or registered
trademarks of their respective owners.</p>
+ <footer class="site-footer">
+ <img src="img/incubator_feather_egg_logo_bw_crop.png" alt="Apache
Incubator Logo" width="500"><br/>
+ Apache StormCrawler is an effort undergoing incubation at The Apache
Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is
required of all newly accepted projects until a further review indicates that
the infrastructure, communications, and decision making process have stabilized
in a manner consistent with other successful ASF projects. While incubation
status is not necessarily a reflection of the completeness or stability of the
code, it does indicate that the p [...]
+<br/> <br/>
+ © 2024 <a href="https://stormcrawler.apache.org/">The Apache
Software Foundation</a><br/><br/>
+Licensed under the Apache License, Version 2.0. <br/> Apache StormCrawler,
StormCrawler, the Apache feather logo are trademarks of The Apache Software
Foundation. <br/> All other marks mentioned may be trademarks or registered
trademarks of their respective owners.
</footer>
- <script>
-
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
- (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new
Date();a=s.createElement(o),
-
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
- ga('create', 'UA-71137732-1', 'auto');
- ga('send', 'pageview');
- </script>
-
</body>
</html>
diff --git a/support/index.html b/support/index.html
index 77de226..0f1a4c7 100644
--- a/support/index.html
+++ b/support/index.html
@@ -7,12 +7,12 @@
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Need assistance from web crawling experts?</title>
- <meta name="description" content="Apache StormCrawler is collection of
resources for building low-latency, scalable web crawlers on Apache Storm
+ <meta name="description" content="Apache StormCrawler (Incubating) is a
collection of resources for building low-latency, scalable web crawlers on
Apache Storm
">
<link rel="stylesheet" href="/css/main.css">
<link rel="canonical" href="https://stormcrawler.apache.org/support/">
- <link rel="alternate" type="application/rss+xml" title="Apache StormCrawler"
href="https://stormcrawler.apache.org/feed.xml">
+ <link rel="alternate" type="application/rss+xml" title="Apache StormCrawler
(Incubating)" href="https://stormcrawler.apache.org/feed.xml">
<link rel="icon" type="image/png" href="/img/favicon.png" />
<script src="//fast.eager.io/lVxgbfnBHm.js"></script>
@@ -24,7 +24,7 @@
<header class="site-header">
<div class="site-header__wrap">
<div class="site-header__logo">
- <a href="/"><img src="/img/logo.png" alt="Apache StormCrawler"></a>
+ <a href="/"><img src="/img/logo.png" alt="Apache StormCrawler
(Incubating)"></a>
</div>
</div>
</header>
@@ -45,16 +45,16 @@
<div class="row row-col">
<h1>Support</h1>
<br>
-<p>You can ask questions related to StormCrawler on Github in the <a
href="https://github.com/apache/incubator-stormcrawlerdiscussions">discussions
section</a>, on <a
href="http://stackoverflow.com/questions/tagged/stormcrawler">stackoverflow</a>
using the tag 'stormcrawler' or on <a
href="https://discord.com/invite/C62MHusNnG">Discord</a>.</p>
-<p>If you think you've found a bug, please <a
href="https://github.com/apache/incubator-stormcrawlerissues">open an issue</a>
on GitHub.</p>
+<p>You can ask questions related to Apache StormCrawler (Incubating) on GitHub
in the <a
href="https://github.com/apache/incubator-stormcrawler/discussions">discussions
section</a>, on <a
href="http://stackoverflow.com/questions/tagged/stormcrawler">Stack Overflow</a>
using the tag 'stormcrawler' or on <a
href="https://discord.com/invite/C62MHusNnG">Discord</a>.</p>
+<p>If you think you've found a bug, please <a
href="https://github.com/apache/incubator-stormcrawler/issues">open an
issue</a> on GitHub.</p>
<h1>Commercial Support</h1>
<br>
- <p>The Apache StormCrawler PMC does not endorse or recommend any of the
products or services on this page. We love all our supporters equally.</p>
+ <p>The Apache StormCrawler (Incubating) PMC does not endorse or recommend
any of the products or services on this page. We love all our supporters
equally.</p>
<h2>Want to be added to this page? </h2>
<p>All submitted information must be factual and informational in nature and
not be a marketing statement. Statements that promote your products and
services over other offerings on the page will not be tolerated and will be
removed. Such marketing statements can be added to your own pages on your own
site.</p>
- <p>When in doubt, email the Apache StormCrawler PMC and ask. We are be happy
to help.</p>
+ <p>When in doubt, email the Apache StormCrawler (Incubating) PMC and ask. We
are happy to help.</p>
<h2>Companies</h2>
<ul>
@@ -64,29 +64,16 @@
</main>
- <div class="github-info">
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=star&count=true"
frameborder="0" scrolling="0" width="105px" height="20px"></iframe>
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=watch&count=true&v=2"
frameborder="0" scrolling="0" width="110px" height="20px"></iframe>
- <iframe
src="https://ghbtns.com/github-btn.html?user=apache&repo=incubator-stormcrawler&type=fork&count=true"
frameborder="0" scrolling="0" width="101px" height="20px"></iframe>
-</div>
-
-<footer class="site-footer">
- © 2024 <a href="https://stormcrawler.apache.org/">The Apache
Software Foundation</a>
-<p>Licensed under the Apache License, Version 2.0. Apache StormCrawler,
StormCrawler, the Apache feather logo are trademarks of The Apache Software
Foundation. All other marks mentioned may be trademarks or registered
trademarks of their respective owners.</p>
+ <footer class="site-footer">
+ <img src="img/incubator_feather_egg_logo_bw_crop.png" alt="Apache
Incubator Logo" width="500"><br/>
+ Apache StormCrawler is an effort undergoing incubation at The Apache
Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is
required of all newly accepted projects until a further review indicates that
the infrastructure, communications, and decision making process have stabilized
in a manner consistent with other successful ASF projects. While incubation
status is not necessarily a reflection of the completeness or stability of the
code, it does indicate that the p [...]
+<br/> <br/>
+ © 2024 <a href="https://stormcrawler.apache.org/">The Apache
Software Foundation</a><br/><br/>
+Licensed under the Apache License, Version 2.0. <br/> Apache StormCrawler,
StormCrawler, the Apache feather logo are trademarks of The Apache Software
Foundation. <br/> All other marks mentioned may be trademarks or registered
trademarks of their respective owners.
</footer>
- <script>
-
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
- (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new
Date();a=s.createElement(o),
-
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
- ga('create', 'UA-71137732-1', 'auto');
- ga('send', 'pageview');
- </script>
-
</body>
</html>