This is an automated email from the ASF dual-hosted git repository.
rzo1 pushed a change to branch 1597
in repository https://gitbox.apache.org/repos/asf/stormcrawler.git
discard 6b0cd735 First steps in migration from URL to URI. Leads to some corner cases to discuss in normalizer / architecture of normalization.
add 2695a7df Bump aws.version from 1.12.795 to 1.12.796
add d44df67f Bump testcontainers.version from 1.21.3 to 1.21.4
add 9b4882e8 Minor: Regenerated License File for d44df67f7376ad398525df142fe0c4ecdcf0b112
add 2a6930af #1542 - Migrate Documentation from Wiki to Living Documentation in Code (#1714)
add 47a29da2 Fix #1763 - Documentation fixes from #1714 (#1765)
add a77acdab Fix #1761 - Add docker compose config to archetypes (#1764)
add 1c858556 Bump langchain4j.version from 1.9.1 to 1.10.0 (#1768)
add 23bc5113 Bump org.netpreserve:jwarc from 0.32.0 to 0.33.0 (#1767)
add 2adaa84e Minor: Regenerated License File for 23bc5113f566d7fc120ab0aa89da92b167790f56 (#1769)
add 64028dc4 Bump org.jsoup:jsoup from 1.21.2 to 1.22.1 (#1773)
add ba5d39f8 Bump aws.version from 1.12.796 to 1.12.797 (#1772)
add f082cf80 Bump org.codehaus.mojo:license-maven-plugin from 2.7.0 to 2.7.1 (#1771)
add 4fffc48f Minor: Regenerated License File for f082cf804cf9992df70a78b8d58f643c0f82c685 (#1774)
add e222c7d0 #1611 - Refactor SQL module to use PreparedStatement in SQLSpout and IndexerBolt (#1766)
add 190aa3cd Minor: Regenerated License File for e222c7d04cf3255c119fd1a5e80510c77a39d73d (#1777)
add 856d72da JDBC driver should be optional and not supplied to be vendor agnostic. (#1783)
add cfc37b79 Bump junit.version from 6.0.1 to 6.0.2 (#1779)
add f90345ac Bump com.puppycrawl.tools:checkstyle from 12.3.0 to 12.3.1 (#1780)
add 342fa11b Bump com.ibm.icu:icu4j from 78.1 to 78.2 (#1781)
add b12b7f53 Bump com.mysql:mysql-connector-j from 9.3.0 to 9.5.0 (#1778)
add df19b0c5 Minor: Regenerated License File for b12b7f53eaadee090c0c3810e532020c6e88f05d (#1784)
add 632b4960 [maven-release-plugin] prepare release stormcrawler-3.5.1
add 95e7520e [maven-release-plugin] prepare for next development iteration
add f56c4821 Bump actions/checkout from 6.0.1 to 6.0.2
add ff555b0d Bump peter-evans/create-pull-request from 8.0.0 to 8.1.0
add d004d2e1 Bump actions/setup-java from 5.1.0 to 5.2.0
add 8be76c74 Bump org.netpreserve:jwarc from 0.33.0 to 0.34.0
add d12784d3 Bump selenium.version from 4.39.0 to 4.40.0
add 0dce0f46 Bump actions/cache from 5.0.1 to 5.0.2
add dad0bdf7 Minor: Regenerated License File for d12784d3f583bd002f5aeb5f54892901a62b4f9f
add 38113645 Bump org.apache:apache from 35 to 37
add 0e401df3 fix: enable java 17 parsing for javadoc
add ddf076d1 #1775 add indentation rules, fix inconsistent files (#1776)
add 7e9505b7 link the docs instead of the obsolete wiki (#1795)
add 2c861ace Bump org.apache.solr:solr-solrj from 9.10.0 to 9.10.1 (#1796)
add 3baaf67a Minor: Regenerated License File for 2c861acef9e44e632999d16082fe69d26266fe98 (#1797)
add 1973ad79 Bump actions/cache from 5.0.2 to 5.0.3 (#1798)
add b6e77d73 Bump org.apache.maven.plugins:maven-compiler-plugin (#1800)
add 77dbbd47 Bump commons-codec:commons-codec from 1.20.0 to 1.21.0 (#1799)
add 9eef525e Bump com.microsoft.playwright:playwright from 1.57.0 to 1.58.0 (#1801)
add 1ad6f249 Bump com.mysql:mysql-connector-j from 9.5.0 to 9.6.0 (#1802)
add 55083925 Minor: Regenerated License File for 1ad6f249b6ff65b3eb5acfe73ce3c2ab04b29daf
add aa9f40ae Update Logo (#1805)
add 32eddd7a Bump langchain4j.version from 1.10.0 to 1.11.0
add 0d7fb580 Minor: Regenerated License File for 32eddd7a560e0b85da1b4e7714cbede4fc8d2d99
add c5337876 Bump junit.version from 6.0.2 to 6.0.3 (#1809)
add 96f3fb7d fix: use the local (runner) compose binary instead of a container in tests (#1811)
new e9d0e404 First steps in migration from URL to URI. Leads to some corner cases to discuss in normalizer / architecture of normalization.
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (6b0cd735)
            \
             N -- N -- N   refs/heads/1597 (e9d0e404)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
.asf.yaml | 3 +-
.editorconfig | 24 +
.github/workflows/main.yml | 10 +-
.github/workflows/maven.yml | 12 +-
.github/workflows/snapshots.yaml | 8 +-
.mvn/jvm.config | 2 +-
DISCLAIMER-BINARIES.txt | 9 +-
README.md | 4 +-
THIRD-PARTY.txt | 86 +-
archetype/pom.xml | 100 +-
.../META-INF/maven/archetype-metadata.xml | 68 +-
.../main/resources/archetype-resources/README.md | 8 +-
.../archetype-resources/crawler-conf.yaml | 18 +-
.../resources/archetype-resources/crawler.flux | 4 +-
.../archetype-resources/docker-compose.yml | 54 +
.../src/main/resources/archetype-resources/pom.xml | 252 ++--
.../main/resources/default-regex-normalizers.xml | 2 +-
assembly.xml | 128 +--
checkstyle.xml | 2 +-
core/pom.xml | 504 ++++----
core/src/main/resources/crawler-default.yaml | 36 +-
.../java/org/apache/stormcrawler/TestUtil.java | 38 +
core/src/test/resources/default-regex-filters.txt | 2 +-
core/src/test/resources/delegator-conf.yaml | 2 +-
core/src/test/resources/fast.urlfilter.json | 2 +-
core/src/test/resources/redir.html | 8 +-
.../stormcrawler.sitemap.extensions.all.xml | 154 +--
.../stormcrawler.sitemap.extensions.image.xml | 86 +-
.../stormcrawler.sitemap.extensions.links.xml | 62 +-
.../stormcrawler.sitemap.extensions.mobile.xml | 62 +-
.../stormcrawler.sitemap.extensions.news.xml | 84 +-
.../stormcrawler.sitemap.extensions.video.xml | 104 +-
.../test/resources/stormcrawler.sitemap.index.xml | 32 +-
core/src/test/resources/stormcrawler.sitemap.xml | 62 +-
core/src/test/resources/test.jsoupfilters.json | 2 +-
docs/pom.xml | 81 ++
.../tika-config.xml => docs/src/assembly/docs.xml | 36 +-
docs/src/main/asciidoc/architecture.adoc | 97 ++
docs/src/main/asciidoc/configuration.adoc | 316 +++++
docs/src/main/asciidoc/debugging.adoc | 21 +
docs/src/main/asciidoc/images/stormcrawler.drawio | 238 ++++
.../main/asciidoc/images/stormcrawler.drawio.jpg | Bin 0 -> 172493 bytes
.../main/asciidoc/images/stormcrawler.drawio.pdf | Bin 0 -> 50189 bytes
docs/src/main/asciidoc/index.adoc | 33 +
docs/src/main/asciidoc/internals.adoc | 443 +++++++
docs/src/main/asciidoc/overview.adoc | 53 +
docs/src/main/asciidoc/powered-by.adoc | 40 +
docs/src/main/asciidoc/presentations.adoc | 28 +
docs/src/main/asciidoc/quick-start.adoc | 211 ++++
external/ai/README.md | 2 +-
external/ai/ai-conf.yaml | 2 +-
external/ai/pom.xml | 4 +-
external/ai/src/main/resources/NOTICE | 2 +-
.../ai/src/main/resources/llm-default-prompt.txt | 2 +-
external/aws/pom.xml | 66 +-
external/langid/pom.xml | 46 +-
external/opensearch/archetype/pom.xml | 86 +-
.../META-INF/maven/archetype-metadata.xml | 94 +-
.../main/resources/archetype-resources/README.md | 14 +
.../archetype-resources/crawler-conf.yaml | 20 +-
.../resources/archetype-resources/crawler.flux | 2 +-
.../archetype-resources/dashboards/storm.ndjson | 2 +-
.../archetype-resources/docker-compose.yml | 81 ++
.../archetype-resources/opensearch-conf.yaml | 20 +-
.../src/main/resources/archetype-resources/pom.xml | 242 ++--
.../main/resources/default-regex-normalizers.xml | 2 +-
.../src/main/resources/indexer.mapping | 2 +-
.../src/main/resources/metrics.mapping | 2 +-
external/opensearch/opensearch-conf.yaml | 20 +-
external/opensearch/pom.xml | 184 +--
.../opensearch/src/test/resources/indexer.mapping | 2 +-
.../opensearch/src/test/resources/metrics.mapping | 2 +-
external/playwright/playwright-conf.yaml | 2 +-
external/playwright/pom.xml | 102 +-
external/pom.xml | 70 +-
external/selenium/pom.xml | 168 +--
external/solr/archetype/pom.xml | 2 +-
.../archetype-resources/clear-collections.sh | 2 +-
.../archetype-resources/crawler-conf.yaml | 20 +-
.../src/main/resources/archetype-resources/pom.xml | 242 ++--
.../main/resources/default-regex-normalizers.xml | 2 +-
external/solr/configsets/docs/conf/schema.xml | 42 +-
external/solr/configsets/docs/conf/solrconfig.xml | 40 +-
external/solr/configsets/metrics/conf/schema.xml | 26 +-
.../solr/configsets/metrics/conf/solrconfig.xml | 60 +-
external/solr/configsets/status/conf/schema.xml | 28 +-
.../solr/configsets/status/conf/solrconfig.xml | 38 +-
external/solr/pom.xml | 80 +-
.../stormcrawler/solr/SolrCloudContainerTest.java | 3 +-
external/sql/README.md | 10 +-
external/sql/pom.xml | 68 +-
external/sql/sql-conf.yaml | 14 +-
.../org/apache/stormcrawler/sql/IndexerBolt.java | 101 +-
.../java/org/apache/stormcrawler/sql/SQLSpout.java | 199 ++--
.../java/org/apache/stormcrawler/sql/SQLUtil.java | 21 +-
.../apache/stormcrawler/sql/StatusUpdaterBolt.java | 133 ++-
.../apache/stormcrawler/sql/AbstractSQLTest.java | 92 ++
.../apache/stormcrawler/sql/IndexerBoltTest.java | 311 +++++
.../org/apache/stormcrawler/sql/SQLSpoutTest.java | 237 ++++
.../stormcrawler/sql/StatusUpdaterBoltTest.java | 146 +++
external/tika/pom.xml | 102 +-
external/urlfrontier/pom.xml | 144 +--
external/warc/pom.xml | 138 +--
pom.xml | 1205 ++++++++++----------
104 files changed, 5551 insertions(+), 2827 deletions(-)
create mode 100644 .editorconfig
create mode 100644 archetype/src/main/resources/archetype-resources/docker-compose.yml
create mode 100644 docs/pom.xml
copy external/tika/src/main/resources/tika-config.xml => docs/src/assembly/docs.xml (50%)
create mode 100644 docs/src/main/asciidoc/architecture.adoc
create mode 100644 docs/src/main/asciidoc/configuration.adoc
create mode 100644 docs/src/main/asciidoc/debugging.adoc
create mode 100644 docs/src/main/asciidoc/images/stormcrawler.drawio
create mode 100644 docs/src/main/asciidoc/images/stormcrawler.drawio.jpg
create mode 100644 docs/src/main/asciidoc/images/stormcrawler.drawio.pdf
create mode 100644 docs/src/main/asciidoc/index.adoc
create mode 100644 docs/src/main/asciidoc/internals.adoc
create mode 100644 docs/src/main/asciidoc/overview.adoc
create mode 100644 docs/src/main/asciidoc/powered-by.adoc
create mode 100644 docs/src/main/asciidoc/presentations.adoc
create mode 100644 docs/src/main/asciidoc/quick-start.adoc
create mode 100644 external/opensearch/archetype/src/main/resources/archetype-resources/docker-compose.yml
create mode 100644 external/sql/src/test/java/org/apache/stormcrawler/sql/AbstractSQLTest.java
create mode 100644 external/sql/src/test/java/org/apache/stormcrawler/sql/IndexerBoltTest.java
create mode 100644 external/sql/src/test/java/org/apache/stormcrawler/sql/SQLSpoutTest.java
create mode 100644 external/sql/src/test/java/org/apache/stormcrawler/sql/StatusUpdaterBoltTest.java