This is an automated email from the ASF dual-hosted git repository.

jnioche pushed a change to branch 851
in repository https://gitbox.apache.org/repos/asf/incubator-stormcrawler.git


    from 27414e98 Initial code for using Docker image for testing SOLR - WIP
     add 56fa9b7b nextFetchDate field in SOLR schema should be optional, fixes #1051
     add c4a3e090 OpenSearch 2.7.0 + renamed OpenSearchConnection (#1064)
     add 718892e3 (Re)separate injection from crawl topologies in *Search archetypes, fixes #1065
     add 8481aefa Remove injection from crawl topologies in *Search archetypes, fixes #1065
     add 0ddaaba1 BasicURLNormalizer .unmangleQueryString() returns invalid results if "&" symbol in a parents path #1059 (#1062)
     add 63f836a1 Removed remaining references to ES in OPenSearch module
     add 3876b539 Dependency upgrades.fixes #1066 (#1067)
     add acddeb2b Automatic creation of index definitions should use the bolt type (#1069)
     add 828bd42f Maven plugin upgrades + better handling of plugin versions
     add 2db5017d bgufix test jar not attached
     add a07f32d3 Update maven.yml
     add 9589e1b2 mechanism to retrieve more generic value of configuration (#1071)
     add 52ca5b23 Merge branch 'master' of github.com:DigitalPebble/storm-crawler
     add 7d972684 Batch requests in DeleterBolt, fixes #1072
     add 1bc37a78 Update README.md
     add 9920e6b1 Create DeletionBolt.java for Solr. #1050 (#1073)
     add d1d2d590 SOLR: suppress warnings + minor changes and Javadoc + added deletion to default topology
     add 6a15da1d Tika 2.8.0, fixes 1066
     add edba0d04 Increase the number of redirects to 5 for Robots.txt fetching (#1074)
     add f2b30cf4 Add test coverage reports with JaCoCo and Coveralls, fixes #1075
     add 92029b6f #1075 - Add test coverage reports with JaCoCo
     add bfbfddae #1075 - Update GH workflow to reduce log spam by adding -B and --no-transfer-progess maven options
     add fc36a105 Issue #1042: Adapt parsing of robots.txt files (#1055)
     add 91ae9778 Applied formatting
     add 487f1e30 Upgrades to XSoup 0.3.7, fixes #1082
     add d8188746 Test URL Filtering from the command line (#1081)
     add f2d29fdb CC 1.4, fix #1085
     add c6e5aa80 Minor - uppercase static field name to follow conventions
     add 90e52e33 Upgrade to Storm 2.5.0, fix #1089
     add 24803236 Tika 2.9.0, fixes #1090
     add b4bfebdc Pre-release 2.9
     add 0b282bbd [maven-release-plugin] prepare release 2.9
     add f7dfa823 [maven-release-plugin] prepare for next development iteration
     add 156f817c Selenium test (#1093)
     add 15711ad2 Dependency upgrades,fix #1094. moved managt of version for testcontainers to top level + various mvn plugins upgrades
     add a6455581 upgraded plugin dependencies in archetypes; fix #1094
     add 848166dd SQL StatusUpdaterBolt bugs, fix #1095
     add 8c7eac63 Add static utility class to URLPartitioner
     add bc21ebfa Trivial change to README in OpenSearch archetype
     add c630e614 Protocol util - add option to dump the content to a tmp file
     add 33696686 Activate sitemap discovery in archetypes; fix #1096
     add c7a5578a Make all protocol implementations testable on the command line, fix #1097
     add 51564508 Add OR operator for filter logic in DelegatorProtocol (and custom flag for robots) fix #1098
     add 7670cf1e Remove deprecated class DelegatorRemoteDriverProtocol,fix #1099
     add 114fd9e9 Turn off tracing in Selenium driver, fix #1100
     add 87b0eb19 refactoring timeouts Selenium (#1102)
     add ee01cbd3 Bug fix post 1102
     add d6f13776 Improvements and fixes to HttpRobotRulesParser when following redirects (#1103)
     add 18aae321 User agent substitution not handled correctly, fix #1109
     add 0623bdea Removed unused conf, fix #1099
     add e233c854 DelegatorProtocol to filter with regexps on URLs, fix #1110
     add c313b4fe Fetcher, set number of threads via metadata, fix #1111. Clarify variable for custom minCrawlDelay
     add 54620065 Fetcher: pass custom delay for queues via metadata, fix #1112
     add 2bd817e3 Pre 2.10 release
     add 58c29554 [maven-release-plugin] prepare release 2.10
     add 8406ce7b [maven-release-plugin] prepare for next development iteration
     add 7f8f8292 Fix README
     add 7092b62b Maven plugin updates
     add e9d0edee Applied formatting with new version of the plugin
     add cfe61d7c OS 211 (#1114)
     add ef31e509 Improve Selenium tests,fix #1115
     add 15562121 Use mock server for selenium tests, fix #1116 (#1119)
     add adb44fb4 pom cleanups; jwarc & wiremock dependency upgrades
     add f526e47f Dependency upgrades,fix #1118
     add 5e8802f6 Selenium tests: moved Jetty handling to abstract class so that it can be reused from other implementations
     add 4d3340fc Issue #728: Adding asterisk for metadata transfer (#1117)
     add 2eaa33dd Added missing license header to MetadataTest
     add 76a70ba8 AbstractIndexerBolt - avoid reparsing metadata keys for each document, fixes #1124
     add 0a8afbf5 WARCSpout loads inputs using HDFS (#1122)
     add 857bf09d Merge branch 'master' of github.com:DigitalPebble/storm-crawler
     add 3bd2d7ac Fix wrong most recent date was set (#1126)
     add ad706a4a Upgrade to Apache Storm 2.6.0, fix #1127
     add 00f319b0 FileSpout: spread the work based on the number of instances,fix #1125
     add ac4408c2 Add configurable delay between launching Fetch threads, fix #1128
     add 71cae464 SQL MetricsConsumer use Timestamps instead of dates
     add 87145c3a Add debug to protocolfactory to see which instance of a protocol got a URL
     add 642cf5fb Glob field mapping for indexer.md.mapping (#1130)
     add 012dace8 Archetypes to prompt user for user agent values,fix #1131
     add 5f83770b Remove default values for user agent,fix #1129
     add 5740f42e Utilize new SimpleRobotRulesParser API entry point,fix #1086
     add c3ae8c7d Fix flaky test in AdaptiveSchedulerTest.testSchedule,fix #1076
     add 6869b5ac Use versioned image for standalone-chrome in Selenium tests
     add babf4a72 archetypes: fix variable rewrite + httpagentversion won't have a default anymore; fixes #1131
     add dfe6d236 OpenSearch dashboard script work from anywhere, fix #1132
     add 7f70a47e Add committer statement (#1134)
     add 31a4b2ab Implement configurable getDocumentID in DeletionBolt (#1135)
     add d67ba6bc import Kibana script work from anywhere, fix #1136
     add 1ee61a44 Add two tests for SiteMapParserBolt (#1138)
     add b6ea3639 dependency upgrades (#1139)
     add 66086162 JSoup 1.17.2
     add 5392fc93 Release 2.11
     add ccd318c6 [maven-release-plugin] prepare release 2.11
     add 94f8bd2c [maven-release-plugin] prepare for next development iteration
     add 93747cd7 Handling of DateTimeParseException in WARCSpout (#1140)
     add a8d7419b Improve metrics for StatusUpdaterBolts,fix #1141
     add 41dd9100 Add sniffing for OpenSearch, fix #1142
     add 069f6850 Configure proxy with a single conf element + improve handling of blank values in SCProxy; improvement to CharsetIdentification
     add eb69c1e6 Create CODE_OF_CONDUCT.md
     add 701ef3c3 Update README.md
     add b1e0caa3 Merge branch 'master' of github.com:DigitalPebble/storm-crawler
     add 32eab34b Generate THIRD-PARTY.txt file, fixes #1145 (#1146)
     add 4b9a8a63 OpenSearch tests to use explicitly versioned Docker image, fixes #1147
     add 16da6526 bugfix - had forgotten to add the new file
     add 953a8c5f Remove coveralls maven plugin, fixes #1148 (#1149)
     add 2f80e7ab Dependency upgrades, fix #1144
     add 15d29d26 Removed dead link to screenshot of Kibana dash in ES module
     add fcc3b979 OpenSearch 2.12.0, fixes #1150
     add 7109685e Force version of commons-io to 2.11.0, fixes #1151
     add d404022a Partial revert of #1144 to keep Jackson in sync with Apache Storm
     add 56a646f2 Update third-party
     add 04e711db Add properties for missing third party libraries
     add 1b5c0384 OpenSearch - better handling of mappings  (#1155)
     add 8dee25ca Delete CODE_OF_CONDUCT.md (#1158)
     add 5a51efb3 Create DISCLAIMER (#1159)
     add bc8de236 Update NOTICE (#1160)
     add bdc34cbc Changed package names to org.apache + fixed references to DigitalPebble where possible (#1165)
     new d2ef5a1f Merge branch 'main' into 851
     new 08e8e76a Merge from main

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .github/workflows/code_coverage.yml                |  29 ++
 .github/workflows/maven.yml                        |   6 +-
 DISCLAIMER                                         |  10 +
 NOTICE                                             |   4 +-
 README.md                                          |  36 +-
 THIRD-PARTY.properties                             |   4 +
 THIRD-PARTY.txt                                    | 547 +++++++++++++++++++++
 archetype/pom.xml                                  |   6 +-
 .../META-INF/maven/archetype-metadata.xml          |  71 +--
 .../main/resources/archetype-resources/README.md   |   2 +-
 .../archetype-resources/crawler-conf.yaml          |  58 ++-
 .../resources/archetype-resources/crawler.flux     |  20 +-
 .../src/main/resources/archetype-resources/pom.xml |  24 +-
 .../src/main/java/CrawlTopology.java               |  24 +-
 .../src/main/resources/jsoupfilters.json           |   6 +-
 .../src/main/resources/parsefilters.json           |   8 +-
 .../src/main/resources/urlfilters.json             |  18 +-
 core/pom.xml                                       |  54 +-
 .../stormcrawler/protocol/Protocol.java            |  40 --
 .../selenium/DelegatorRemoteDriverProtocol.java    |  81 ---
 .../protocol/selenium/RemoteDriverProtocol.java    |  87 ----
 .../apache}/stormcrawler/ConfigurableTopology.java |   6 +-
 .../apache}/stormcrawler/Constants.java            |   2 +-
 .../apache}/stormcrawler/JSONResource.java         |   2 +-
 .../apache}/stormcrawler/Metadata.java             |  20 +-
 .../apache}/stormcrawler/bolt/FeedParserBolt.java  |  28 +-
 .../apache}/stormcrawler/bolt/FetcherBolt.java     | 108 ++--
 .../apache}/stormcrawler/bolt/JSoupParserBolt.java |  48 +-
 .../stormcrawler/bolt/SimpleFetcherBolt.java       |  45 +-
 .../stormcrawler/bolt/SiteMapParserBolt.java       |  28 +-
 .../stormcrawler/bolt/StatusEmitterBolt.java       |  26 +-
 .../apache}/stormcrawler/bolt/URLFilterBolt.java   |  12 +-
 .../stormcrawler/bolt/URLPartitionerBolt.java      |   8 +-
 .../apache}/stormcrawler/filtering/URLFilter.java  |   6 +-
 .../apache}/stormcrawler/filtering/URLFilters.java |  84 +++-
 .../filtering/basic/BasicURLFilter.java            |   6 +-
 .../filtering/basic/BasicURLNormalizer.java        |  23 +-
 .../filtering/basic/SelfURLFilter.java             |   6 +-
 .../filtering/depth/MaxDepthFilter.java            |   8 +-
 .../stormcrawler/filtering/host/HostURLFilter.java |   6 +-
 .../filtering/metadata/MetadataFilter.java         |   6 +-
 .../filtering/regex/FastURLFilter.java             |  10 +-
 .../stormcrawler/filtering/regex/RegexRule.java    |   2 +-
 .../filtering/regex/RegexURLFilter.java            |   2 +-
 .../filtering/regex/RegexURLFilterBase.java        |   6 +-
 .../filtering/regex/RegexURLNormalizer.java        |   6 +-
 .../filtering/robots/RobotsFilter.java             |  14 +-
 .../filtering/sitemap/SitemapFilter.java           |  10 +-
 .../stormcrawler/indexing/AbstractIndexerBolt.java | 143 ++++--
 .../stormcrawler/indexing/DummyIndexer.java        |   8 +-
 .../stormcrawler/indexing/StdOutIndexer.java       |   8 +-
 .../stormcrawler/jsoup/LDJsonParseFilter.java      |  12 +-
 .../stormcrawler/jsoup/LinkParseFilter.java        |  20 +-
 .../apache}/stormcrawler/jsoup/XPathFilter.java    |  12 +-
 .../parse/DocumentFragmentBuilder.java             |   2 +-
 .../apache}/stormcrawler/parse/JSoupFilter.java    |   7 +-
 .../apache}/stormcrawler/parse/JSoupFilters.java   |  10 +-
 .../apache}/stormcrawler/parse/Outlink.java        |   4 +-
 .../apache}/stormcrawler/parse/ParseData.java      |   4 +-
 .../apache}/stormcrawler/parse/ParseFilter.java    |   8 +-
 .../apache}/stormcrawler/parse/ParseFilters.java   |   8 +-
 .../apache}/stormcrawler/parse/ParseResult.java    |   4 +-
 .../apache}/stormcrawler/parse/TextExtractor.java  |   4 +-
 .../parse/filter/CollectionTagger.java             |  10 +-
 .../CommaSeparatedToMultivaluedMetadata.java       |   8 +-
 .../parse/filter/DebugParseFilter.java             |   6 +-
 .../parse/filter/DomainParseFilter.java            |  12 +-
 .../parse/filter/LDJsonParseFilter.java            |  10 +-
 .../stormcrawler/parse/filter/LinkParseFilter.java |  20 +-
 .../parse/filter/MD5SignatureParseFilter.java      |  10 +-
 .../parse/filter/MimeTypeNormalization.java        |   8 +-
 .../stormcrawler/parse/filter/XPathFilter.java     |  10 +-
 .../persistence/AbstractQueryingSpout.java         |   9 +-
 .../persistence/AbstractStatusUpdaterBolt.java     |  10 +-
 .../persistence/AdaptiveScheduler.java             |  22 +-
 .../stormcrawler/persistence/DefaultScheduler.java |  14 +-
 .../persistence/EmptyQueueListener.java            |   2 +-
 .../persistence/MemoryStatusUpdater.java           |   6 +-
 .../stormcrawler/persistence/Scheduler.java        |   8 +-
 .../apache}/stormcrawler/persistence/Status.java   |   2 +-
 .../persistence/StdOutStatusUpdater.java           |   4 +-
 .../persistence/urlbuffer/AbstractURLBuffer.java   |   8 +-
 .../persistence/urlbuffer/PriorityURLBuffer.java   |   4 +-
 .../persistence/urlbuffer/SchedulingURLBuffer.java |   4 +-
 .../persistence/urlbuffer/SimpleURLBuffer.java     |   2 +-
 .../persistence/urlbuffer/URLBuffer.java           |  12 +-
 .../protocol/AbstractHttpProtocol.java             | 118 +----
 .../stormcrawler/protocol/DelegatorProtocol.java   | 159 ++++--
 .../apache}/stormcrawler/protocol/HttpHeaders.java |   2 +-
 .../protocol/HttpRobotRulesParser.java             |  87 +++-
 .../org/apache/stormcrawler/protocol/Protocol.java | 156 ++++++
 .../stormcrawler/protocol/ProtocolFactory.java     |  17 +-
 .../stormcrawler/protocol/ProtocolResponse.java    |  10 +-
 .../apache}/stormcrawler/protocol/RobotRules.java  |   2 +-
 .../stormcrawler/protocol/RobotRulesParser.java    |  84 +++-
 .../stormcrawler/protocol/file/FileProtocol.java   |  16 +-
 .../stormcrawler/protocol/file/FileResponse.java   |   8 +-
 .../protocol/httpclient/HttpProtocol.java          |  25 +-
 .../protocol/okhttp/DNSResolutionListener.java     |   2 +-
 .../stormcrawler/protocol/okhttp/HttpProtocol.java |  24 +-
 .../protocol/selenium/NavigationFilter.java        |   8 +-
 .../protocol/selenium/NavigationFilters.java       |  14 +-
 .../protocol/selenium/RemoteDriverProtocol.java    | 131 +++++
 .../protocol/selenium/SeleniumProtocol.java        |  26 +-
 .../stormcrawler/proxy/MultiProxyManager.java      |   8 +-
 .../apache}/stormcrawler/proxy/ProxyManager.java   |   4 +-
 .../apache}/stormcrawler/proxy/SCProxy.java        |  14 +-
 .../stormcrawler/proxy/SingleProxyManager.java     |  13 +-
 .../apache}/stormcrawler/spout/FileSpout.java      |  32 +-
 .../apache}/stormcrawler/spout/MemorySpout.java    |  10 +-
 .../stormcrawler/util/AbstractConfigurable.java    |   2 +-
 .../stormcrawler/util/CharsetIdentification.java   |   6 +-
 .../stormcrawler/util/CollectionMetric.java        |   2 +-
 .../apache}/stormcrawler/util/ConfUtils.java       |  83 +++-
 .../apache}/stormcrawler/util/Configurable.java    |   2 +-
 .../stormcrawler/util/ConfigurableHelper.java      |   2 +-
 .../apache}/stormcrawler/util/CookieConverter.java |   2 +-
 .../stormcrawler/util/InitialisationUtil.java      |   2 +-
 .../stormcrawler/util/MetadataTransfer.java        |  30 +-
 .../stormcrawler/util/PerSecondReducer.java        |   2 +-
 .../apache}/stormcrawler/util/RefreshTag.java      |   2 +-
 .../apache}/stormcrawler/util/RobotsTags.java      |   4 +-
 .../apache}/stormcrawler/util/StringTabScheme.java |   4 +-
 .../apache}/stormcrawler/util/URLPartitioner.java  |  28 +-
 .../stormcrawler/util/URLStreamGrouping.java       |   8 +-
 .../apache}/stormcrawler/util/URLUtil.java         |   2 +-
 core/src/main/resources/crawler-default.yaml       |  97 +++-
 .../apache/stormcrawler/MetadataTest.java}         |  26 +-
 .../stormcrawler/TestMetadataSerialization.java    |   2 +-
 .../apache}/stormcrawler/TestOutputCollector.java  |   2 +-
 .../apache}/stormcrawler/TestUtil.java             |   2 +-
 .../stormcrawler/bolt/AbstractFetcherBoltTest.java |  12 +-
 .../stormcrawler/bolt/FeedParserBoltTest.java      |  16 +-
 .../apache}/stormcrawler/bolt/FetcherBoltTest.java |   2 +-
 .../stormcrawler/bolt/JSoupParserBoltTest.java     |  16 +-
 .../stormcrawler/bolt/SimpleFetcherBoltTest.java   |   2 +-
 .../stormcrawler/bolt/SiteMapParserBoltTest.java   |  95 ++--
 .../stormcrawler/filtering/BasicURLFilterTest.java |   6 +-
 .../filtering/BasicURLNormalizerTest.java          |  22 +-
 .../stormcrawler/filtering/FastURLFilterTest.java  |   6 +-
 .../stormcrawler/filtering/HostURLFilterTest.java  |   6 +-
 .../stormcrawler/filtering/MaxDepthFilterTest.java |   8 +-
 .../stormcrawler/filtering/MetadataFilterTest.java |   6 +-
 .../stormcrawler/filtering/RegexFilterTest.java    |   6 +-
 .../ClassInheritingFomAbstractAndInterface.java    |   6 +-
 .../ClassInheritingFromAbstractClassOnly.java      |   4 +-
 .../ClassInheritingFromOpenClass.java              |   4 +-
 .../ClassWithoutValidConstructor.java              |   4 +-
 .../initialisation/FinalClassToInitialize.java     |   2 +-
 .../helper/initialisation/SimpleOpenClass.java     |   2 +-
 .../helper/initialisation/base/AbstractClass.java  |   2 +-
 .../helper/initialisation/base/ITestInterface.java |   2 +-
 .../OpenClassWithAbstractClassAndInterface.java    |   2 +-
 .../stormcrawler/indexer/BasicIndexingTest.java    |  27 +-
 .../apache}/stormcrawler/indexer/DummyIndexer.java |   6 +-
 .../stormcrawler/indexer/IndexerTester.java        |  10 +-
 .../apache}/stormcrawler/json/JsoupFilterTest.java |  10 +-
 .../stormcrawler/jsoup/JSoupFiltersTest.java       |  10 +-
 .../stormcrawler/parse/DuplicateLinksTest.java     |  10 +-
 .../apache}/stormcrawler/parse/ParsingTester.java  |   8 +-
 .../stormcrawler/parse/StackOverflowTest.java      |   8 +-
 .../stormcrawler/parse/TextExtractorTest.java      |   2 +-
 .../parse/filter/CSVMetadataFilterTest.java        |   8 +-
 .../parse/filter/CollectionTaggerTest.java         |   4 +-
 .../parse/filter/SubDocumentsFilterTest.java       |   8 +-
 .../parse/filter/SubDocumentsParseFilter.java      |   8 +-
 .../stormcrawler/parse/filter/XPathFilterTest.java |   8 +-
 .../persistence/AdaptiveSchedulerTest.java         |  19 +-
 .../persistence/DefaultSchedulerTest.java          |   4 +-
 .../stormcrawler/persistence/URLBufferTest.java    |  10 +-
 .../protocol/AbstractProtocolTest.java             |  96 ++++
 .../protocol/DelegationProtocolTest.java           |  41 +-
 .../stormcrawler/protocol/DummyProtocol.java}      |  28 +-
 .../stormcrawler/protocol/HttpHeadersTest.java     |   2 +-
 .../protocol/HttpRobotRulesParserTest.java         | 282 +++++++++++
 .../protocol/selenium/ProtocolTest.java            | 166 +++++++
 .../stormcrawler/proxy/MultiProxyManagerTest.java  |   2 +-
 .../apache}/stormcrawler/proxy/SCProxyTest.java    |   2 +-
 .../stormcrawler/proxy/SingleProxyManagerTest.java |   2 +-
 .../apache/stormcrawler/util/ConfUtilsTest.java    |  64 +++
 .../stormcrawler/util/CookieConverterTest.java     |   2 +-
 .../stormcrawler/util/InitialisationUtilTest.java  |   6 +-
 .../stormcrawler/util/MetadataTransferTest.java    |  61 ++-
 .../apache}/stormcrawler/util/RefreshTagTest.java  |   2 +-
 .../apache}/stormcrawler/util/RobotsTagsTest.java  |   4 +-
 core/src/test/resources/basicurlnormalizer.json    |   4 +-
 core/src/test/resources/delegator-conf.yaml        |  21 +-
 core/src/test/resources/test.jsoupfilters.json     |   8 +-
 core/src/test/resources/test.parsefilters.json     |   8 +-
 core/src/test/resources/test.subdocfilter.json     |   6 +-
 .../test/resources/tripadvisor.sitemap.index.xml   |  22 +
 core/src/test/resources/tripadvisor.sitemap.xml.gz | Bin 0 -> 1537978 bytes
 external/aws/README.md                             |   2 +-
 external/aws/pom.xml                               |   8 +-
 .../aws/bolt/CloudSearchConstants.java             |   2 +-
 .../aws/bolt/CloudSearchIndexerBolt.java           |  12 +-
 .../stormcrawler/aws/bolt/CloudSearchUtils.java    |   2 +-
 .../stormcrawler/aws/s3/AbstractS3CacheBolt.java   |   4 +-
 .../stormcrawler/aws/s3/S3CacheChecker.java        |   6 +-
 .../apache}/stormcrawler/aws/s3/S3Cacher.java      |   6 +-
 .../stormcrawler/aws/s3/S3ContentCacher.java       |   4 +-
 external/elasticsearch/README.md                   |  20 +-
 external/elasticsearch/archetype/pom.xml           |   4 +-
 .../META-INF/maven/archetype-metadata.xml          |  35 +-
 .../main/resources/archetype-resources/README.md   |   6 +-
 .../archetype-resources/crawler-conf.yaml          |  58 ++-
 .../resources/archetype-resources/es-conf.yaml     |   2 +-
 .../resources/archetype-resources/es-crawler.flux  |  52 +-
 .../archetype-resources/es-injection.flux          |  50 ++
 .../archetype-resources/kibana/importKibana.sh     |   8 +-
 .../src/main/resources/archetype-resources/pom.xml |  24 +-
 .../src/main/java/ESCrawlTopology.java             |  36 +-
 .../src/main/resources/jsoupfilters.json           |   6 +-
 .../src/main/resources/parsefilters.json           |   8 +-
 .../src/main/resources/urlfilters.json             |  18 +-
 external/elasticsearch/pom.xml                     |   9 +-
 .../BulkItemResponseToFailedFlag.java              |   2 +-
 .../elasticsearch/ElasticSearchConnection.java     |   4 +-
 .../elasticsearch/bolt/DeletionBolt.java           |  23 +-
 .../elasticsearch/bolt/IndexerBolt.java            |  20 +-
 .../filtering/JSONURLFilterWrapper.java            |  14 +-
 .../elasticsearch/metrics/MetricsConsumer.java     |   8 +-
 .../elasticsearch/metrics/StatusMetricsBolt.java   |   6 +-
 .../parse/filter/JSONResourceWrapper.java          |  14 +-
 .../elasticsearch/persistence/AbstractSpout.java   |  10 +-
 .../persistence/AggregationSpout.java              |   8 +-
 .../elasticsearch/persistence/CollapsingSpout.java |   4 +-
 .../elasticsearch/persistence/HybridSpout.java     |   6 +-
 .../elasticsearch/persistence/ScrollSpout.java     |  10 +-
 .../persistence/StatusUpdaterBolt.java             |  50 +-
 .../elasticsearch/bolt/IndexerBoltTest.java        |  12 +-
 .../elasticsearch/bolt/StatusBoltTest.java         |  14 +-
 external/langid/pom.xml                            |   6 +-
 .../stormcrawler/parse/filter/LanguageID.java      |  12 +-
 external/opensearch/OS_IndexInit.sh                |  23 -
 external/opensearch/README.md                      |  19 +-
 external/opensearch/archetype/pom.xml              |   4 +-
 .../META-INF/archetype-post-generate.groovy        |   5 +-
 .../META-INF/maven/archetype-metadata.xml          |  37 +-
 .../resources/archetype-resources/OS_IndexInit.sh  |  25 +
 .../main/resources/archetype-resources/README.md   |  17 +-
 .../archetype-resources/crawler-conf.yaml          |  58 ++-
 .../resources/archetype-resources/crawler.flux     |  50 +-
 .../dashboards/importDashboards.sh                 |   8 +-
 .../resources/archetype-resources/injection.flux   |  50 ++
 .../archetype-resources/opensearch-conf.yaml       |  12 +-
 .../src/main/resources/archetype-resources/pom.xml |  24 +-
 .../src/main/resources/indexer.mapping}            |   0
 .../src/main/resources/jsoupfilters.json           |   6 +-
 .../src/main/resources/metrics.mapping             |   0
 .../src/main/resources/parsefilters.json           |   8 +-
 .../src/main/resources/status.mapping              |   0
 .../src/main/resources/urlfilters.json             |  18 +-
 external/opensearch/opensearch-conf.yaml           |  12 +-
 external/opensearch/pom.xml                        |  24 +-
 .../stormcrawler/opensearch/bolt/DeletionBolt.java |  94 ----
 .../opensearch/BulkItemResponseToFailedFlag.java   |  10 +-
 .../apache}/stormcrawler/opensearch/Constants.java |   2 +-
 .../stormcrawler/opensearch/IndexCreation.java     |  15 +-
 .../opensearch/OpenSearchConnection.java}          | 102 ++--
 .../stormcrawler/opensearch/bolt/DeletionBolt.java | 308 ++++++++++++
 .../stormcrawler/opensearch/bolt/IndexerBolt.java  |  59 +--
 .../opensearch/filtering/JSONURLFilterWrapper.java |  16 +-
 .../opensearch/metrics/MetricsConsumer.java        |  26 +-
 .../opensearch/metrics/StatusMetricsBolt.java      |  20 +-
 .../parse/filter/JSONResourceWrapper.java          |  38 +-
 .../opensearch/persistence/AbstractSpout.java      |  78 +--
 .../opensearch/persistence/AggregationSpout.java   |  18 +-
 .../opensearch/persistence/HybridSpout.java        |  22 +-
 .../opensearch/persistence/StatusUpdaterBolt.java  |  81 +--
 .../opensearch/bolt/AbstractOpenSearchTest.java    |  46 ++
 .../opensearch/bolt/IndexerBoltTest.java           |  30 +-
 .../opensearch/bolt/StatusBoltTest.java            |  38 +-
 .../resources/indexer.mapping}                     |   0
 .../src/{main => test}/resources/metrics.mapping   |   0
 .../src/test/resources/status.mapping              |   0
 external/pom.xml                                   |  23 +-
 external/solr/README.md                            |   2 +-
 external/solr/cores/status/conf/schema.xml         |   2 +-
 external/solr/pom.xml                              |  14 +-
 external/solr/solr-conf.yaml                       |   2 +-
 .../apache}/stormcrawler/solr/SeedInjector.java    |  10 +-
 .../apache}/stormcrawler/solr/SolrConnection.java  |   4 +-
 .../stormcrawler/solr/SolrCrawlTopology.java       |  26 +-
 .../stormcrawler/solr/bolt/DeletionBolt.java       |  86 ++++
 .../stormcrawler/solr/bolt/IndexerBolt.java        |  13 +-
 .../stormcrawler/solr/metrics/MetricsConsumer.java |   6 +-
 .../stormcrawler/solr/persistence/SolrSpout.java   |  11 +-
 .../solr/persistence/StatusUpdaterBolt.java        |  17 +-
 .../solr/persistence/StatusBoltTest.java           |  12 +-
 external/sql/pom.xml                               |   6 +-
 external/sql/sql-conf.yaml                         |   2 +-
 .../apache}/stormcrawler/sql/Constants.java        |   2 +-
 .../apache}/stormcrawler/sql/IndexerBolt.java      |  12 +-
 .../apache}/stormcrawler/sql/SQLSpout.java         |  10 +-
 .../apache}/stormcrawler/sql/SQLUtil.java          |   2 +-
 .../stormcrawler/sql/StatusUpdaterBolt.java        |  33 +-
 .../stormcrawler/sql/metrics/MetricsConsumer.java  |  18 +-
 external/tika/README.md                            |   4 +-
 external/tika/pom.xml                              |  12 +-
 .../apache}/stormcrawler/tika/DOMBuilder.java      |   2 +-
 .../apache}/stormcrawler/tika/ParserBolt.java      |  38 +-
 .../apache}/stormcrawler/tika/RedirectionBolt.java |   4 +-
 .../stormcrawler/tika/XMLCharacterRecognizer.java  |   2 +-
 .../apache}/stormcrawler/tika/ParserBoltTest.java  |  16 +-
 external/urlfrontier/README.md                     |   2 +-
 external/urlfrontier/pom.xml                       |   9 +-
 .../stormcrawler/urlfrontier/Constants.java        |   2 +-
 .../urlfrontier/ManagedChannelUtil.java            |   4 +-
 .../apache}/stormcrawler/urlfrontier/Spout.java    |  10 +-
 .../urlfrontier/StatusUpdaterBolt.java             |  14 +-
 .../urlfrontier/StatusUpdaterBoltTest.java         |  16 +-
 .../urlfrontier/URLFrontierContainer.java          |   2 +-
 .../urlfrontier/URLFrontierContainerConfig.java    |   2 +-
 external/warc/README.md                            |  43 +-
 external/warc/pom.xml                              |  20 +-
 .../warc/FileTimeSizeRotationPolicy.java           |   2 +-
 .../apache}/stormcrawler/warc/GzipHdfsBolt.java    |   2 +-
 .../stormcrawler/warc/WARCFileNameFormat.java      |   2 +-
 .../apache}/stormcrawler/warc/WARCHdfsBolt.java    |   6 +-
 .../stormcrawler/warc/WARCRecordFormat.java        |  20 +-
 .../stormcrawler/warc/WARCRequestRecordFormat.java |   8 +-
 .../apache}/stormcrawler/warc/WARCSpout.java       |  65 ++-
 .../stormcrawler/warc/WARCHdfsBoltTest.java        |  10 +-
 .../stormcrawler/warc/WARCRecordFormatTest.java    |   8 +-
 .../apache/stormcrawler/warc/WARCSpoutTest.java    |  70 +++
 external/warc/src/test/resources/test.warc.gz      | Bin 0 -> 301243 bytes
 .../src/test/resources/unparsable-date.warc.gz     | Bin 0 -> 938 bytes
 external/warc/src/test/resources/warc.inputs       |   2 +
 pom.xml                                            | 264 ++++++++--
 330 files changed, 5219 insertions(+), 2381 deletions(-)
 create mode 100644 .github/workflows/code_coverage.yml
 create mode 100644 DISCLAIMER
 create mode 100644 THIRD-PARTY.properties
 create mode 100644 THIRD-PARTY.txt
 delete mode 100644 core/src/main/java/com/digitalpebble/stormcrawler/protocol/Protocol.java
 delete mode 100644 core/src/main/java/com/digitalpebble/stormcrawler/protocol/selenium/DelegatorRemoteDriverProtocol.java
 delete mode 100644 core/src/main/java/com/digitalpebble/stormcrawler/protocol/selenium/RemoteDriverProtocol.java
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/ConfigurableTopology.java (95%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/Constants.java (98%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/JSONResource.java (97%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/Metadata.java (92%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/bolt/FeedParserBolt.java (93%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/bolt/FetcherBolt.java (91%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/bolt/JSoupParserBolt.java (93%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/bolt/SimpleFetcherBolt.java (93%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/bolt/SiteMapParserBolt.java (95%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/bolt/StatusEmitterBolt.java (83%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/bolt/URLFilterBolt.java (91%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/bolt/URLPartitionerBolt.java (97%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/URLFilter.java (91%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/URLFilters.java (60%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/basic/BasicURLFilter.java (94%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/basic/BasicURLNormalizer.java (95%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/basic/SelfURLFilter.java (90%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/depth/MaxDepthFilter.java (92%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/host/HostURLFilter.java (96%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/metadata/MetadataFilter.java (94%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/regex/FastURLFilter.java (97%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/regex/RegexRule.java (97%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/regex/RegexURLFilter.java (96%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/regex/RegexURLFilterBase.java (97%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/regex/RegexURLNormalizer.java (98%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/robots/RobotsFilter.java (86%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/filtering/sitemap/SitemapFilter.java (86%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/indexing/AbstractIndexerBolt.java (70%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/indexing/DummyIndexer.java (89%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/indexing/StdOutIndexer.java (93%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/jsoup/LDJsonParseFilter.java (91%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/jsoup/LinkParseFilter.java (89%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/jsoup/XPathFilter.java (92%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/parse/DocumentFragmentBuilder.java (98%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/parse/JSoupFilter.java (87%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/parse/JSoupFilters.java (95%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/parse/Outlink.java (94%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/parse/ParseData.java (95%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/parse/ParseFilter.java (88%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/parse/ParseFilters.java (96%)
 rename core/src/main/java/{com/digitalpebble => org/apache}/stormcrawler/parse/ParseResult.java (97%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/TextExtractor.java (98%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/CollectionTagger.java (95%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/CommaSeparatedToMultivaluedMetadata.java 
(91%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/DebugParseFilter.java (92%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/DomainParseFilter.java (86%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/LDJsonParseFilter.java (93%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/LinkParseFilter.java (89%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/MD5SignatureParseFilter.java (92%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/MimeTypeNormalization.java (91%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/XPathFilter.java (96%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/AbstractQueryingSpout.java (97%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/AbstractStatusUpdaterBolt.java (97%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/AdaptiveScheduler.java (94%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/DefaultScheduler.java (94%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/EmptyQueueListener.java (94%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/MemoryStatusUpdater.java (90%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/Scheduler.java (90%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/Status.java (96%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/StdOutStatusUpdater.java (94%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/urlbuffer/AbstractURLBuffer.java (93%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/urlbuffer/PriorityURLBuffer.java (97%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/urlbuffer/SchedulingURLBuffer.java (98%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/urlbuffer/SimpleURLBuffer.java (97%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/urlbuffer/URLBuffer.java (90%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/AbstractHttpProtocol.java (59%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/DelegatorProtocol.java (54%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/HttpHeaders.java (98%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/HttpRobotRulesParser.java (71%)
 create mode 100644 
core/src/main/java/org/apache/stormcrawler/protocol/Protocol.java
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/ProtocolFactory.java (87%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/ProtocolResponse.java (97%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/RobotRules.java (98%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/RobotRulesParser.java (66%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/file/FileProtocol.java (78%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/file/FileResponse.java (95%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/httpclient/HttpProtocol.java (95%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/okhttp/DNSResolutionListener.java (96%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/okhttp/HttpProtocol.java (97%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/selenium/NavigationFilter.java (83%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/selenium/NavigationFilters.java (90%)
 create mode 100644 
core/src/main/java/org/apache/stormcrawler/protocol/selenium/RemoteDriverProtocol.java
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/selenium/SeleniumProtocol.java (78%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/proxy/MultiProxyManager.java (97%)
 copy core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/proxy/ProxyManager.java (91%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/proxy/SCProxy.java (93%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/proxy/SingleProxyManager.java (85%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/spout/FileSpout.java (88%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/spout/MemorySpout.java (95%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/AbstractConfigurable.java (96%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/CharsetIdentification.java (98%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/CollectionMetric.java (96%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/ConfUtils.java (56%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/Configurable.java (99%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/ConfigurableHelper.java (99%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/CookieConverter.java (99%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/InitialisationUtil.java (99%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/MetadataTransfer.java (87%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/PerSecondReducer.java (97%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/RefreshTag.java (97%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/RobotsTags.java (98%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/StringTabScheme.java (95%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/URLPartitioner.java (81%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/URLStreamGrouping.java (94%)
 rename core/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/URLUtil.java (99%)
 copy 
core/src/test/java/{com/digitalpebble/stormcrawler/protocol/HttpHeadersTest.java
 => org/apache/stormcrawler/MetadataTest.java} (57%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/TestMetadataSerialization.java (98%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/TestOutputCollector.java (98%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/TestUtil.java (99%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/bolt/AbstractFetcherBoltTest.java (92%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/bolt/FeedParserBoltTest.java (90%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/bolt/FetcherBoltTest.java (95%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/bolt/JSoupParserBoltTest.java (96%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/bolt/SimpleFetcherBoltTest.java (95%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/bolt/SiteMapParserBoltTest.java (85%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/filtering/BasicURLFilterTest.java (94%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/filtering/BasicURLNormalizerTest.java (93%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/filtering/FastURLFilterTest.java (94%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/filtering/HostURLFilterTest.java (96%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/filtering/MaxDepthFilterTest.java (93%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/filtering/MetadataFilterTest.java (94%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/filtering/RegexFilterTest.java (95%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/helper/initialisation/ClassInheritingFomAbstractAndInterface.java
 (80%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/helper/initialisation/ClassInheritingFromAbstractClassOnly.java
 (85%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/helper/initialisation/ClassInheritingFromOpenClass.java
 (84%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/helper/initialisation/ClassWithoutValidConstructor.java
 (86%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/helper/initialisation/FinalClassToInitialize.java (93%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/helper/initialisation/SimpleOpenClass.java (92%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/helper/initialisation/base/AbstractClass.java (94%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/helper/initialisation/base/ITestInterface.java (92%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/helper/initialisation/base/OpenClassWithAbstractClassAndInterface.java
 (93%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/indexer/BasicIndexingTest.java (89%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/indexer/DummyIndexer.java (94%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/indexer/IndexerTester.java (89%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/json/JsoupFilterTest.java (90%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/jsoup/JSoupFiltersTest.java (93%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/DuplicateLinksTest.java (87%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/ParsingTester.java (93%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/StackOverflowTest.java (91%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/TextExtractorTest.java (98%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/CSVMetadataFilterTest.java (88%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/CollectionTaggerTest.java (91%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/SubDocumentsFilterTest.java (87%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/SubDocumentsParseFilter.java (92%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/XPathFilterTest.java (93%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/AdaptiveSchedulerTest.java (92%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/DefaultSchedulerTest.java (97%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/persistence/URLBufferTest.java (90%)
 create mode 100644 
core/src/test/java/org/apache/stormcrawler/protocol/AbstractProtocolTest.java
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/DelegationProtocolTest.java (69%)
 rename 
core/src/{main/java/com/digitalpebble/stormcrawler/proxy/ProxyManager.java => 
test/java/org/apache/stormcrawler/protocol/DummyProtocol.java} (61%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/protocol/HttpHeadersTest.java (96%)
 create mode 100644 
core/src/test/java/org/apache/stormcrawler/protocol/HttpRobotRulesParserTest.java
 create mode 100644 
core/src/test/java/org/apache/stormcrawler/protocol/selenium/ProtocolTest.java
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/proxy/MultiProxyManagerTest.java (99%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/proxy/SCProxyTest.java (98%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/proxy/SingleProxyManagerTest.java (97%)
 create mode 100644 
core/src/test/java/org/apache/stormcrawler/util/ConfUtilsTest.java
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/CookieConverterTest.java (99%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/InitialisationUtilTest.java (97%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/MetadataTransferTest.java (50%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/RefreshTagTest.java (97%)
 rename core/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/util/RobotsTagsTest.java (95%)
 create mode 100644 core/src/test/resources/tripadvisor.sitemap.index.xml
 create mode 100644 core/src/test/resources/tripadvisor.sitemap.xml.gz
 rename external/aws/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/aws/bolt/CloudSearchConstants.java (96%)
 rename external/aws/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/aws/bolt/CloudSearchIndexerBolt.java (97%)
 rename external/aws/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/aws/bolt/CloudSearchUtils.java (98%)
 rename external/aws/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/aws/s3/AbstractS3CacheBolt.java (96%)
 rename external/aws/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/aws/s3/S3CacheChecker.java (96%)
 rename external/aws/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/aws/s3/S3Cacher.java (97%)
 rename external/aws/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/aws/s3/S3ContentCacher.java (94%)
 create mode 100644 
external/elasticsearch/archetype/src/main/resources/archetype-resources/es-injection.flux
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/BulkItemResponseToFailedFlag.java (98%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/ElasticSearchConnection.java (99%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/bolt/DeletionBolt.java (82%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/bolt/IndexerBolt.java (96%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/filtering/JSONURLFilterWrapper.java (93%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/metrics/MetricsConsumer.java (95%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/metrics/StatusMetricsBolt.java (96%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/parse/filter/JSONResourceWrapper.java 
(92%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/persistence/AbstractSpout.java (96%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/persistence/AggregationSpout.java (98%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/persistence/CollapsingSpout.java (98%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/persistence/HybridSpout.java (97%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/persistence/ScrollSpout.java (95%)
 rename external/elasticsearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/persistence/StatusUpdaterBolt.java (90%)
 rename external/elasticsearch/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/bolt/IndexerBoltTest.java (94%)
 rename external/elasticsearch/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/elasticsearch/bolt/StatusBoltTest.java (94%)
 rename external/langid/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/parse/filter/LanguageID.java (93%)
 delete mode 100755 external/opensearch/OS_IndexInit.sh
 create mode 100755 
external/opensearch/archetype/src/main/resources/archetype-resources/OS_IndexInit.sh
 create mode 100644 
external/opensearch/archetype/src/main/resources/archetype-resources/injection.flux
 copy external/opensearch/{src/main/resources/content.mapping => 
archetype/src/main/resources/archetype-resources/src/main/resources/indexer.mapping}
 (100%)
 copy external/opensearch/{ => 
archetype/src/main/resources/archetype-resources}/src/main/resources/metrics.mapping
 (100%)
 rename external/opensearch/{ => 
archetype/src/main/resources/archetype-resources}/src/main/resources/status.mapping
 (100%)
 delete mode 100644 
external/opensearch/src/main/java/com/digitalpebble/stormcrawler/opensearch/bolt/DeletionBolt.java
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/BulkItemResponseToFailedFlag.java (91%)
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/Constants.java (94%)
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/IndexCreation.java (89%)
 rename 
external/opensearch/src/main/java/{com/digitalpebble/stormcrawler/opensearch/OpensearchConnection.java
 => org/apache/stormcrawler/opensearch/OpenSearchConnection.java} (75%)
 create mode 100644 
external/opensearch/src/main/java/org/apache/stormcrawler/opensearch/bolt/DeletionBolt.java
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/bolt/IndexerBolt.java (90%)
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/filtering/JSONURLFilterWrapper.java (92%)
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/metrics/MetricsConsumer.java (87%)
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/metrics/StatusMetricsBolt.java (89%)
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/parse/filter/JSONResourceWrapper.java (82%)
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/persistence/AbstractSpout.java (75%)
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/persistence/AggregationSpout.java (96%)
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/persistence/HybridSpout.java (90%)
 rename external/opensearch/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/persistence/StatusUpdaterBolt.java (85%)
 create mode 100644 
external/opensearch/src/test/java/org/apache/stormcrawler/opensearch/bolt/AbstractOpenSearchTest.java
 rename external/opensearch/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/bolt/IndexerBoltTest.java (83%)
 rename external/opensearch/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/opensearch/bolt/StatusBoltTest.java (79%)
 rename external/opensearch/src/{main/resources/content.mapping => 
test/resources/indexer.mapping} (100%)
 rename external/opensearch/src/{main => test}/resources/metrics.mapping (100%)
 copy external/{elasticsearch => opensearch}/src/test/resources/status.mapping 
(100%)
 rename external/solr/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/solr/SeedInjector.java (86%)
 rename external/solr/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/solr/SolrConnection.java (97%)
 rename external/solr/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/solr/SolrCrawlTopology.java (74%)
 create mode 100644 
external/solr/src/main/java/org/apache/stormcrawler/solr/bolt/DeletionBolt.java
 rename external/solr/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/solr/bolt/IndexerBolt.java (92%)
 rename external/solr/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/solr/metrics/MetricsConsumer.java (96%)
 rename external/solr/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/solr/persistence/SolrSpout.java (96%)
 rename external/solr/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/solr/persistence/StatusUpdaterBolt.java (88%)
 rename external/solr/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/solr/persistence/StatusBoltTest.java (93%)
 rename external/sql/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/sql/Constants.java (96%)
 rename external/sql/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/sql/IndexerBolt.java (94%)
 rename external/sql/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/sql/SQLSpout.java (96%)
 rename external/sql/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/sql/SQLUtil.java (97%)
 rename external/sql/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/sql/StatusUpdaterBolt.java (89%)
 rename external/sql/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/sql/metrics/MetricsConsumer.java (92%)
 rename external/tika/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/tika/DOMBuilder.java (99%)
 rename external/tika/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/tika/ParserBolt.java (93%)
 rename external/tika/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/tika/RedirectionBolt.java (97%)
 rename external/tika/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/tika/XMLCharacterRecognizer.java (98%)
 rename external/tika/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/tika/ParserBoltTest.java (90%)
 rename external/urlfrontier/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/urlfrontier/Constants.java (97%)
 rename external/urlfrontier/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/urlfrontier/ManagedChannelUtil.java (93%)
 rename external/urlfrontier/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/urlfrontier/Spout.java (96%)
 rename external/urlfrontier/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/urlfrontier/StatusUpdaterBolt.java (97%)
 rename external/urlfrontier/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/urlfrontier/StatusUpdaterBoltTest.java (92%)
 rename external/urlfrontier/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/urlfrontier/URLFrontierContainer.java (98%)
 rename external/urlfrontier/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/urlfrontier/URLFrontierContainerConfig.java (95%)
 rename external/warc/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/warc/FileTimeSizeRotationPolicy.java (98%)
 rename external/warc/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/warc/GzipHdfsBolt.java (99%)
 rename external/warc/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/warc/WARCFileNameFormat.java (98%)
 rename external/warc/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/warc/WARCHdfsBolt.java (95%)
 rename external/warc/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/warc/WARCRecordFormat.java (96%)
 rename external/warc/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/warc/WARCRequestRecordFormat.java (96%)
 rename external/warc/src/main/java/{com/digitalpebble => 
org/apache}/stormcrawler/warc/WARCSpout.java (92%)
 rename external/warc/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/warc/WARCHdfsBoltTest.java (97%)
 rename external/warc/src/test/java/{com/digitalpebble => 
org/apache}/stormcrawler/warc/WARCRecordFormatTest.java (98%)
 create mode 100644 
external/warc/src/test/java/org/apache/stormcrawler/warc/WARCSpoutTest.java
 create mode 100644 external/warc/src/test/resources/test.warc.gz
 create mode 100644 external/warc/src/test/resources/unparsable-date.warc.gz
 create mode 100644 external/warc/src/test/resources/warc.inputs
