This is an automated email from the ASF dual-hosted git repository.
snagel pushed a change to annotated tag release-1.22
in repository https://gitbox.apache.org/repos/asf/nutch.git
*** WARNING: tag release-1.22 was modified! ***
from cc4d2150e (tag)
to 6a4ec040a (tag)
tagging a4c9cc47285975d5ab08674425d2ce44a8556bc9 (commit)
replaces release-1.13
by Sebastian Nagel
on Thu Feb 12 14:07:30 2026 +0100
- Log -----------------------------------------------------------------
Apache Nutch 1.22 RC#1 Tag
-----------------------------------------------------------------------
omit bf8820348 NUTCH-3118 Logging pattern missing one argument placeholder
omit cb2f47015 NUTCH-3118 Logging pattern missing one argument placeholder
omit 65eb8857d Nutch 1.21 release - update current year in API docs etc. -
update version number - update changes / release notes
add 2786b5a9b NUTCH-3118 Logging pattern missing one argument placeholder
(#857)
add 11e9a6a3e Prepare for new development after release of 1.21 - bump
version number -> 1.22-SNAPSHOT - update changelog - update year
add d1b70adc7 NUTCH-3119 Log4j package scanning is deprecated
add 8416da8a1 NUTCH-3118 Logging pattern missing one argument placeholder
add 7e43e12b2 NUTCH-3124 Github workflow not run because of uncertified
action "paths-changes-filter"
add 5ae91b69e [NUTCH-3122] Make SpellCheckedMetadata case-insensitive for
all Metadata names
add 365f58530 [NUTCH-3122] Add test for backward compatibility of
SpellCheckedMetadata
add 3991c5b98 Merge pull request #859 from TamimEhsan/NUTCH-3122
add 4c04a9847 NUTCH-2887 Migrate to JUnit 5 Jupiter (#861)
add e2b60fc00 NUTCH-2887 Migrate to JUnit 5 Jupiter (#862)
add cfcf2d761 NUTCH-2887 Migrate to JUnit 5 Jupiter
add 919e24515 NUTCH-2887 Migrate to JUnit 5 Jupiter
add a966c44d3 NUTCH-2887 Migrate to JUnit 5 Jupiter
add 667e21764 Merge pull request #864 from
sebastian-nagel/NUTCH-2887-junit4-mrunit
add 2d92366a5 NUTCH-3126 Report JUnit test results in GitHub pull request
thread (#863)
add cefb48a75 NUTCH-3099 Allow wildcard '*' in http.proxy.exception.list
(via Isabelle Giguere) (#865)
add 317d2de28 NUTCH-3126 Report JUnit test results in GitHub pull request
thread (#867)
add 1156801bc NUTCH-3040 Upgrade to Hadoop 3.4.2 (#866)
add f43ff78bf fix for NUTCH-2671 contributed by igiguere. Also fixes
NUTCH-3128, NUTCH-3125
add f65371d1a Merge pull request #870 from igiguere/NUTCH-2971
add 7b5ed23a5 NUTCH-3126 Report JUnit test results in GitHub pull request
thread (#868)
add f71bab402 NUTCH-3132 Standardize existing Nutch metrics naming and
implementation (#871)
add ca2591e17 NUTCH-3134 Add latency metrics with percentile support to
Fetcher, Parser, and Indexer (#876)
add de27acc67 NUTCH-3133 Upgrade GitHub workflows to JDK 17
add 8307b6b81 NUTCH-3135 Cache downloaded ant-eclipse.jar
add 50b1ee639 NUTCH-3136 Upgrade crawler-commons dependency
add c7cf56964 NUTCH-3136 Upgrade crawler-commons dependency
add 8a0fb2b26 NUTCH-3137 Upgrade Nutch core dependencies (#875)
add 00bf8c463 NUTCH-3139 protocol-okhttp: add support for zstd
content-encoding - upgrade to OkHttp 5.3.2 - enable support for zstd
content-encoding
add 66f678e62 NUTCH-3141 Cache Hadoop Counter References in Hot Paths
(#878)
add ec8747a3f NUTCH-3143 GitHub workflow does not run all unit tests (#884)
add 8e7bbc416 NUTCH-3143 GitHub workflow does not run all unit tests (#885)
add ddabe9694 NUTCH-3144 URLUtil unit tests fail after upgrade to
crawler-commons 1.6
add d5dccfb0c NUTCH-1564: fix immediate refetch for pages not modified
add 58687ec9e NUTCH-1564: fix AdaptiveFetchSchedule for unmodified pages
add 103fff608 NUTCH-1564: address code review comments.
add 7f724a9c5 Merge pull request #880 from
igiguere/NUTCH-1564-AdaptiveFetchSchedule-refetch
add 4207bc313 NUTCH-3148 Cache Ivy dependencies in GitHub CI builds (#886)
add 7c5a529dc NUTCH-3143 GitHub workflow does not run all unit tests (#889)
add f8577a0d7 NUTCH-3143 GitHub workflow does not run all unit tests (#890)
add 3fb806830 NUTCH-3110 Upgrade to Tika 3.1.0
add 76ced9b18 NUTCH-3110 Upgrade to Tika 3.1.0
add 713835b73 NUTCH-3110 Upgrade to Tika 3.2.3
add 3101a9e6f Merge pull request #887 from lewismc/NUTCH-3110
add 1242e22ba NUTCH-3142 Add Error Context to Metrics (#882)
add f7c7e1a03 NUTCH-3150 Expand Caching Hadoop Counter References (#892)
add 1d25cb8f0 NUTCH-3152 Job counters getGroup to use metrics constants
add 195b4c011 NUTCH-3153 Update of license and notice files
add a4c9cc472 Nutch 1.21 release - update current year in API docs -
update version number - update changes / release notes
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
annotated tag are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (cc4d2150e)
            \
             N -- N -- N   refs/tags/release-1.22 (6a4ec040a)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
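
As an illustration only, the "omit" and "add" sets described above can be sketched as set differences over commit ancestry. The graph below is a toy model of the diagram, not the real Nutch history; the commit names (A1, A2, B, O1..O3, N1..N3) and the helper functions are hypothetical. The sketch does not distinguish "omit" from "discard", since that depends on whether other references still reach a revision.

```python
def ancestors(parents, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def classify(parents, old_tip, new_tip):
    """Split history into revisions dropped, added, and shared by a forced update."""
    old = ancestors(parents, old_tip)
    new = ancestors(parents, new_tip)
    return {
        "omit": old - new,    # reachable only from the old tag target (the O revisions)
        "add": new - old,     # reachable only from the new tag target (the N revisions)
        "common": old & new,  # shared history up to and including the base B
    }


# Toy commit graph mirroring the diagram: child -> list of parents (hypothetical names).
parents = {
    "A1": [], "A2": ["A1"], "B": ["A2"],
    "O1": ["B"], "O2": ["O1"], "O3": ["O2"],
    "N1": ["B"], "N2": ["N1"], "N3": ["N2"],
}

result = classify(parents, old_tip="O3", new_tip="N3")
print("omit:", sorted(result["omit"]))
print("add:", sorted(result["add"]))
```

In a real clone, the same two sets correspond to the revisions reachable from only one of the two tag objects named at the top of this email.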
No new revisions were added by this update.
Summary of changes:
.github/workflows/junit-report.yml | 65 +++
.github/workflows/master-build.yml | 63 ++-
CHANGES.md | 53 +++
LICENSE-binary | 38 +-
NOTICE-binary | 157 ++++---
build.xml | 119 +++--
conf/log4j2.xml | 2 +-
conf/nutch-default.xml | 9 +-
default.properties | 7 +-
ivy/ivy.xml | 81 ++--
licenses-binary/LICENSE-bsd-licence.txt | 39 ++
...version-2-gpl2-with-the-classpath-exception.txt | 15 -
...y-extreme-lab-software-license-vesion-1.1.1.txt | 0
src/bin/nutch | 2 +-
.../apache/nutch/crawl/AdaptiveFetchSchedule.java | 39 +-
src/java/org/apache/nutch/crawl/CrawlDb.java | 3 +-
src/java/org/apache/nutch/crawl/CrawlDbFilter.java | 30 +-
.../org/apache/nutch/crawl/CrawlDbReducer.java | 31 +-
.../org/apache/nutch/crawl/DeduplicationJob.java | 28 +-
src/java/org/apache/nutch/crawl/Generator.java | 98 +++-
src/java/org/apache/nutch/crawl/Injector.java | 76 ++-
src/java/org/apache/nutch/fetcher/Fetcher.java | 42 +-
.../org/apache/nutch/fetcher/FetcherThread.java | 120 +++--
src/java/org/apache/nutch/fetcher/QueueFeeder.java | 33 +-
.../org/apache/nutch/hostdb/ResolverThread.java | 69 ++-
.../apache/nutch/hostdb/UpdateHostDbMapper.java | 28 +-
.../apache/nutch/hostdb/UpdateHostDbReducer.java | 28 +-
src/java/org/apache/nutch/indexer/CleaningJob.java | 18 +-
.../org/apache/nutch/indexer/IndexerMapReduce.java | 84 +++-
src/java/org/apache/nutch/indexer/IndexingJob.java | 13 +-
.../nutch/metadata/SpellCheckedMetadata.java | 4 +-
.../org/apache/nutch/metrics/ErrorTracker.java | 383 +++++++++++++++
.../org/apache/nutch/metrics/LatencyTracker.java | 144 ++++++
.../org/apache/nutch/metrics/NutchMetrics.java | 432 +++++++++++++++++
.../Feed.java => metrics/package-info.java} | 25 +-
src/java/org/apache/nutch/parse/ParseSegment.java | 25 +-
src/java/org/apache/nutch/protocol/Protocol.java | 21 +
.../apache/nutch/scoring/webgraph/WebGraph.java | 22 +-
.../apache/nutch/service/impl/ConfManagerImpl.java | 2 +-
.../service/impl/NutchServerPoolExecutor.java | 2 +-
.../nutch/service/resources/SeedResource.java | 2 +-
src/java/org/apache/nutch/tools/FileDumper.java | 2 +-
.../org/apache/nutch/tools/warc/WARCExporter.java | 59 ++-
.../org/apache/nutch/util/DomainStatistics.java | 31 +-
.../org/apache/nutch/util/SitemapProcessor.java | 78 +++-
src/java/org/apache/nutch/util/URLUtil.java | 4 +-
src/plugin/build-plugin.xml | 53 ++-
.../creativecommons/nutch/TestCCParseFilter.java | 15 +-
.../apache/nutch/parse/feed/TestFeedParser.java | 13 +-
.../parse/headings/TestHeadingsParseFilter.java | 12 +-
.../indexer/anchor/TestAnchorIndexingFilter.java | 18 +-
.../arbitrary/TestArbitraryIndexingFilter.java | 166 ++++---
.../indexer/basic/TestBasicIndexingFilter.java | 38 +-
.../nutch/indexer/jexl/TestJexlIndexingFilter.java | 46 +-
.../indexer/links/TestLinksIndexingFilter.java | 63 ++-
.../nutch/indexer/more/TestMoreIndexingFilter.java | 41 +-
.../nutch/indexer/replace/TestIndexReplace.java | 67 ++-
.../staticfield/TestStaticFieldIndexerTest.java | 97 ++--
.../nutch/indexwriter/csv/TestCSVIndexWriter.java | 51 +-
src/plugin/language-identifier/ivy.xml | 2 +-
src/plugin/language-identifier/plugin.xml | 13 +-
.../analysis/lang/TestHTMLLanguageParser.java | 16 +-
.../apache/nutch/protocol/http/api/HttpBase.java | 14 +-
.../nutch/protocol/http/api/TestHttpBase.java | 73 +++
.../protocol/http/api/TestRobotRulesParser.java | 37 +-
.../urlfilter/api/RegexURLFilterBaseTest.java | 19 +-
.../indexer/filter/MimeTypeIndexingFilterTest.java | 26 +-
.../org/apache/nutch/parse/ext/TestExtParser.java | 18 +-
.../nutch/parse/html/TestDOMContentUtils.java | 51 +-
.../apache/nutch/parse/html/TestHtmlParser.java | 23 +-
.../nutch/parse/html/TestRobotsMetaProcessor.java | 30 +-
src/plugin/parse-js/plugin.xml | 4 +-
.../apache/nutch/parse/js/TestJSParseFilter.java | 29 +-
.../nutch/parse/metatags/TestMetatagParser.java | 20 +-
src/plugin/parse-tika/howto_upgrade_tika.md | 37 +-
src/plugin/parse-tika/ivy.xml | 10 +-
src/plugin/parse-tika/plugin.xml | 92 +++-
.../apache/nutch/parse/tika/DOMContentUtils.java | 54 ++-
.../org/apache/nutch/parse/tika/TikaParser.java | 58 +--
.../nutch/parse/tika/TestBoilerpipeExtraction.java | 112 +++++
.../nutch/parse/tika/TestDOMContentUtils.java | 53 ++-
.../nutch/parse/tika/TestEmbeddedDocuments.java | 12 +-
.../nutch/parse/tika/TestEncodingDetection.java | 193 ++++++++
.../apache/nutch/parse/tika/TestFeedParser.java | 12 +-
.../apache/nutch/parse/tika/TestHtmlParser.java | 27 +-
.../apache/nutch/parse/tika/TestImageMetadata.java | 9 +-
.../parse/tika/TestLinkExtractionEdgeCases.java | 221 +++++++++
.../apache/nutch/parse/tika/TestMSWordParser.java | 13 +-
.../nutch/parse/tika/TestMetadataExtraction.java | 223 +++++++++
.../org/apache/nutch/parse/tika/TestOOParser.java | 7 +-
.../parse/tika/TestParserFailureHandling.java | 221 +++++++++
.../org/apache/nutch/parse/tika/TestPdfParser.java | 7 +-
.../org/apache/nutch/parse/tika/TestRTFParser.java | 12 +-
.../nutch/parse/tika/TestRobotsMetaProcessor.java | 40 +-
.../apache/nutch/parse/tika/TestXlsxParser.java | 9 +-
.../apache/nutch/parse/tika/TikaParserTest.java | 4 +-
.../org/apache/nutch/parse/zip/TestZipParser.java | 10 +-
.../parsefilter/regex/TestRegexParseFilter.java | 14 +-
.../java/org/apache/nutch/protocol/file/File.java | 10 +
.../nutch/protocol/file/TestProtocolFile.java | 32 +-
.../java/org/apache/nutch/protocol/ftp/Ftp.java | 9 +
.../protocol/http/TestBadServerResponses.java | 45 +-
.../nutch/protocol/http/TestProtocolHttp.java | 2 +-
.../protocol/http/TestProtocolHttpByProxy.java | 36 +-
.../apache/nutch/protocol/http/TestResponse.java | 18 +-
.../httpclient/TestProtocolHttpClient.java | 2 +-
src/plugin/protocol-okhttp/ivy.xml | 7 +-
src/plugin/protocol-okhttp/plugin.xml | 16 +-
.../org/apache/nutch/protocol/okhttp/OkHttp.java | 13 +-
.../protocol/okhttp/TestBadServerResponses.java | 110 +++--
.../protocol/okhttp/TestIPAddressFiltering.java | 20 +-
.../nutch/protocol/okhttp/TestProtocolOkHttp.java | 2 +-
.../apache/nutch/protocol/okhttp/TestResponse.java | 18 +-
.../metadata/TestMetadataScoringFilter.java | 17 +-
.../scoring/orphan/TestOrphanScoringFilter.java | 28 +-
.../apache/nutch/collection/TestSubcollection.java | 35 +-
.../automaton/TestAutomatonURLFilter.java | 7 +-
.../urlfilter/domain/TestDomainURLFilter.java | 46 +-
.../TestDomainDenylistURLFilter.java | 26 +-
.../nutch/urlfilter/fast/TestFastURLFilter.java | 13 +-
.../urlfilter/prefix/TestPrefixURLFilter.java | 22 +-
.../nutch/urlfilter/regex/TestRegexURLFilter.java | 7 +-
.../urlfilter/suffix/TestSuffixURLFilter.java | 24 +-
.../urlfilter/validator/TestUrlValidator.java | 68 ++-
.../urlnormalizer/ajax/TestAjaxURLNormalizer.java | 17 +-
.../basic/TestBasicURLNormalizer.java | 14 +-
.../urlnormalizer/host/TestHostURLNormalizer.java | 15 +-
.../urlnormalizer/pass/TestPassURLNormalizer.java | 10 +-
.../protocol/TestProtocolURLNormalizer.java | 9 +-
.../querystring/TestQuerystringURLNormalizer.java | 9 +-
.../regex/TestRegexURLNormalizer.java | 10 +-
.../slash/TestSlashURLNormalizer.java | 56 ++-
.../nutch/crawl/ContinuousCrawlTestUtil.java | 26 +-
.../nutch/crawl/CrawlDbUpdateTestDriver.java | 120 -----
.../nutch/crawl/TestAdaptiveFetchSchedule.java | 113 ++++-
.../nutch/crawl/TestCrawlDbDeduplication.java | 43 +-
.../org/apache/nutch/crawl/TestCrawlDbFilter.java | 25 +-
.../org/apache/nutch/crawl/TestCrawlDbMerger.java | 34 +-
.../org/apache/nutch/crawl/TestCrawlDbStates.java | 59 ++-
...bStates.java => TestCrawlDbStatesExtended.java} | 21 +-
src/test/org/apache/nutch/crawl/TestGenerator.java | 68 +--
src/test/org/apache/nutch/crawl/TestInjector.java | 44 +-
.../org/apache/nutch/crawl/TestLinkDbMerger.java | 32 +-
.../apache/nutch/crawl/TestSignatureFactory.java | 12 +-
.../nutch/crawl/TestTextProfileSignature.java | 20 +-
src/test/org/apache/nutch/fetcher/TestFetcher.java | 38 +-
.../apache/nutch/indexer/TestIndexerMapReduce.java | 58 +--
.../apache/nutch/indexer/TestIndexingFilters.java | 10 +-
.../org/apache/nutch/metadata/TestMetadata.java | 153 +++---
.../nutch/metadata/TestSpellCheckedMetadata.java | 243 ++++++----
.../org/apache/nutch/metrics/TestErrorTracker.java | 514 +++++++++++++++++++++
src/test/org/apache/nutch/net/TestURLFilters.java | 2 +-
.../org/apache/nutch/net/TestURLNormalizers.java | 24 +-
.../nutch/net/protocols/TestHttpDateFormat.java | 30 +-
.../apache/nutch/parse/TestOutlinkExtractor.java | 51 +-
src/test/org/apache/nutch/parse/TestOutlinks.java | 10 +-
src/test/org/apache/nutch/parse/TestParseData.java | 7 +-
.../org/apache/nutch/parse/TestParseSegment.java | 7 +-
src/test/org/apache/nutch/parse/TestParseText.java | 2 +-
.../org/apache/nutch/parse/TestParserFactory.java | 49 +-
.../org/apache/nutch/plugin/TestPluginSystem.java | 51 +-
.../protocol/AbstractHttpProtocolPluginTest.java | 14 +-
.../org/apache/nutch/protocol/TestContent.java | 46 +-
.../apache/nutch/protocol/TestProtocolFactory.java | 31 +-
.../apache/nutch/segment/TestSegmentMerger.java | 32 +-
.../segment/TestSegmentMergerCrawlDatums.java | 36 +-
.../org/apache/nutch/service/TestNutchServer.java | 2 +-
.../nutch/tools/TestCommonCrawlDataDumper.java | 10 +-
.../org/apache/nutch/util/DumpFileUtilTest.java | 19 +-
.../apache/nutch/util/ReducerContextWrapper.java | 407 ++++++++++++++++
.../apache/nutch/util/TestEncodingDetector.java | 13 +-
src/test/org/apache/nutch/util/TestGZIPUtils.java | 43 +-
src/test/org/apache/nutch/util/TestMimeUtil.java | 17 +-
src/test/org/apache/nutch/util/TestNodeWalker.java | 18 +-
.../apache/nutch/util/TestPrefixStringMatcher.java | 26 +-
src/test/org/apache/nutch/util/TestStringUtil.java | 39 +-
.../apache/nutch/util/TestSuffixStringMatcher.java | 25 +-
src/test/org/apache/nutch/util/TestTableUtil.java | 5 +-
src/test/org/apache/nutch/util/TestURLUtil.java | 177 +++----
.../org/apache/nutch/util/WritableTestUtils.java | 5 +-
180 files changed, 6439 insertions(+), 2296 deletions(-)
create mode 100644 .github/workflows/junit-report.yml
create mode 100644 licenses-binary/LICENSE-bsd-licence.txt
delete mode 100644 licenses-binary/LICENSE-gnu-general-public-license-version-2-gpl2-with-the-classpath-exception.txt
delete mode 100644 licenses-binary/LICENSE-indiana-university-extreme-lab-software-license-vesion-1.1.1.txt
create mode 100644 src/java/org/apache/nutch/metrics/ErrorTracker.java
create mode 100644 src/java/org/apache/nutch/metrics/LatencyTracker.java
create mode 100644 src/java/org/apache/nutch/metrics/NutchMetrics.java
copy src/java/org/apache/nutch/{metadata/Feed.java => metrics/package-info.java} (64%)
mode change 100644 => 100755 src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
create mode 100755 src/plugin/lib-http/src/test/org/apache/nutch/protocol/http/api/TestHttpBase.java
create mode 100644 src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestBoilerpipeExtraction.java
create mode 100644 src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestEncodingDetection.java
create mode 100644 src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestLinkExtractionEdgeCases.java
create mode 100644 src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestMetadataExtraction.java
create mode 100644 src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestParserFailureHandling.java
delete mode 100644 src/test/org/apache/nutch/crawl/CrawlDbUpdateTestDriver.java
rename src/test/org/apache/nutch/crawl/{TODOTestCrawlDbStates.java => TestCrawlDbStatesExtended.java} (97%)
create mode 100644 src/test/org/apache/nutch/metrics/TestErrorTracker.java
create mode 100644 src/test/org/apache/nutch/util/ReducerContextWrapper.java