commits
Messages by Thread
(nutch) branch master updated: [NUTCH-3160] Remove System.exit(..) from reusable code (#903)
lewismc
(nutch) branch master updated: NUTCH-3085 Augment CI by adding code coverage and code quality reporting (#902)
lewismc
(nutch) branch master updated: NUTCH-3085 Augment CI by adding code coverage and code quality reporting (#901)
lewismc
(nutch) branch dependabot/github_actions/dot-github/workflows/SonarSource/sonarqube-scan-action-6 deleted (was 98e6cb8e0)
lewismc
(nutch) branch master updated: Bump SonarSource/sonarqube-scan-action from 5 to 6 in /.github/workflows (#899)
lewismc
(nutch) branch dependabot/github_actions/dot-github/workflows/SonarSource/sonarqube-scan-action-6 created (now 98e6cb8e0)
github-bot
(nutch) branch master updated: NUTCH-3085 Augment CI by adding code coverage and code quality reporting (#898)
lewismc
(nutch) branch master updated: NUTCH-3085 Augment CI by adding code coverage and code quality reporting (#897)
lewismc
(nutch) branch master updated: NUTCH-3154 Implement integration testing framework for Nutch IndexWriter plugins using Testcontainers (#895)
lewismc
(nutch) branch master updated: NUTCH-3145 Upgrade to JUnit 6 (#883)
lewismc
svn commit: r82604 - release/nutch/1.21
snagel
(nutch) branch master updated (674914c8e -> 433792770)
snagel
(nutch) 01/01: Merge pull request #825 from lewismc/NUTCH-3064
snagel
(nutch) branch master updated: Prepare for new development after release of 1.22 - bump version number -> 1.23-SNAPSHOT - update changelog - update year
snagel
(nutch-site) branch main updated (c0ef4acc -> 47c25f8b)
snagel
(nutch-site) 01/07: Update site for release of Nutch 1.22
snagel
(nutch-site) 05/07: Convert legacy news
snagel
(nutch-site) 06/07: Increase pagination size for news
snagel
(nutch-site) 07/07: Fix CSS style of list bullets
snagel
(nutch-site) 04/07: Add news about migrating to Java 17
snagel
(nutch-site) 03/07: Update year in page footer
snagel
svn commit: r82572 - dev/nutch/1.22 release/nutch/1.22
snagel
svn commit: r82454 - dev/nutch/1.22
snagel
(nutch) annotated tag release-1.22 updated (cc4d2150e -> 6a4ec040a)
snagel
(nutch) branch branch-1.22 created (now a4c9cc472)
snagel
(nutch) 01/01: Nutch 1.21 release - update current year in API docs - update version number - update changes / release notes
snagel
(nutch) annotated tag release-1.22 updated (bf8820348 -> cc4d2150e)
snagel
(nutch) branch master updated: NUTCH-3153 Update of license and notice files
snagel
(nutch) branch master updated: NUTCH-3152 Job counters getGroup to use metrics constants
snagel
(nutch) branch master updated: NUTCH-3150 Expand Caching Hadoop Counter References (#892)
lewismc
(nutch) branch master updated: NUTCH-3142 Add Error Context to Metrics (#882)
lewismc
(nutch) branch master updated (f8577a0d7 -> 3101a9e6f)
snagel
(nutch) 01/01: Merge pull request #887 from lewismc/NUTCH-3110
snagel
(nutch) branch master updated: NUTCH-3143 GitHub workflow does not run all unit tests (#890)
lewismc
(nutch) branch master updated: NUTCH-3143 GitHub workflow does not run all unit tests (#889)
lewismc
(nutch) branch master updated: NUTCH-3148 Cache Ivy dependencies in GitHub CI builds (#886)
lewismc
(nutch) branch master updated (ddabe9694 -> 7f724a9c5)
snagel
(nutch) 01/01: Merge pull request #880 from igiguere/NUTCH-1564-AdaptiveFetchSchedule-refetch
snagel
(nutch) branch master updated: NUTCH-3144 URLUtil unit tests fail after upgrade to crawler-commons 1.6
snagel
(nutch) branch master updated: NUTCH-3143 GitHub workflow does not run all unit tests (#885)
lewismc
(nutch) branch master updated: NUTCH-3143 GitHub workflow does not run all unit tests (#884)
lewismc
(nutch) branch master updated: NUTCH-3141 Cache Hadoop Counter References in Hot Paths (#878)
lewismc
(nutch) branch master updated: NUTCH-3139 protocol-okhttp: add support for zstd content-encoding - upgrade to OkHttp 5.3.2 - enable support for zstd content-encoding
snagel
(nutch) branch master updated: NUTCH-3137 Upgrade Nutch core dependencies (#875)
snagel
(nutch) branch master updated (8307b6b81 -> c7cf56964)
snagel
(nutch) 01/02: NUTCH-3136 Upgrade crawler-commons dependency
snagel
(nutch) 02/02: NUTCH-3136 Upgrade crawler-commons dependency
snagel
(nutch) branch master updated: NUTCH-3135 Cache downloaded ant-eclipse.jar
snagel
(nutch) branch master updated: NUTCH-3133 Upgrade GitHub workflows to JDK 17
snagel
(nutch) branch master updated: NUTCH-3134 Add latency metrics with percentile support to Fetcher, Parser, and Indexer (#876)
lewismc
(nutch) branch master updated: NUTCH-3132 Standardize existing Nutch metrics naming and implementation (#871)
lewismc
(nutch) branch master updated: NUTCH-3126 Report JUnit test results in GitHub pull request thread (#868)
snagel
(nutch) branch master updated (1156801bc -> f65371d1a)
snagel
(nutch) 01/01: Merge pull request #870 from igiguere/NUTCH-2971
snagel
(nutch) branch master updated: NUTCH-3040 Upgrade to Hadoop 3.4.2 (#866)
lewismc
(nutch) branch master updated: NUTCH-3099 Allow wildcard '*' in http.proxy.exception.list (via Isabelle Giguere) (#865)
lewismc
(nutch) branch master updated: NUTCH-3126 Report JUnit test results in GitHub pull request thread (#867)
lewismc
(nutch) branch master updated: NUTCH-3126 Report JUnit test results in GitHub pull request thread (#863)
lewismc
(nutch) branch master updated (e2b60fc00 -> 667e21764)
snagel
(nutch) 01/01: Merge pull request #864 from sebastian-nagel/NUTCH-2887-junit4-mrunit
snagel
(nutch) branch master updated: NUTCH-2887 Migrate to JUnit 5 Jupiter (#861)
lewismc
(nutch) branch master updated (7e43e12b2 -> 3991c5b98)
snagel
(nutch) 01/01: Merge pull request #859 from TamimEhsan/NUTCH-3122
snagel
(nutch) branch master updated: NUTCH-3124 Github workflow not run because of uncertified action "paths-changes-filter"
snagel
(nutch) branch master updated: NUTCH-3118 Logging pattern missing one argument placeholder
snagel
(nutch) branch master updated: NUTCH-3119 Log4j package scanning is deprecated
snagel
svn commit: r78300 - release/nutch/1.20
snagel
(nutch-site) branch asf-site updated: Update release date of Nutch 1.21
snagel
(nutch-site) branch asf-staging updated: Update release date of Nutch 1.21
snagel
(nutch-site) branch main updated: Update release date of Nutch 1.21
snagel
(nutch) branch master updated: Prepare for new development after release of 1.21 - bump version number -> 1.22-SNAPSHOT - update changelog - update year
snagel
(nutch-site) branch main updated (c011a7e -> c7347eb)
snagel
(nutch-site) 01/04: Link to RFC 9309 which Nutch (relying on crawler-commons) is following as robots.txt standard since Nutch 1.19
snagel
(nutch-site) 03/04: Add dates to the news items listed on the news overview page
snagel
(nutch-site) 02/04: Improve formatting of the sitemap.xml
snagel
svn commit: r78271 - dev/nutch/1.21 release/nutch/1.21
snagel
svn commit: r78207 - /dev/nutch/1.21/
snagel
(nutch) annotated tag release-1.21 updated (bf8820348 -> 85963d6ff)
snagel
(nutch) branch branch-1.21 updated (65eb8857d -> bf8820348)
snagel
(nutch) 02/02: NUTCH-3118 Logging pattern missing one argument placeholder
snagel
(nutch) 01/02: NUTCH-3118 Logging pattern missing one argument placeholder
snagel
(nutch) branch master updated: NUTCH-3118 Logging pattern missing one argument placeholder (#857)
snagel
svn commit: r78195 [1/3] - /dev/nutch/1.21/
snagel
svn commit: r78195 [3/3] - /dev/nutch/1.21/
snagel
svn commit: r78195 [2/3] - /dev/nutch/1.21/
snagel
(nutch) branch branch-1.21 created (now 65eb8857d)
snagel
(nutch) 01/01: Nutch 1.21 release - update current year in API docs etc. - update version number - update changes / release notes
snagel
(nutch) branch master updated (e62a0b8e3 -> 671b1e0ea)
snagel
(nutch) 01/01: Merge pull request #851 from sebastian-nagel/NUTCH-3112-parameterized-logging
snagel
(nutch) branch master updated (e85001205 -> e62a0b8e3)
snagel
(nutch) 01/01: Merge pull request #855 from sebastian-nagel/NUTCH-3116-dependency-upgrades
snagel
(nutch) branch master updated (312828602 -> e85001205)
snagel
(nutch) 01/01: Merge pull request #856 from CatChullain/NUTCH-3115
snagel
(nutch) branch master updated: NUTCH-2976 SitemapProcessor: verify sitemap values added from sitemap to CrawlDB (priority, modification time and change frequency) - use default priority if priority <= 0.0 (a CrawlDatum with score 0.0 is not eligible for fetch) - ensure that the fetch interval (from change frequency) is within db.fetch.schedule.adaptive.min_interval and db.fetch.schedule.adaptive.max_interval - ignore last-modified times in the future
snagel
(nutch) branch master updated: NUTCH-3113 Group commands in bin/nutch command-line help thematically
snagel
(nutch) branch master updated: NUTCH-3087 BasicURLNormalizer to keep userinfo for protocols which might require it
snagel
(nutch) branch master updated: NUTCH-3114 Avoid stale fetching when only URLs from queues blocked by the exponential backoff remain
snagel
(nutch) branch master updated (5335e6b08 -> 71eca8831)
snagel
(nutch) 01/01: Merge pull request #847 from tatecn/NUTCH-3106
snagel
(nutch) branch master updated (b61d11fa5 -> 5335e6b08)
snagel
(nutch) 01/01: Merge pull request #848 from martin-djukanovic/NUTCH-3103
snagel
(nutch) branch master updated (b52ec9025 -> b61d11fa5)
snagel
(nutch) 01/01: Merge pull request #849 from maciejpuzianowski/NUTCH-3108
snagel
(nutch) branch master updated: NUTCH-3100 HostDB to support minimum records per host
markus
(nutch) branch master updated: NUTCH-3101 src/java/org/apache/nutch/crawl/Inlink.java
markus
(nutch) branch master updated (74b49e9a6 -> 3b6d2c6ba)
snagel
(nutch) 01/01: Merge pull request #832 from sebastian-nagel/NUTCH-3072
snagel
(nutch) branch master updated: NUTCH-3086 Consolidate plugin extension names and IDs (#835)
snagel
(nutch) branch master updated (86b893a71 -> 5068b7606)
snagel
(nutch) 01/01: Merge pull request #844 from maciejpuzianowski/NUTCH-3097
snagel
(nutch) branch master updated: NUTCH-3079 Dumping a segment fails unless it has been fetched and parsed
snagel
(nutch) branch master updated: NUTCH-3083 Add RobotRulesParser to bin/nutch
snagel
(nutch) branch master updated: NUTCH-3096 HostDB ResolverThread can create too many job counters (patch contributed by Markus Jelsma)
snagel
(nutch) branch master updated: NUTCH-3092 Replace all imports of commons-lang by commons-lang3
snagel
(nutch) branch master updated: NUTCH-3094 Github tests to run if build configuration changes
snagel
(nutch) branch master updated: NUTCH-3094 Github tests to run if build configuration changes
snagel
(nutch) branch master updated: NUTCH-3095 Update .gitignore to ignore Hadoop native libraries
snagel
(nutch) branch master updated: NUTCH-3093 Ant target test-plugins to depend on compile-core-test (#840)
lewismc
(nutch) branch master updated: NUTCH-2771 Tests in nightly builds: skip long runners
snagel
(nutch) branch master updated: NUTCH-3084 Improve CI by filtering and separating plugin and core test execution (#833)
lewismc
(nutch) branch master updated (a99bd8ea6 -> b02340dfe)
snagel
(nutch) 01/01: Merge pull request #827 from sebastian-nagel/NUTCH-3067
snagel
(nutch) branch master updated: Unlock database when Injector finishes - regardless of result
snagel
(nutch) branch master updated: NUTCH-3075 tld plugin makes injector crash NUTCH-1942 Remove TopLevelDomain
snagel
(nutch) branch master updated (d6f55b8ea -> 4a61208f4)
snagel
(nutch) 01/01: Merge pull request #828 from sebastian-nagel/NUTCH-3073
snagel
(nutch) branch master updated: NUTCH-2812 Methods returning array may expose internal representation
snagel
(nutch) branch master updated (8b11962a4 -> c137b4e0b)
snagel
(nutch) 01/01: Merge pull request #798 from GabeHaegele/NUTCH-2812
snagel
(nutch) branch master updated (582cdd417 -> 8b11962a4)
snagel
(nutch) 01/01: Merge pull request #816 from sebastian-nagel/NUTCH-1942-domain-utils-to-use-crawler-commons
snagel
(nutch) branch master updated: NUTCH-3058 Fetcher: counter for hung threads (#820)
snagel
(nutch) branch master updated: NUTCH-3061 URL filters to log name of the rules file
snagel
(nutch) branch master updated: NUTCH-3062 protocol-okhttp: optionally record HTTP and SSL/TLS versions (#822)
snagel
(nutch) branch master updated (309bc1863 -> bc8bd317f)
snagel
(nutch) 01/01: Merge pull request #823 from sebastian-nagel/NUTCH-3065-changelog-markdown
snagel
(nutch) branch master updated: NUTCH-3066 Protocol plugin unit tests fail randomly
snagel
(nutch) branch master updated (ac03cf164 -> e09d40cbd)
joegilvary
(nutch) 01/01: Merge pull request #819 from CatChullain/NUTCH-3057
joegilvary
(nutch) branch master updated: NUTCH-3063 Support for "addBinaryContent" from REST API
snagel
(nutch) branch master updated: NUTCH-3055 README: fix Github "hub" commands - replace "git" with "hub" were necessary - improve formatting of "contributing" steps
snagel
(nutch) branch master updated (8abc78a65 -> bfa07df29)
snagel
(nutch) 01/01: Merge pull request #815 from sebastian-nagel/NUTCH-3044-generator-npe
snagel
(nutch) branch master updated: NUTCH-3041 Address confusing logging in o.a.n.net.URLExemptionFilters (#813)
lewismc
(nutch) branch master updated: NUTCH-3043 Generator: count URLs rejected by URL filters (#814)
snagel
(nutch) branch master updated: NUTCH-3039 Failure to handle ftp:// URLs
snagel
(nutch-site) branch asf-site updated: Revert incorrect change in doap.rdf (see #2)
snagel
(nutch-site) branch asf-staging updated: Revert incorrect change in doap.rdf (see #2)
snagel
(nutch-site) branch main updated: Revert incorrect change (#2)
snagel
(nutch) branch master updated: NUTCH-3054 Address deprecation of Node16 for all GitHub Actions (#817)
lewismc
(nutch) branch master updated: Boostrap Nutch 1.21 development drive.
lewismc
(nutch) branch master updated: Add GitHub CI badge to README
lewismc
svn commit: r68753 - in /release/nutch: 1.19/ 1.20/apache-nutch-1.20-bin.tar.gz.sha512 1.20/apache-nutch-1.20-bin.zip.sha512 1.20/apache-nutch-1.20-src.tar.gz.sha512 1.20/apache-nutch-1.20-src.zip.sha512 2.4/
lewismc
svn commit: r68752 - /dev/nutch/1.20/ /release/nutch/1.20/
lewismc
svn commit: r68410 [1/3] - /dev/nutch/1.20/
lewismc
svn commit: r68410 [2/3] - /dev/nutch/1.20/
lewismc
svn commit: r68410 [3/3] - /dev/nutch/1.20/
lewismc
(nutch) annotated tag release-1.20 updated (a2cb6aa5d -> 6510cb241)
lewismc
(nutch) branch branch-1.20 updated: Prepare Nutch 1.20 release candidate
lewismc
(nutch) branch branch-1.20 created (now f141a398c)
lewismc
(nutch) 01/01: Prepare Nutch 1.20 release candidate
lewismc
(nutch) branch master updated: NUTCH-3038 Address issues discovered during 1.20 release management dryrun (#811)
lewismc
(nutch) branch branch-1.20 deleted (was 9cfe3d7f9)
lewismc
(nutch) branch branch-1.20 created (now 9cfe3d7f9)
lewismc
(nutch) 01/01: Prepare for Nutch 1.20 release
lewismc
(nutch) branch master updated: NUTCH-3032 Code for an ArbitraryIndexingFilter to index values resolved by user POJO code at index time (#810)
lewismc
(nutch) branch master updated (5a95bc653 -> 1563396d9)
lewismc
(nutch) branch master updated (3905a8df7 -> 5a95bc653)
lewismc
(nutch) branch master updated (367988dfd -> 3905a8df7)
lewismc
(nutch) branch master updated: NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues
snagel
(nutch) branch master updated: NUTCH-3029
markus
(nutch) branch master updated: NUTCH-3033 Upgrade Ivy to v2.5.2 (#803)
lewismc
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
(nutch) branch master updated: NUTCH-3030 Use system default cipher suites instead of hard-coded set
markus
(nutch) branch master updated: Update Dockerfile / JAVA_HOME - 2nd try (#805)
lewismc
(nutch) branch master updated: NUTCH-3031 ProtocolFactory host mapper to support domains
markus
(nutch) branch revert-801-patch-2 deleted (was 54394b9ed)
lewismc
(nutch) branch branch-1.19 updated: Revert "Update Dockerfile / JAVA_HOME (#801)" (#804)
lewismc
(nutch) branch revert-801-patch-2 created (now 54394b9ed)
lewismc
(nutch) 01/01: Revert "Update Dockerfile / JAVA_HOME (#801)"
lewismc
(nutch) branch branch-1.19 updated: Update Dockerfile / JAVA_HOME (#801)
lewismc
(nutch) branch master updated: Update crawl documentation
snagel
(nutch) branch master updated: NUTCH-3027 Trivial resource leak patch in DomainSuffixes.java
markus
(nutch) branch master updated: NUTCH-3024 Remove flaky 'dependency check' target (#795)
lewismc
(nutch) branch NUTCH-3026 created (now 3a294709d)
tallison
(nutch) 01/01: NUTCH-3026 -- first steps towards statusOnly option in IndexingJob
tallison
(nutch) branch master updated (adadc43fb -> 7ad382d95)
snagel
(nutch) branch master updated (90849124d -> adadc43fb)
snagel
(nutch) 01/02: [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input - use Hadoop-provided compression codecs - update description of property urlfilter.fast.file
snagel
(nutch) 02/02: Merge branch 'NUTCH-3017', closes #793
snagel
(nutch) branch master updated: NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag (#794)
tallison
(nutch) branch master updated: NUTCH-3019 -- update Tika (#797)
tallison
(nutch) branch master updated: NUTCH-3014 Standardize Job names (#789)
lewismc
(nutch) branch master updated: NUTCH-3015 Add more CI steps to GitHub master-build.yml (#790)
lewismc
[nutch] branch master updated: NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic (#788)
lewismc
[nutch] branch master updated: NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents - fall back to UTF-8 when stringifying the content of unparsed documents
snagel
[nutch] branch master updated: NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx)
snagel