Messages by Thread
-
(nutch) branch master updated: NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues
snagel
-
(nutch) branch master updated: NUTCH-3029
markus
-
(nutch) branch master updated: NUTCH-3033 Upgrade Ivy to v2.5.2 (#803)
lewismc
-
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
-
(nutch) branch master updated: NUTCH-3030 Use system default cipher suites instead of hard-coded set
markus
-
(nutch) branch master updated: Update Dockerfile / JAVA_HOME - 2nd try (#805)
lewismc
-
(nutch) branch master updated: NUTCH-3031 ProtocolFactory host mapper to support domains
markus
-
(nutch) branch revert-801-patch-2 deleted (was 54394b9ed)
lewismc
-
(nutch) branch branch-1.19 updated: Revert "Update Dockerfile / JAVA_HOME (#801)" (#804)
lewismc
-
(nutch) branch revert-801-patch-2 created (now 54394b9ed)
lewismc
-
(nutch) branch branch-1.19 updated: Update Dockerfile / JAVA_HOME (#801)
lewismc
-
(nutch) branch master updated: Update crawl documentation
snagel
-
(nutch) branch master updated: NUTCH-3027 Trivial resource leak patch in DomainSuffixes.java
markus
-
(nutch) branch master updated: NUTCH-3024 Remove flaky 'dependency check' target (#795)
lewismc
-
(nutch) branch NUTCH-3026 created (now 3a294709d)
tallison
-
(nutch) branch master updated (adadc43fb -> 7ad382d95)
snagel
-
(nutch) branch master updated (90849124d -> adadc43fb)
snagel
-
(nutch) branch master updated: NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag (#794)
tallison
-
(nutch) branch master updated: NUTCH-3019 -- update Tika (#797)
tallison
-
(nutch) branch master updated: NUTCH-3014 Standardize Job names (#789)
lewismc
-
(nutch) branch master updated: NUTCH-3015 Add more CI steps to GitHub master-build.yml (#790)
lewismc
-
[nutch] branch master updated: NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic (#788)
lewismc
-
[nutch] branch master updated: NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents - fall back to UTF-8 when stringifying the content of unparsed documents
snagel
-
[nutch] branch master updated: NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx)
snagel
-
[nutch] branch master updated: NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 (#779)
snagel
-
[nutch] branch master updated: NUTCH-3009 Upgrade to Hadoop 3.3.6
snagel
-
[nutch] branch master updated: NUTCH-3002 Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive - implement class CaseInsensitiveMetadata providing case-insensitive metadata look-ups (but no spell-checking) - use CaseInsensitiveMetadata to hold HTTP header metadata in the class OkHttpResponse of protocol-okhttp - add unit tests to prove the fix (and also case-insensitive look-ups and spell-checking in protocol-http)
snagel
-
[nutch] branch master updated (a74b57b90 -> 97eb0b5ac)
tallison
-
[nutch] branch master updated (a1ab4333e -> a74b57b90)
snagel
-
[nutch] branch master updated: NUTCH-2897 Do not suppress deprecated API warnings - deprecate constructor of NutchJob - remove deprecated call to Object.finalize() from Plugin.finalize()
snagel
-
[nutch] branch master updated: NUTCH-3010 Injector: count unique number of injected URLs - add counter urls_injected_unique - improve log messages reporting the counts of injected/merged URLs
snagel
-
[nutch] branch master updated (417b87732 -> a72a53a32)
snagel
-
[nutch] branch master updated: NUTCH-2852 SpotBugs: Method invokes System.exit(...) - remove all calls of System.exit(...) in methods except main(args) of various "checker" tools
snagel
-
[nutch] branch master updated: NUTCH-3004 -- propagate ssl exception if message doesn't match "handshake alert..."
tallison
-
[nutch] branch master updated (0ad935fdc -> d81be5181)
tallison
-
[nutch] branch master updated: Remove Any23 from Nutch
tallison
-
[nutch] branch master updated: NUTCH-3000 - the selenium protocol should return the full html, not just the inner body element.
tallison
-
[nutch] branch master updated: NUTCH-3001 - fix logic for grabbing bytes if there's no content type in the header
tallison
-
[nutch] branch master updated: NUTCH-2999 -- upgrade lucene to latest 8.x throughout
tallison
-
[nutch] branch NUTCH-2999 deleted (was 3bb8b0eeb)
tallison
-
[nutch] branch master updated (f5cd0d633 -> e93aa977e)
tallison
-
[nutch] branch NUTCH-2999 created (now 3bb8b0eeb)
tallison
-
[nutch] branch master updated: NUTCH-2989 -- ElasticIndexWriter should enable auth for https, too
tallison
-
[nutch] branch master updated: NUTCH-2997 Add Override annotations
snagel
-
[nutch] branch master updated: NUTCH-2996 Use new SimpleRobotRulesParser API entry point crawler-commons 1.4
snagel
-
[nutch] branch master updated: NUTCH-2995 Upgrade to crawler-commons 1.4
snagel
-
[nutch] branch master updated: NUTCH-2993 ScoringDepth plugin to skip depth check based on URL Pattern - apply patch contributed by Markus Jelsma
snagel
-
[nutch-site] branch asf-staging updated: Add logo on URL path where requested README.md in source code repository
snagel
-
[nutch-site] branch main updated: Add logo on URL path where requested README.md in source code repository
snagel
-
[nutch-site] branch asf-site updated: Add logo on URL path where requested README.md in source code repository
snagel
-
[nutch-site] branch asf-site updated: Add link to ASF privacy policies
snagel
-
[nutch-site] branch main updated: Add link to ASF privacy policies
snagel
-
[nutch-site] branch asf-staging updated: Add link to ASF privacy policies
snagel
-
[nutch-site] branch main updated (aa45c17 -> db7208f)
snagel
-
[nutch-site] branch asf-site updated: - add new committer / PMC - update copyright year 2022 -> 2023 - add link / banner of Apache conferences or events - rename and move link to ASF
snagel
-
[nutch-site] branch asf-staging updated: - add new committer / PMC - update copyright year 2022 -> 2023 - add link / banner of Apache conferences or events - rename and move link to ASF
snagel
-
[nutch-webapp] branch dependabot/maven/com.h2database-h2-2.2.220 created (now 0b5fed6)
github-bot
-
[nutch-webapp] branch dependabot/maven/com.google.guava-guava-32.0.0-jre created (now b38a4ff)
github-bot
-
[nutch] branch master updated: NUTCH-2991 Support HTTP/S Header Authorization for Solr connections (#763)
snagel
-
[nutch] branch master updated: NUTCH-2992 Fetcher: always block fetch queues when exceptions threshold is reached - if QueueFeeder is still alive, also block queues which are empty right now
snagel
-
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.2.24.RELEASE created (now 70deb3a)
github-bot
-
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.2.23.RELEASE created (now 9e33145)
github-bot
-
[nutch] branch master updated: NUTCH-2596 Upgrade from org.mortbay.jetty to org.eclipse.jetty - upgrade from org.mortbay.jetty 6.1.26 to org.eclipse.jetty 9.4.50 (Hadoop depends on 9.4.43) - remove obsolete dependency exclusions of hadoop-common - upgrade Fetcher unit tests to use org.eclipse.jetty
snagel
-
[nutch] branch master updated: NUTCH-2984 Drop test proxy server and benchmark tool
snagel
-
[nutch] branch master updated: NUTCH-2985 Disable plugin urlfilter-validator by default
snagel
-
[nutch] branch master updated: NUTCH-2983 nutch-default.xml improvements - remove property "hadoop.job.history.user.location", obsolete since Hadoop 0.21.0 - normalize spelling (case) of URL and CrawlDb - trim trailing space - fix typos - improve description of properties {db,linkdb}.ignore.{ex,in}ternal.links
snagel
-
[nutch] branch master updated: NUTCH-2972 Javadoc build fails using JDK 17 - fix Javadoc issues when building with JDK 17
snagel
-
[nutch] branch master updated: NUTCH-2982 Generator: parameter for URL normalization not passed forward - pass forward params `norm` and `maxNumSegments` - fix typos in Javadoc
snagel
-
[nutch] branch master updated (383aeca5d -> e8fd21090)
snagel
-
[nutch] branch master updated: NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit
snagel
-
[nutch] branch master updated: NUTCH-2974 Ant build fails with "Unparseable date" on certain platforms
snagel
-
[nutch] branch master updated: NUTCH-2634 Some links marked as "nofollow" are followed anyway - fix detection of nofollow in multi-valued rel attributes
snagel
-
[nutch] branch master updated: NUTCH-2924 Generate maxCount expr evaluated only once
markus
-
[nutch-webapp] branch dependabot/maven/org.springframework-spring-web-6.0.0 created (now dc3ba0a)
github-bot
-
[nutch] branch master updated: NUTCH-2977
markus
-
[nutch-webapp] branch dependabot/maven/org.springframework-spring-web-4.2.7.RELEASE created (now faede31)
github-bot
-
[nutch] branch master updated (85f7bcb63 -> ed7b6615b)
snagel
-
svn commit: r56776 - /release/nutch/1.18/
snagel
-
[nutch] branch master updated (ffe059892 -> 85f7bcb63)
snagel
-
[nutch-site] branch main updated (4efc5a9 -> aa45c17)
snagel
-
[nutch-site] branch asf-site updated: Announce release of Nutch 1.19 - fix release data in announcement
snagel
-
[nutch-site] branch asf-staging updated: Announce release of Nutch 1.19 - fix release data in announcement
snagel
-
[nutch-site] branch asf-site updated (a41c7ef -> 314b1b2)
snagel
-
[nutch-site] branch asf-staging updated (3e9e725 -> 2cfe00d)
snagel
-
[nutch-site] branch asf-staging updated: Announce release of Nutch 1.19
snagel
-
svn commit: r56738 [1/3] - /release/nutch/1.19/CHANGES.txt
snagel
-
svn commit: r56738 [3/3] - /release/nutch/1.19/CHANGES.txt
snagel
-
svn commit: r56738 [2/3] - /release/nutch/1.19/CHANGES.txt
snagel
-
[nutch-site] branch main updated: NUTCH-1999 Add /robots.txt to Nutch site (#1)
snagel
-
[nutch-site] branch asf-staging updated: - add README for branch asf-staging - modify .asf.yaml to contain only instructions required in branch asf-staging
snagel
-
[nutch-site] branch NUTCH-1999-nutch-site-robots-txt updated (142489f -> f863c1f)
snagel
-
[nutch-site] branch asf-staging updated: Sync .asf.yaml file with main branch
snagel
-
[nutch-site] branch asf-staging created (now d77dbb5)
snagel
-
svn commit: r56686 - /dev/nutch/1.19/ /release/nutch/1.19/
snagel
-
svn commit: r56398 - /dev/nutch/1.19/
snagel
-
[nutch] branch branch-1.19 created (now 63d4f11c0)
snagel
-
[nutch] annotated tag release-1.19 updated (63d4f11c0 -> 5d7660ceb)
snagel
-
[nutch] branch master updated: NUTCH-2969 Javadoc: Javascript search is not working when built on JDK 11 - pass --no-module-directories to javadoc target when building on JDK 11 - remove obsolete condition to fail javadoc builds on JDK 7u25 and earlier
snagel
-
[nutch] branch master updated (bca5fc0d0 -> 635ef2f3b)
snagel
-
[nutch] branch master updated (bec577d50 -> bca5fc0d0)
snagel
-
[nutch] branch master updated: NUTCH-2863 Injector to parse command-line flags case-insensitive
snagel
-
[nutch] branch master updated: NUTCH-2962 Update and complete package info of protocol plugins
snagel
-
[nutch] branch master updated: NUTCH-2930 Protocol-okhttp: implement IP filter (#736)
snagel
-
[nutch] branch master updated (c0f723e99 -> 05afebd03)
snagel
-
[nutch] branch master updated (edebfe49f -> c0f723e99)
snagel
-
[nutch] branch master updated (a5a630055 -> edebfe49f)
snagel
-
[nutch] branch master updated (82f9530dc -> a5a630055)
snagel
-
[nutch] branch master updated (b7b834501 -> 82f9530dc)
snagel
-
[nutch] branch master updated (8fc4f17ac -> b7b834501)
snagel
-
[nutch] branch master updated: NUTCH-2956 index-geoip: dependency upgrades and improvements - upgrade to geoip2 3.0.1 - exclude transitive dependencies (Jackson) provided as Nutch core deps - read also GeoLite2-*.mmdb files - review index field names in plugin and Nutch Solr schema: - fix typos in field names - remove unused fields from schema
snagel
-
[nutch] branch master updated: NUTCH-2953 Indexer Elastic to ignore SSL issues - apply patch contributed by Markus Jelsma - fix class imports
snagel
-
[nutch] branch master updated: NUTCH-2952 Upgrade core dependencies - Hadoop 3.1.3 -> 3.3.3 - log4j 2.17.0 -> 2.17.2 - and some more
snagel
-
[nutch] branch master updated (5b970ff22 -> 487110b07)
snagel
-
[nutch] branch master updated: NUTCH-2951 Crawl datum with metadata WRITABLE_GENERATE_TIME_KEY awaits fetching forever - bug fix: add missing braces (bug introduced with NUTCH-2737, solution contributed by Lapadula Alessandro)
snagel
-
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.2.22.RELEASE created (now e19e71e)
github-bot
-
[nutch] branch master updated (02dca3b6d -> 47d3fe607)
snagel
-
[nutch] branch master updated: NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode (#726)
lewismc
-
[nutch] branch master updated (568993b90 -> bdbe7b330)
snagel
-
[nutch] branch master updated: NUTCH-2948 Upgrade dependencies to Any23 2.7 and Tika 2.3.0
snagel
-
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.3.19 created (now b974d7a)
github-bot
-
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.3.18 created (now 6524080)
github-bot
-
[nutch] branch master updated: NUTCH-2923: Added JobId in Job Failure logs (#721)
snagel
-
[nutch] branch master updated: NUTCH-2573 Suspend crawling if robots.txt fails to fetch with 5xx status (#724)
snagel
-
[nutch] branch master updated: NUTCH-2935 DeduplicationJob: failure on URLs with invalid percent encoding - catch IllegalArgumentException when unescaping percent-encoding in URLs - if one URL of two compared URLs is valid, keep it as non-duplicate - add unit tests for DeduplicationJob
snagel
-
[nutch] branch master updated: NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 (#717)
lewismc
-
[nutch] branch master updated: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted - sleep for a configurable delay (fetcher.threads.start.delay) before starting the next Fetcher thread to avoid that resources (DNS, Tika XML parser pools) are temporarily exhausted when Fetcher threads fetch the first pages simultaneously
snagel
-
[nutch-site] branch NUTCH-1999-nutch-site-robots-txt created (now 142489f)
snagel
-
[nutch] branch master updated (e76d69f -> 78e827a)
snagel
-
[nutch] branch master updated: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers (#720)
lewismc
-
svn commit: r1077907 - /websites/staging/nutch/
gmcdonald
-
svn commit: r1077906 - /websites/production/nutch/
gmcdonald
-
[nutch] branch master updated: Upgrade to log4j 2.17.0 (#719)
snagel
-
[nutch] branch master updated: NUTCH-2917 Remove transitive dependency to log4j 1.x (#718)
snagel
-
[nutch] branch master updated: NUTCH-2449 Replace Tika LanguageIdentifier in language-identifier (#716)
lewismc
-
[nutch] branch master updated: NUTCH-2914 nutch-default.xml: remove obsolete and unused properties (#709)
snagel
-
[nutch] branch master updated: NUTCH-2807 SitemapProcessor to warn that ignoring robots.txt affects detection of sitemaps (#710)
snagel
-
[nutch] branch master updated (4caa5ce -> af29192)
snagel
-
[nutch] branch master updated: NUTCH-2918 Upgrade to log4j 2.16.0 (#715)
snagel
-
[nutch] branch master updated: NUTCH-2916 Fix log file rotation / rename default log file (#714)
snagel
-
[nutch] branch master updated: NUTCH-2915 Upgrade to log4j 2.15.0
snagel
-
[nutch] branch master updated (a62168c -> 9a2f94f)
snagel
-
[nutch] branch master updated (dd27044 -> a62168c)
snagel
-
[nutch] branch master updated: NUTCH-2905 Mask sensitive strings in log output of index writers - add utility methods (StringUtil) to mask password strings or passwords in strings - mask passwords in log output of index writers (Elasticsearch, Solr, RabbitMQ) - mask password in trace log of protocol-httpclient when using basic authentication
snagel
-
[nutch] branch master updated: NUTCH-2908 Log mapreduce job messages and counters in local mode (Log4j2)
snagel
-
[nutch] branch master updated (ff800c5 -> 671f904)
snagel
-
[nutch-site] branch main updated: Add doap.rdf (lost during CMS migration)
snagel
-
[nutch-site] branch asf-site updated: Add doap.rdf (lost during CMS migration)
snagel
-
[nutch-site] branch asf-site updated: Add favicon
lewismc
-
[nutch-site] branch main updated (b720870 -> 198d962)
lewismc
-
[nutch-site] branch main updated (819de2a -> b720870)
lewismc
-
[nutch-site] branch asf-site created (now b720870)
lewismc
-
[nutch-site] branch main updated: Remove broken site
lewismc
-
[nutch] branch master updated (75daf3e -> ff800c5)
snagel
-
[nutch] branch master updated (64fb604 -> 75daf3e)
snagel
-
[nutch] branch master updated (25ccf89 -> 64fb604)
snagel
-
[nutch] branch master updated: Upgrade to crawler-commons 1.2
snagel
-
[nutch] branch master updated: NUTCH-2902 Jexl parsing error on statements (contributed by Max Ockner) - use JexlScript instead of JexlExpression in Generator, CrawlDb/HostDb reader, Jexl exchange and indexing filter
snagel
-
[nutch] branch master updated: NUTCH-2899 Remove needless warning about missing o/a/rat/anttasks/antlib.xml - avoid needless warning by moving taskdef into task element
snagel
-
[nutch] branch master updated: NUTCH-2862 Do not include Ivy jar in source release package
snagel
-
[nutch] branch master updated: quick IntelliJ IDEA setup docs added (#698)
lewismc
-
[nutch] branch master updated (eeb9863 -> c48b8d1)
snagel
-
[nutch] branch master updated: NUTCH-2894 Java plugin compilation classpath: prioritize plugin dependencies
snagel
-
[nutch] branch master updated: fireant upgrade dependency elasticsearch-rest-high-level-client in src/plugin/indexer-elastic/ivy.xml from 7.11.1 to 7.13.2 (#688)
lewismc
-
[nutch-site] branch main updated: Attempt to implement single page templating.
lewismc
-
[nutch-site] branch main created (now ae6f9f2)
lewismc
-
[nutch] branch master updated: NUTCH-2885 Upgrade to Log4j2 (#692)
lewismc
-
[nutch-webapp] branch master updated: Add missing files
lewismc
-
[nutch-webapp] branch master created (now da3c282)
lewismc