Messages by Date
-
2024/04/28
(nutch) branch master updated: Bootstrap Nutch 1.21 development drive.
lewismc
-
2024/04/28
(nutch) branch master updated: Add GitHub CI badge to README
lewismc
-
2024/04/24
svn commit: r68753 - in /release/nutch: 1.19/ 1.20/apache-nutch-1.20-bin.tar.gz.sha512 1.20/apache-nutch-1.20-bin.zip.sha512 1.20/apache-nutch-1.20-src.tar.gz.sha512 1.20/apache-nutch-1.20-src.zip.sha512 2.4/
lewismc
-
2024/04/24
svn commit: r68752 - /dev/nutch/1.20/ /release/nutch/1.20/
lewismc
-
2024/04/09
svn commit: r68410 [1/3] - /dev/nutch/1.20/
lewismc
-
2024/04/09
svn commit: r68410 [2/3] - /dev/nutch/1.20/
lewismc
-
2024/04/09
svn commit: r68410 [3/3] - /dev/nutch/1.20/
lewismc
-
2024/04/09
(nutch) annotated tag release-1.20 updated (a2cb6aa5d -> 6510cb241)
lewismc
-
2024/04/09
(nutch) branch branch-1.20 updated: Prepare Nutch 1.20 release candidate
lewismc
-
2024/04/09
(nutch) branch branch-1.20 created (now f141a398c)
lewismc
-
2024/04/09
(nutch) 01/01: Prepare Nutch 1.20 release candidate
lewismc
-
2024/04/08
(nutch) branch master updated: NUTCH-3038 Address issues discovered during 1.20 release management dryrun (#811)
lewismc
-
2024/04/05
(nutch) branch branch-1.20 deleted (was 9cfe3d7f9)
lewismc
-
2024/04/05
(nutch) 01/01: Prepare for Nutch 1.20 release
lewismc
-
2024/04/05
(nutch) branch branch-1.20 created (now 9cfe3d7f9)
lewismc
-
2024/04/04
(nutch) branch master updated: NUTCH-3032 Code for an ArbitraryIndexingFilter to index values resolved by user POJO code at index time (#810)
lewismc
-
2024/03/30
(nutch) branch master updated (5a95bc653 -> 1563396d9)
lewismc
-
2024/03/30
(nutch) branch master updated (3905a8df7 -> 5a95bc653)
lewismc
-
2024/03/30
(nutch) branch master updated (367988dfd -> 3905a8df7)
lewismc
-
2024/03/14
(nutch) branch master updated: NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues
snagel
-
2024/03/14
(nutch) branch master updated: NUTCH-3029
markus
-
2024/03/13
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
-
2024/03/13
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
-
2024/03/13
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
-
2024/03/13
(nutch) branch master updated: NUTCH-3033 Upgrade Ivy to v2.5.2 (#803)
lewismc
-
2024/03/13
(nutch) branch master updated: NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler
markus
-
2024/03/13
(nutch) branch master updated: NUTCH-3030 Use system default cipher suites instead of hard-coded set
markus
-
2024/03/12
(nutch) branch master updated: Update Dockerfile / JAVA_HOME - 2nd try (#805)
lewismc
-
2024/03/12
(nutch) branch master updated: NUTCH-3031 ProtocolFactory host mapper to support domains
markus
-
2024/03/11
(nutch) branch revert-801-patch-2 deleted (was 54394b9ed)
lewismc
-
2024/03/11
(nutch) branch branch-1.19 updated: Revert "Update Dockerfile / JAVA_HOME (#801)" (#804)
lewismc
-
2024/03/11
(nutch) 01/01: Revert "Update Dockerfile / JAVA_HOME (#801)"
lewismc
-
2024/03/11
(nutch) branch revert-801-patch-2 created (now 54394b9ed)
lewismc
-
2024/03/11
(nutch) branch branch-1.19 updated: Update Dockerfile / JAVA_HOME (#801)
lewismc
-
2024/03/10
(nutch) branch master updated: Update crawl documentation
snagel
-
2024/01/19
(nutch) branch master updated: NUTCH-3027 Trivial resource leak patch in DomainSuffixes.java
markus
-
2023/11/24
(nutch) branch master updated: NUTCH-3024 Remove flaky 'dependency check' target (#795)
lewismc
-
2023/11/17
(nutch) branch NUTCH-3026 created (now 3a294709d)
tallison
-
2023/11/17
(nutch) 01/01: NUTCH-3026 -- first steps towards statusOnly option in IndexingJob
tallison
-
2023/11/08
(nutch) branch master updated (adadc43fb -> 7ad382d95)
snagel
-
2023/11/08
(nutch) 02/02: Merge branch 'NUTCH-3017', closes #793
snagel
-
2023/11/08
(nutch) 01/02: [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input - use Hadoop-provided compression codecs - update description of property urlfilter.fast.file
snagel
-
2023/11/08
(nutch) branch master updated (90849124d -> adadc43fb)
snagel
-
2023/11/06
(nutch) branch master updated: NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag (#794)
tallison
-
2023/11/06
(nutch) branch master updated: NUTCH-3019 -- update Tika (#797)
tallison
-
2023/11/02
(nutch) branch master updated: NUTCH-3014 Standardize Job names (#789)
lewismc
-
2023/10/27
(nutch) branch master updated: NUTCH-3015 Add more CI steps to GitHub master-build.yml (#790)
lewismc
-
2023/10/21
[nutch] branch master updated: NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic (#788)
lewismc
-
2023/10/21
[nutch] branch master updated: NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents - fall back to UTF-8 when stringifying the content of unparsed documents
snagel
-
2023/10/21
[nutch] branch master updated: NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx)
snagel
-
2023/10/21
[nutch] branch master updated: NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 (#779)
snagel
-
2023/10/21
[nutch] branch master updated: NUTCH-3009 Upgrade to Hadoop 3.3.6
snagel
-
2023/10/21
[nutch] branch master updated: NUTCH-3002 Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive - implement class CaseInsensitiveMetadata providing case-insensitive metadata look-ups (but no spell-checking) - use CaseInsensitiveMetadata to hold HTTP header metadata in the class OkHttpResponse of protocol-okhttp - add unit tests to prove the fix (and also case-insensitive look-ups and spell-checking in protocol-http)
snagel
-
2023/10/20
[nutch] branch master updated (a74b57b90 -> 97eb0b5ac)
tallison
-
2023/10/03
[nutch] branch master updated (a1ab4333e -> a74b57b90)
snagel
-
2023/10/03
[nutch] branch master updated: NUTCH-2897 Do not suppress deprecated API warnings - deprecate constructor of NutchJob - remove deprecated call to Object.finalize() from Plugin.finalize()
snagel
-
2023/10/02
[nutch] branch master updated: NUTCH-3010 Injector: count unique number of injected URLs - add counter urls_injected_unique - improve log messages reporting the counts of injected/merged URLs
snagel
-
2023/09/30
[nutch] branch master updated (417b87732 -> a72a53a32)
snagel
-
2023/09/30
[nutch] branch master updated: NUTCH-2852 SpotBugs: Method invokes System.exit(...) - remove all calls of System.exit(...) in methods except main(args) of various "checker" tools
snagel
-
2023/09/26
[nutch] branch master updated: NUTCH-3004 -- propagate ssl exception if message doesn't match "handshake alert..."
tallison
-
2023/09/17
[nutch] branch master updated (0ad935fdc -> d81be5181)
tallison
-
2023/09/14
[nutch] branch master updated: Remove Any23 from Nutch
tallison
-
2023/09/13
[nutch] branch master updated: NUTCH-3000 - the selenium protocol should return the full html, not just the inner body element.
tallison
-
2023/09/13
[nutch] branch master updated: NUTCH-3001 - fix logic for grabbing bytes if there's no content type in the header
tallison
-
2023/08/30
[nutch] branch master updated: NUTCH-2999 -- upgrade lucene to latest 8.x throughout
tallison
-
2023/08/30
[nutch] branch NUTCH-2999 deleted (was 3bb8b0eeb)
tallison
-
2023/08/30
[nutch] branch master updated (f5cd0d633 -> e93aa977e)
tallison
-
2023/08/30
[nutch] 01/01: Merge pull request #770 from apache/NUTCH-2999
tallison
-
2023/08/30
[nutch] branch NUTCH-2999 created (now 3bb8b0eeb)
tallison
-
2023/08/28
[nutch] branch master updated: NUTCH-2989 -- ElasticIndexWriter should enable auth for https, too
tallison
-
2023/08/22
[nutch] branch master updated: NUTCH-2997 Add Override annotations
snagel
-
2023/08/22
[nutch] branch master updated: NUTCH-2996 Use new SimpleRobotRulesParser API entry point crawler-commons 1.4
snagel
-
2023/08/22
[nutch] branch master updated: NUTCH-2995 Upgrade to crawler-commons 1.4
snagel
-
2023/08/22
[nutch] branch master updated: NUTCH-2993 ScoringDepth plugin to skip depth check based on URL Pattern - apply patch contributed by Markus Jelsma
snagel
-
2023/08/04
[nutch-site] branch asf-staging updated: Add logo on URL path where requested README.md in source code repository
snagel
-
2023/08/04
[nutch-site] branch main updated: Add logo on URL path where requested README.md in source code repository
snagel
-
2023/08/04
[nutch-site] branch asf-site updated: Add logo on URL path where requested README.md in source code repository
snagel
-
2023/07/20
[nutch-site] branch asf-site updated: Add link to ASF privacy policies
snagel
-
2023/07/20
[nutch-site] branch main updated: Add link to ASF privacy policies
snagel
-
2023/07/20
[nutch-site] branch asf-staging updated: Add link to ASF privacy policies
snagel
-
2023/07/20
[nutch-site] 01/03: - add link / banner of Apache conferences or events - rename and move link to ASF
snagel
-
2023/07/20
[nutch-site] 03/03: Add new committer / PMC
snagel
-
2023/07/20
[nutch-site] 02/03: Update copyright year 2022 -> 2023
snagel
-
2023/07/20
[nutch-site] branch main updated (aa45c17 -> db7208f)
snagel
-
2023/07/20
[nutch-site] branch asf-site updated: - add new committer / PMC - update copyright year 2022 -> 2023 - add link / banner of Apache conferences or events - rename and move link to ASF
snagel
-
2023/07/20
[nutch-site] branch asf-staging updated: - add new committer / PMC - update copyright year 2022 -> 2023 - add link / banner of Apache conferences or events - rename and move link to ASF
snagel
-
2023/07/07
[nutch-webapp] branch dependabot/maven/com.h2database-h2-2.2.220 created (now 0b5fed6)
github-bot
-
2023/06/14
[nutch-webapp] branch dependabot/maven/com.google.guava-guava-32.0.0-jre created (now b38a4ff)
github-bot
-
2023/06/06
[nutch] branch master updated: NUTCH-2991 Support HTTP/S Header Authorization for Solr connections (#763)
snagel
-
2023/05/23
[nutch] branch master updated: NUTCH-2992 Fetcher: always block fetch queues when exceptions threshold is reached - if QueueFeeder is still alive, also block queues which are empty right now
snagel
-
2023/04/17
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.2.24.RELEASE created (now 70deb3a)
github-bot
-
2023/03/23
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.2.23.RELEASE created (now 9e33145)
github-bot
-
2023/03/17
[nutch] branch master updated: NUTCH-2596 Upgrade from org.mortbay.jetty to org.eclipse.jetty - upgrade from org.mortbay.jetty 6.1.26 to org.eclipse.jetty 9.4.50 (Hadoop depends on 9.4.43) - remove obsolete dependency exclusions of hadoop-common - upgrade Fetcher unit tests to use org.eclipse.jetty
snagel
-
2023/03/17
[nutch] branch master updated: NUTCH-2984 Drop test proxy server and benchmark tool
snagel
-
2023/03/06
[nutch] branch master updated: NUTCH-2985 Disable plugin urlfilter-validator by default
snagel
-
2023/03/06
[nutch] branch master updated: NUTCH-2983 nutch-default.xml improvements - remove property "hadoop.job.history.user.location", obsolete since Hadoop 0.21.0 - normalize spelling (case) of URL and CrawlDb - trim trailing space - fix typos - improve description of properties {db,linkdb}.ignore.{ex,in}ternal.links
snagel
-
2023/03/06
[nutch] branch master updated: NUTCH-2972 Javadoc build fails using JDK 17 - fix Javadoc issues when building with JDK 17
snagel
-
2023/03/06
[nutch] branch master updated: NUTCH-2982 Generator: parameter for URL normalization not passed forward - pass forward params `norm` and `maxNumSegments` - fix typos in Javadoc
snagel
-
2023/03/06
[nutch] 06/07: fix template to include new key store info. remove unused auth
snagel
-
2023/03/06
[nutch] 01/07: NUTCH-2920 -- first working attempt at migrating ElasticsearchIndexWriter to OpenSearch
snagel
-
2023/03/06
[nutch] 05/07: NUTCH-2920 -- improve username/pw logic and update README.md
snagel
-
2023/03/06
[nutch] 07/07: Add indexer-opensearch-1x to 4 more targets...feedback from sebastian-nagel
snagel
-
2023/03/06
[nutch] branch master updated (383aeca5d -> e8fd21090)
snagel
-
2023/03/06
[nutch] 04/07: NUTCH-2920 -- improve handling for missing trust.store.path in the index-writers.xml
snagel
-
2023/03/06
[nutch] 03/07: NUTCH-2920 -- add keystore for 2-way tls; add back in no-tls option with a stern warning and possibly helpful links.
snagel
-
2023/03/06
[nutch] 02/07: NUTCH-2920 -- fix imports
snagel
-
2023/02/18
[nutch] branch master updated: NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit
snagel
-
2023/02/17
[nutch] branch master updated: NUTCH-2974 Ant build fails with "Unparseable date" on certain platforms
snagel
-
2023/01/08
[nutch] branch master updated: NUTCH-2634 Some links marked as "nofollow" are followed anyway - fix detection of nofollow in multi-valued rel attributes
snagel
-
2022/12/12
[nutch] branch master updated: NUTCH-2924 Generate maxCount expr evaluated only once
markus
-
2022/12/09
[nutch-webapp] branch dependabot/maven/org.springframework-spring-web-6.0.0 created (now dc3ba0a)
github-bot
-
2022/12/07
[nutch] branch master updated: NUTCH-2977
markus
-
2022/11/14
[nutch-webapp] branch dependabot/maven/org.springframework-spring-web-4.2.7.RELEASE created (now faede31)
github-bot
-
2022/09/11
[nutch] branch master updated (85f7bcb63 -> ed7b6615b)
snagel
-
2022/09/10
svn commit: r56776 - /release/nutch/1.18/
snagel
-
2022/09/08
[nutch] 02/02: Prepare for new development after release of 1.19 - bump version number (-> 1.20-SNAPSHOT)
snagel
-
2022/09/08
[nutch] branch master updated (ffe059892 -> 85f7bcb63)
snagel
-
2022/09/08
[nutch] 01/02: Nutch 1.19 release - update current year in API docs etc. - update version number - add changes / release notes - update links to Hadoop API docs
snagel
-
2022/09/08
[nutch-site] 02/02: Announce release of Nutch 1.19 - fix release data in announcement
snagel
-
2022/09/08
[nutch-site] branch main updated (4efc5a9 -> aa45c17)
snagel
-
2022/09/08
[nutch-site] branch asf-site updated: Announce release of Nutch 1.19 - fix release data in announcement
snagel
-
2022/09/08
[nutch-site] branch asf-staging updated: Announce release of Nutch 1.19 - fix release data in announcement
snagel
-
2022/09/08
[nutch-site] branch asf-site updated (a41c7ef -> 314b1b2)
snagel
-
2022/09/08
[nutch-site] 02/03: Update content from Hugo build after adding Kube modified templates
snagel
-
2022/09/08
[nutch-site] 01/03: - add README for branch asf-site - modify .asf.yaml to contain only instructions required in branch asf-site
snagel
-
2022/09/08
[nutch-site] branch asf-staging updated (3e9e725 -> 2cfe00d)
snagel
-
2022/09/08
[nutch-site] branch asf-staging updated: Announce release of Nutch 1.19
snagel
-
2022/09/08
svn commit: r56738 [1/3] - /release/nutch/1.19/CHANGES.txt
snagel
-
2022/09/08
svn commit: r56738 [3/3] - /release/nutch/1.19/CHANGES.txt
snagel
-
2022/09/08
svn commit: r56738 [2/3] - /release/nutch/1.19/CHANGES.txt
snagel
-
2022/09/08
[nutch-site] branch main updated: NUTCH-1999 Add /robots.txt to Nutch site (#1)
snagel
-
2022/09/08
[nutch-site] branch asf-staging updated: - add README for branch asf-staging - modify .asf.yaml to contain only instructions required in branch asf-staging
snagel
-
2022/09/08
[nutch-site] branch NUTCH-1999-nutch-site-robots-txt updated (142489f -> f863c1f)
snagel
-
2022/09/08
[nutch-site] branch asf-staging updated: Sync .asf.yaml file with main branch
snagel
-
2022/09/08
[nutch-site] 01/01: Update content from Hugo build after adding Kube modified templates
snagel
-
2022/09/08
[nutch-site] branch asf-staging created (now d77dbb5)
snagel
-
2022/09/06
svn commit: r56686 - /dev/nutch/1.19/ /release/nutch/1.19/
snagel
-
2022/08/22
svn commit: r56398 - /dev/nutch/1.19/
snagel
-
2022/08/22
[nutch] branch branch-1.19 created (now 63d4f11c0)
snagel
-
2022/08/22
[nutch] annotated tag release-1.19 updated (63d4f11c0 -> 5d7660ceb)
snagel
-
2022/08/22
[nutch] branch master updated: NUTCH-2969 Javadoc: Javascript search is not working when built on JDK 11 - pass --no-module-directories to javadoc target when building on JDK 11 - remove obsolete condition to fail javadoc builds on JDK 7u25 and earlier
snagel
-
2022/08/21
[nutch] branch master updated (bca5fc0d0 -> 635ef2f3b)
snagel
-
2022/08/21
[nutch] branch master updated (bec577d50 -> bca5fc0d0)
snagel
-
2022/08/21
[nutch] branch master updated: NUTCH-2863 Injector to parse command-line flags case-insensitive
snagel
-
2022/08/19
[nutch] branch master updated: NUTCH-2962 Update and complete package info of protocol plugins
snagel
-
2022/08/19
[nutch] branch master updated: NUTCH-2930 Protocol-okhttp: implement IP filter (#736)
snagel
-
2022/08/19
[nutch] branch master updated (c0f723e99 -> 05afebd03)
snagel
-
2022/08/17
[nutch] branch master updated (edebfe49f -> c0f723e99)
snagel
-
2022/08/17
[nutch] branch master updated (a5a630055 -> edebfe49f)
snagel
-
2022/08/15
[nutch] branch master updated (82f9530dc -> a5a630055)
snagel
-
2022/08/15
[nutch] branch master updated (b7b834501 -> 82f9530dc)
snagel
-
2022/08/12
[nutch] branch master updated (8fc4f17ac -> b7b834501)
snagel
-
2022/08/09
[nutch] branch master updated: NUTCH-2956 index-geoip: dependency upgrades and improvements - upgrade to geoip2 3.0.1 - exclude transitive dependencies (Jackson) provided as Nutch core deps - read also GeoLite2-*.mmdb files - review index field names in plugin and Nutch Solr schema: - fix typos in field names - remove unused fields from schema
snagel
-
2022/08/09
[nutch] branch master updated: NUTCH-2953 Indexer Elastic to ignore SSL issues - apply patch contributed by Markus Jelsma - fix class imports
snagel
-
2022/08/09
[nutch] branch master updated: NUTCH-2952 Upgrade core dependencies - Hadoop 3.1.3 -> 3.3.3 - log4j 2.17.0 -> 2.17.2 - and some more
snagel
-
2022/08/09
[nutch] 01/03: NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode if protocol-okhttp is used - protocol-okhttp: initialize SSLContext used to ignore SSL/TLS certificate verification not in a static code block
snagel
-
2022/08/09
[nutch] 02/03: NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode if protocol-okhttp is used - code improvements Nutch plugin system: - use `Class<?>` and remove suppressions of warnings - javadocs: fix typos - remove superfluous white space - autoformat using code style template
snagel
-
2022/08/09
[nutch] 03/03: NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode if protocol-okhttp is used NUTCH-2949 Tasks of a multi-threaded map runner may fail because of slow creation of URL stream handlers
snagel
-
2022/08/09
[nutch] branch master updated (5b970ff22 -> 487110b07)
snagel
-
2022/06/21
[nutch] branch master updated: NUTCH-2951 Crawl datum with metadata WRITABLE_GENERATE_TIME_KEY awaits fetching forever - bug fix: add missing braces (bug introduced with NUTCH-2737, solution contributed by Lapadula Alessandro)
snagel
-
2022/05/25
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.2.22.RELEASE created (now e19e71e)
github-bot
-
2022/05/24
[nutch] branch master updated (02dca3b6d -> 47d3fe607)
snagel
-
2022/05/20
[nutch] branch master updated: NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode (#726)
lewismc
-
2022/05/19
[nutch] 02/02: NUTCH-2946 Fetcher: optionally slow down fetching from hosts with repeated exceptions - configure the delay in seconds as a float instead of milliseconds - use the value of fetcher.server.delay as default - double the delay with every observed exception (exponential backoff) but cap the growth at 2**31 to avoid overflows
snagel
-
2022/05/19
[nutch] branch master updated (568993b90 -> bdbe7b330)
snagel
-
2022/05/19
[nutch] 01/02: NUTCH-2946 Fetcher: slow down fetching from hosts where requests fail repeatedly with exceptions or HTTP status codes mapped to ProtocolStatus.EXCEPTION (HTTP 403 Forbidden, 429 Too many requests, 5xx server errors, etc.)
snagel
-
2022/05/12
[nutch] branch master updated: NUTCH-2948 Upgrade dependencies to Any23 2.7 and Tika 2.3.0
snagel
-
2022/04/22
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.3.19 created (now b974d7a)
github-bot
-
2022/04/06
[nutch-webapp] branch dependabot/maven/org.springframework-spring-core-5.3.18 created (now 6524080)
github-bot
-
2022/01/27
[nutch] branch master updated: NUTCH-2923: Added JobId in Job Failure logs (#721)
snagel
-
2022/01/17
[nutch] branch master updated: NUTCH-2573 Suspend crawling if robots.txt fails to fetch with 5xx status (#724)
snagel
-
2022/01/17
[nutch] branch master updated: NUTCH-2935 DeduplicationJob: failure on URLs with invalid percent encoding - catch IllegalArgumentException when unescaping percent-encoding in URLs - if one URL of two compared URLs is valid, keep it as non-duplicate - add unit tests for DeduplicationJob
snagel
-
2022/01/15
[nutch] branch master updated: NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 (#717)
lewismc
-
2022/01/14
[nutch] branch master updated: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted - sleep for a configurable delay (fetcher.threads.start.delay) before starting the next Fetcher thread to avoid that resources (DNS, Tika XML parser pools) are temporarily exhausted when Fetcher threads fetch the first pages simultaneously
snagel
-
2022/01/09
[nutch-site] 01/01: NUTCH-1999 Add /robots.txt to Nutch site
snagel
-
2022/01/09
[nutch-site] branch NUTCH-1999-nutch-site-robots-txt created (now 142489f)
snagel
-
2022/01/09
[nutch] branch master updated (e76d69f -> 78e827a)
snagel
-
2022/01/07
[nutch] branch master updated: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers (#720)
lewismc
-
2022/01/05
svn commit: r1077907 - /websites/staging/nutch/
gmcdonald
-
2022/01/05
svn commit: r1077906 - /websites/production/nutch/
gmcdonald
-
2021/12/22
[nutch] branch master updated: Upgrade to log4j 2.17.0 (#719)
snagel
-
2021/12/22
[nutch] branch master updated: NUTCH-2917 Remove transitive dependency to log4j 1.x (#718)
snagel
-
2021/12/17
[nutch] branch master updated: NUTCH-2449 Replace Tika LanguageIdentifier in language-identifier (#716)
lewismc
-
2021/12/17
[nutch] branch master updated: NUTCH-2914 nutch-default.xml: remove obsolete and unused properties (#709)
snagel
-
2021/12/17
[nutch] branch master updated: NUTCH-2807 SitemapProcessor to warn that ignoring robots.txt affects detection of sitemaps (#710)
snagel
-
2021/12/17
[nutch] branch master updated (4caa5ce -> af29192)
snagel
-
2021/12/17
[nutch] branch master updated: NUTCH-2918 Upgrade to log4j 2.16.0 (#715)
snagel
-
2021/12/14
[nutch] branch master updated: NUTCH-2916 Fix log file rotation / rename default log file (#714)
snagel
-
2021/12/13
[nutch] branch master updated: NUTCH-2915 Upgrade to log4j 2.15.0
snagel
-
2021/12/03
[nutch] branch master updated (a62168c -> 9a2f94f)
snagel
-
2021/12/03
[nutch] branch master updated (dd27044 -> a62168c)
snagel
-
2021/12/01
[nutch] branch master updated: NUTCH-2905 Mask sensitive strings in log output of index writers - add utility methods (StringUtil) to mask password strings or passwords in strings - mask passwords in log output of index writers (Elasticsearch, Solr, RabbitMQ) - mask password in trace log of protocol-httpclient when using basic authentication
snagel
-
2021/12/01
[nutch] branch master updated: NUTCH-2908 Log mapreduce job messages and counters in local mode (Log4j2)
snagel
-
2021/12/01
[nutch] branch master updated (ff800c5 -> 671f904)
snagel
-
2021/11/26
[nutch-site] branch main updated: Add doap.rdf (lost during CMS migration)
snagel
-
2021/11/26
[nutch-site] branch asf-site updated: Add doap.rdf (lost during CMS migration)
snagel
-
2021/11/24
[nutch-site] branch asf-site updated: Add favicon
lewismc
-
2021/11/24
[nutch-site] branch main updated (b720870 -> 198d962)
lewismc
-
2021/11/24
[nutch-site] branch main updated (819de2a -> b720870)
lewismc
-
2021/11/24
[nutch-site] branch asf-site created (now b720870)
lewismc