commits
[nutch] 01/01: Merge pull request #295 from lewismc/NUTCH-2516
lewismc
[nutch] branch master updated: NUTCH-2523 UpdateHostDB blocks usage of plugins unintentionally (contributed by Yossi Tamari)
snagel
[nutch] branch master updated (8bf139d -> 972e494)
snagel
[nutch] 01/01: Merge pull request #296 from sebastian-nagel/NUTCH-2535-readdb-stats-classcastexcept
snagel
[nutch] branch master updated (0e28af6 -> 8bf139d)
lewismc
[nutch] 01/01: Merge pull request #293 from lewismc/NUTCH-2517
lewismc
[nutch] branch master updated: fixed hdfs file checks in crawl script
snagel
[nutch] branch master updated: NUTCH-2411 Index-metadata to support indexing multiple values for a field
markus
[nutch] branch master updated (24071d0 -> bd70d2f)
snagel
[nutch] 01/01: Merge pull request #289 from sebastian-nagel/NUTCH-2521
snagel
[nutch] branch master updated (dc6a0ab -> 24071d0)
snagel
[nutch] 01/01: Merge pull request #288 from sebastian-nagel/NUTCH-2520
snagel
[nutch] branch 2.x updated: NUTCH-2520 Use default value for Accept-Charset if http.accept.charset is undefined
snagel
[nutch] branch 2.x updated: NUTCH-2519 Log mapreduce job messages and counters in local mode
snagel
[nutch] branch master updated (54510e5 -> dc6a0ab)
snagel
[nutch] 01/01: Merge pull request #287 from sebastian-nagel/NUTCH-2519
snagel
[nutch] branch master updated (a2f637e -> 54510e5)
lewismc
[nutch] 01/01: Merge pull request #221 from Omkar20895/NUTCH-2375
lewismc
[nutch] branch master updated (75d0166 -> a2f637e)
lewismc
[nutch] 01/01: Merge pull request #284 from YossiTamari/master
lewismc
[nutch] branch master updated (2b66cda -> 75d0166)
lewismc
[nutch] 01/01: Merge pull request #283 from smartive/NUTCH-2508
lewismc
[nutch] branch master updated: NUTCH-2466
markus
[nutch] branch master updated: Fix for NUTCH-2494 contributed by Ashraful Islam, closes #274
snagel
[nutch] branch master updated (27ff215 -> ec42cfb)
lewismc
[nutch] 01/01: Merge pull request #280 from smartive/NUTCH-2502
lewismc
[nutch] branch master updated (84c2d65 -> 27ff215)
lewismc
[nutch] 01/01: Merge pull request #277 from smartive/NUTCH-2499
lewismc
[nutch] branch master updated (8c4f522 -> 84c2d65)
lewismc
[nutch] 01/01: Merge pull request #281 from smartive/NUTCH-2503
lewismc
[nutch] branch master updated (0587065 -> 8c4f522)
lewismc
[nutch] 01/01: Merge pull request #250 from okedoki/NUTCH-2441
lewismc
[nutch] branch master updated (0bfd857 -> 0587065)
lewismc
[nutch] 01/01: Merge pull request #276 from smartive/NUTCH-2497
lewismc
[nutch] branch master updated (95469e8 -> 0bfd857)
lewismc
[nutch] 01/01: Merge pull request #249 from okedoki/NUTCH-2461
lewismc
[nutch] branch master updated (f82959d -> 95469e8)
lewismc
[nutch] 01/01: Merge pull request #272 from sju/NUTCH-2321
lewismc
[nutch] branch master updated (7e5f22a -> f82959d)
lewismc
[nutch] 01/01: Merge pull request #205 from smartive/feature/NUTCH-1129-microdata
lewismc
[nutch] branch master updated (ea2dd29 -> 7e5f22a)
lewismc
[nutch] 01/01: Merge pull request #273 from smartive/NUTCH-2493
lewismc
[nutch] branch master updated (dc33163 -> ea2dd29)
lewismc
[nutch] 07/07: Merge branch 'patch-1' of https://github.com/sachin086/nutch
lewismc
[nutch] 04/07: Merge branch 'master' of https://github.com/apache/nutch
lewismc
[nutch] 02/07: Merge branch 'master' of https://github.com/apache/nutch
lewismc
[nutch] 01/07: Merge branch 'NUTCH-2433' of https://github.com/maborec/nutch
lewismc
[nutch] 03/07: Merge branch 'master' of https://github.com/apache/nutch
lewismc
[nutch] 06/07: Merge branch 'master' of https://github.com/apache/nutch
lewismc
[nutch] 05/07: Merge branch 'master' of https://github.com/apache/nutch
lewismc
[nutch] branch master updated (2dbdb9d -> dc33163)
lewismc
[nutch] 01/01: Merge pull request #271 from smartive/NUTCH-2492
lewismc
svn commit: r1023507 - /websites/production/nutch/content/
snagel
svn commit: r1023506 - in /websites/staging/nutch/trunk/content: ./ downloads.html
buildbot
svn commit: r1820549 - /nutch/cms_site/trunk/content/downloads.md
snagel
svn commit: r24068 - /release/nutch/1.14/
snagel
[nutch] branch master updated (e54df72 -> 2dbdb9d)
lewismc
[nutch] 01/01: Merge pull request #269 from smartive/NUTCH-2490
lewismc
[nutch] branch master updated (872b4ea -> e54df72)
lewismc
[nutch] 01/01: Merge pull request #270 from smartive/NUTCH-2491
lewismc
[nutch] branch master updated (e533ab2 -> 872b4ea)
lewismc
[nutch] 01/01: Merge pull request #248 from okedoki/NUTCH-2454
lewismc
[nutch] branch master updated: Prepare for new development after release of 1.14, bump - version number (1.14 -> 1.15-SNAPSHOT) - year (2017 -> 2018)
snagel
svn commit: r23905 - /release/nutch/1.13/
snagel
svn commit: r1022708 - /websites/production/nutch/content/
snagel
svn commit: r1022700 - in /websites/staging/nutch/trunk/content: ./ javadoc.html
buildbot
svn commit: r1819242 - /nutch/cms_site/trunk/content/javadoc.md
snagel
svn commit: r1022673 - in /websites/staging/nutch/trunk/content: ./ apidocs/apidocs-1.14/ apidocs/apidocs-1.14/org/ apidocs/apidocs-1.14/org/apache/ apidocs/apidocs-1.14/org/apache/nutch/ apidocs/apidocs-1.14/org/apache/nutch/analysis/ apidocs/apidocs-...
buildbot
svn commit: r1819181 - in /nutch/cms_site/trunk/content: ./ apidocs/apidocs-1.14/ apidocs/apidocs-1.14/org/ apidocs/apidocs-1.14/org/apache/ apidocs/apidocs-1.14/org/apache/nutch/ apidocs/apidocs-1.14/org/apache/nutch/analysis/ apidocs/apidocs-1.14/org...
snagel
svn commit: r23869 - /dev/nutch/1.14/
snagel
svn commit: r23868 - /release/nutch/1.14/
snagel
[nutch] branch 2.x updated: Nutch 2.X GeneratorJob creates NullPointerException when using DataFileAvroStore
lewismc
[nutch] branch 2.x updated: Nutch 2.X GeneratorJob creates NullPointerException when using DataFileAvroStore
lewismc
[nutch] branch master updated (dae62f8 -> 9e0c316)
lewismc
[nutch] 01/01: Merge pull request #267 from smartive/fix/NUTCH-2486-unsafe-warning
lewismc
svn commit: r23783 - in /dev/nutch: ./ 1.14/
snagel
svn commit: r23782 - /release/nutch/KEYS
snagel
[nutch] annotated tag release-1.14 updated (a8e60bd -> af6d141)
snagel
[nutch] branch branch-1.14 created (now a8e60bd)
snagel
[nutch] 01/01: Nutch 1.14 release - update version number - add changes / release notes
snagel
[nutch] branch master updated: NUTCH-2353 Create seed file with metadata using the REST API - reverse commits 0312bae38c9e95d496336dc24133b15ebefd4d3c and 7deb576bc58bb74725cbb6c5d82d7b9244c6ad42 to fix exception in Nutch webapp
snagel
[nutch] branch master updated (30db933 -> c274029)
snagel
[nutch] 01/01: Merge pull request #265 from sebastian-nagel/nutch-2483-remove-dependency-on-org-json
snagel
[nutch] branch master updated (c6e5dfb -> 30db933)
snagel
[nutch] 01/01: Merge pull request #266 from sebastian-nagel/nutch-2295-docker
snagel
[nutch] branch master updated (dd94a61 -> c6e5dfb)
snagel
[nutch] 01/01: Merge pull request #264 from sebastian-nagel/nutch-2365-fetcher-redirects-mode
snagel
[nutch] branch master updated: NUTCH-2380 Upgrade indexer-elastic to Elasticsearch version 5.3.0 (contributed by Jurian Broertjes)
snagel
[nutch] branch master updated (961c725 -> fc89e4f)
snagel
[nutch] 10/23: make fully configurable
snagel
[nutch] 20/23: Improve command-line help for URL filter and normalizer checker
snagel
[nutch] 21/23: NUTCH-2322 URL not available for Jexl operations - apply patch contributed by Markus Jelsma
snagel
[nutch] 07/23: fix delete
snagel
[nutch] 05/23: fix formatting
snagel
[nutch] 16/23: NUTCH-2035 urlfilter-regex case insensitive rules
snagel
[nutch] 11/23: NUTCH-2480 Upgrade crawler-commons dependency to 0.9
snagel
[nutch] 15/23: NUTCH-2362 Upgrade MaxMind GeoIP version in index-geoip
snagel
[nutch] 02/23: NUTCH-2474 CrawlDbReader -stats fails with ClassCastException - replace CrawlDbStatCombiner by CrawlDbStatReducer and ensure that data is properly processed independently whether and how often combiner is called - simplify calculation of minimum and maximum
snagel
[nutch] 13/23: scope variables
snagel
[nutch] 19/23: fix for NUTCH-2477 (refactor checker classes) contributed by Jurian Broertjes
snagel
[nutch] 03/23: - filter out NaN scores which break the quantile calculation
snagel
[nutch] 01/23: fix for NUTCH-2370 contributed by [email protected]
snagel
[nutch] 23/23: NUTCH-2415 Create a JEXL based IndexingFilter Merge branch 'pipldev-index-jexl-filter', closes #219
snagel
[nutch] 18/23: NUTCH-2478 HTML parser should resolve base URL <base href=...> - finally fix parse-tika: - href attribute of base element dropped in DOM - need to call tikamd.get("Content-Location") - port HTML parser test from parse-html to parse-tika - add method to DomUtil which prints DocumentFragment
snagel
[nutch] 09/23: Add tika-config.xml to suppress Tika warnings on stderr
snagel
[nutch] 06/23: add languages to default config
snagel
[nutch] 12/23: fix indentation
snagel
[nutch] 17/23: NUTCH-2478 HTML parser should resolve base URL <base href=...> - fix parse-html and parse-tika - add unit test for parse-html
snagel
[nutch] 08/23: NUTCH-2439 Upgrade Apache Tika dependency to 1.17
snagel
[nutch] 14/23: NUTCH-2354 Upgrade Hadoop dependencies to 2.7.4
snagel
[nutch] 22/23: NUTCH-2034 CrawlDB update job to count documents in CrawlDb rejected by URL filters (patch contributed by Luis Lopez)
snagel
[nutch] 04/23: Extend indexer-elastic-rest to support languages
snagel
[nutch] branch 2.x updated: NUTCH-2358 HostInjectorJob doesn't work
lewismc
[nutch] branch master updated: NUTCH-2034 CrawlDB update job to count documents in CrawlDb rejected by URL filters (patch contributed by Luis Lopez)
snagel
[nutch] branch master updated (8b3412a -> 2ce1177)
snagel
[nutch] 01/01: Merge pull request #180 from smadha/NUTCH-2370
snagel
[nutch] branch master updated: NUTCH-2322 URL not available for Jexl operations - apply patch contributed by Markus Jelsma
snagel
[nutch] branch master updated (d73f293 -> 8e6cb9d)
snagel
[nutch] 01/03: fix for NUTCH-2477 (refactor checker classes) contributed by Jurian Broertjes
snagel
[nutch] 03/03: Merge branch 'sju:NUTCH-2431' contributed by Jurian Broertjes, closes #256
snagel
[nutch] 02/03: Improve command-line help for URL filter and normalizer checker
snagel
[nutch] branch master updated (45ce310 -> d73f293)
snagel
[nutch] 01/01: Merge pull request #263 from sebastian-nagel/nutch-2478-parser-resolve-base-url
snagel
[nutch] branch master updated (cfd8900 -> 45ce310)
lewismc
[nutch] 01/01: Merge pull request #257 from smartive/feat/indexer-elastic-rest-languages
lewismc
[nutch] branch master updated (bda25c8 -> cfd8900)
snagel
[nutch] 01/01: Merge pull request #262 from sebastian-nagel/nutch-2362-update-maxmind-geoip-dependency
snagel
[nutch] branch master updated (310295c -> bda25c8)
snagel
[nutch] 01/01: Merge pull request #261 from sebastian-nagel/nutch-2354-upgrade-hadoop-2.7.4
snagel
[nutch] branch master updated (f6bd25b -> 310295c)
snagel
[nutch] 01/01: Merge pull request #260 from sebastian-nagel/nutch-2480-upgrade-crawler-commons-0.9
snagel
[nutch] branch master updated (df14c8a -> f6bd25b)
snagel
[nutch] 01/01: Merge pull request #259 from sebastian-nagel/nutch-2439-upgrade-tika-1.17
snagel
[nutch] branch 2.x updated: NUTCH-2035 urlfilter-regex case insensitive rules
snagel
[nutch] branch master updated: NUTCH-2035 urlfilter-regex case insensitive rules
snagel
[nutch] branch master updated (6b04090 -> 0e3036b)
snagel
[nutch] 01/01: Merge pull request #255 from sebastian-nagel/nutch-2474-crawldb-reader-stats-class-cast-exception
snagel
[nutch] branch master updated (d4a2b47 -> 6b04090)
lewismc
[nutch] 01/01: Merge pull request #253 from smartive/fix/indexer-elastic-rest-dependecy
lewismc
[nutch] branch master updated (f483e52 -> d4a2b47)
lewismc
[nutch] 01/01: Merge pull request #217 from pipldev/LanguageIndexingFilter1
lewismc
[nutch] branch 2.x updated (cc2f4ab -> 3486539)
lewismc
[nutch] 01/01: Merge pull request #258 from lewismc/NUTCH-2438
lewismc
[nutch] branch master updated (708cc56 -> f483e52)
jorgelbg
[nutch] 01/01: Merge pull request #236 from jorgelbg/NUTCH-2399
jorgelbg
[nutch] branch master updated (9931acc -> 708cc56)
snagel
[nutch] 01/01: Merge pull request #252 from sebastian-nagel/nutch-2470-crawldb-reader-stats-quantiles
snagel
[nutch] branch 2.x updated: NUTCH-2469 Documents not committed to solr in Server mode - applied patch contributed by Ninaad Joshi
snagel
[nutch] branch master updated (d8754b7 -> 9931acc)
snagel
[nutch] 03/03: Merge branch 'NUTCH-2451' - cherry-picked e159ad4 from HiranChaudhuri:NUTCH-2451 - closes #241
snagel
[nutch] 02/03: NUTCH-2451 protocol-ftp to resolve relative URL when following redirects - return empty protocol output instead of throwing exception if relative redirect URL fails to resolve - format source code - complete LOG message
snagel
[nutch] 01/03: This suggested change seems to work. MalformedURLExceptions no longer occur.
snagel
[nutch] branch 2.x updated: NUTCH-2451 protocol-ftp to resolve relative URL when following redirects
snagel
[nutch] branch master updated: NUTCH-2468 should filter out invalid URLs by default - enable plugin urlfilter-validate by default
snagel
[nutch] branch 2.x updated: NUTCH-2468 should filter out invalid URLs by default - enable plugin urlfilter-validate by default
snagel
[nutch] branch master updated (55c7f75 -> 3f0ecdf)
snagel
[nutch] 05/05: NUTCH-2456 Allow to index pages/URLs not contained in CrawlDb
snagel
[nutch] 02/05: Code style fixes.
snagel
[nutch] 03/05: Allow index removals even if dbDatum is null.
snagel
[nutch] 04/05: Fix for previous commit
snagel
[nutch] 01/05: NUTCH-2456: Redirected documents are not indexed
snagel
[nutch] branch master updated (2465e63 -> 55c7f75)
snagel
[nutch] 01/01: Merge pull request #244 from jorgelbg/NUTCH-2464
snagel
[nutch] branch master updated (d3aa453 -> 2465e63)
snagel
[nutch] 01/01: Merge pull request #251 from okedoki/NUTCH-2465
snagel
[nutch] branch master updated (705686e -> d3aa453)
snagel
[nutch] 02/02: Merge branch 'YossiTamari/NUTCH-2463'
snagel
[nutch] 01/02: NUTCH-2458
snagel
[nutch] branch master updated: NUTCH-2458
markus
[nutch] branch master updated (6199492 -> 9e4d954)
snagel
[nutch] 01/01: Merge pull request #239 from Omkar20895/NUTCH-2442
snagel
[nutch] branch master updated: NUTCH-2420 Bug in variable generate.max.count and fetcher.server.delay
markus
[nutch] branch master updated: NUTCH-2452 Allow nutch to retrieve Ftp URLs that contain UrlEncoded characters, closes #237
snagel
[nutch] branch master updated (90360c8 -> bb2a7ad)
snagel
[nutch] 01/01: Merge pull request #230 from jorgelbg/NUTCH-2443
snagel
[nutch] branch master updated (f356790 -> 90360c8)
snagel
[nutch] 01/01: Merge pull request #234 from sebastian-nagel/NUTCH-2394
snagel
[nutch] branch master updated (bd8c847 -> f356790)
snagel
[nutch] 01/01: Merge pull request #211 from sebastian-nagel/NUTCH-1932
snagel
[nutch] branch master updated: NUTCH-2386 BasicURLNormalizer does not encode curly braces
markus
[nutch] branch 2.x updated: NUTCH-2448: Treat white-space http.agent.version as empty.
snagel
[nutch] branch master updated (0cdd095 -> 4cfec6e)
snagel
[nutch] 01/01: Merge pull request #232 from YossiTamari/http.agent.version
snagel
[nutch] branch master updated: NUTCH-2445 Fetcher following outlinks to keep track of already fetched items
markus
[nutch] branch master updated: NUTCH-2444 HostDB CSV dumper to emit field header by default
markus
[nutch] branch 2.x updated: Fix for NUTCH-2446 by Kenneth McFarland
snagel
[nutch] branch master updated (21d56a0 -> 602c663)
snagel
[nutch] 01/01: Merge pull request #231 from kpm1985/NUTCH-2446
snagel
[nutch] branch master updated: NUTCH-1763 Code comment Injector contributed by Diaa
snagel
[nutch] branch master updated (777e759 -> 6dab5dc)
snagel
[nutch] 01/03: NUTCH-2435 - New parameter "parser.store.text" allowing to choose whether to store 'parse_text' directory or not.
snagel
[nutch] 03/03: NUTCH-2435 New configuration allowing to choose whether to store 'parse_text' directory or not. Merge branch 'maborec-NUTCH-2435', closes #225
snagel
[nutch] 02/03: Apply eclipse-codeformat.xml format to NUTCH-2435 changes.
snagel
[nutch] branch 2.x updated (d95396e -> 16696af)
lewismc
[nutch] 01/01: Merge pull request #228 from tulay/NUTCH-2437
lewismc
[nutch] branch master updated (da64358 -> 777e759)
snagel
[nutch] 01/01: Merge pull request #224 from maborec/NUTCH-2433
snagel
[nutch] branch master updated (d06ccde -> da64358)
lewismc
[nutch] 01/01: Merge pull request #227 from kpm1985/NUTCH-2436
lewismc
[nutch] branch master updated (9446b1e -> d06ccde)
snagel