[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2017-07-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092964#comment-16092964 ] Hudson commented on NUTCH-1465: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3435 (See

[jira] [Commented] (NUTCH-2397) Parser to add paragraph line breaks

2017-07-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074513#comment-16074513 ] Hudson commented on NUTCH-2397: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3432 (See

[jira] [Commented] (NUTCH-2374) Upgrade Nutch 2.X to Gora 0.7

2017-07-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074989#comment-16074989 ] Hudson commented on NUTCH-2374: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1587 (See

[jira] [Commented] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-07-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074987#comment-16074987 ] Hudson commented on NUTCH-2393: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1587 (See

[jira] [Commented] (NUTCH-2391) Spurious Duplications for MD5

2017-07-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074988#comment-16074988 ] Hudson commented on NUTCH-2391: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1587 (See

[jira] [Commented] (NUTCH-2391) Spurious Duplications for MD5

2017-07-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075010#comment-16075010 ] Hudson commented on NUTCH-2391: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3433 (See

[jira] [Commented] (NUTCH-2389) Precise data parsing using Jsoup CSS selectors

2017-07-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106551#comment-16106551 ] Hudson commented on NUTCH-2389: --- FAILURE: Integrated in Jenkins build Nutch-nutchgora #1588 (See

[jira] [Commented] (NUTCH-2405) jsoup-extractor structure correction, typo fixed

2017-08-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120339#comment-16120339 ] Hudson commented on NUTCH-2405: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1590 (See

[jira] [Commented] (NUTCH-2378) ChildFirst plugin classloader

2017-08-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16132763#comment-16132763 ] Hudson commented on NUTCH-2378: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1591 (See

[jira] [Commented] (NUTCH-2373) Indexer for Hbase

2017-05-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020250#comment-16020250 ] Hudson commented on NUTCH-2373: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1583 (See

[jira] [Commented] (NUTCH-2388) bin/crawl indexing only webpages containing batchID instead of all in 2.x

2017-05-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021527#comment-16021527 ] Hudson commented on NUTCH-2388: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1584 (See

[jira] [Commented] (NUTCH-2353) Create seed file with metadata using the REST API

2017-05-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017310#comment-16017310 ] Hudson commented on NUTCH-2353: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3429 (See

[jira] [Commented] (NUTCH-2376) Improve configurability of HTTP Accept* header fields

2017-05-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017309#comment-16017309 ] Hudson commented on NUTCH-2376: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3429 (See

[jira] [Commented] (NUTCH-2353) Create seed file with metadata using the REST API

2017-05-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016992#comment-16016992 ] Hudson commented on NUTCH-2353: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3428 (See

[jira] [Commented] (NUTCH-2374) Upgrade Nutch 2.X to Gora 0.7

2017-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051334#comment-16051334 ] Hudson commented on NUTCH-2374: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1586 (See

[jira] [Commented] (NUTCH-2409) Injector: complete command-line help and counters

2017-09-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161128#comment-16161128 ] Hudson commented on NUTCH-2409: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3453 (See

[jira] [Commented] (NUTCH-2397) Parser to add paragraph line breaks

2017-09-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161121#comment-16161121 ] Hudson commented on NUTCH-2397: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1592 (See

[jira] [Commented] (NUTCH-2430) Complete plugin build configuration

2017-09-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179201#comment-16179201 ] Hudson commented on NUTCH-2430: --- FAILURE: Integrated in Jenkins build Nutch-trunk #3454 (See

[jira] [Commented] (NUTCH-2436) Remove empty comment, and redundant semicolon from CommandRunner

2017-09-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184747#comment-16184747 ] Hudson commented on NUTCH-2436: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3456 (See

[jira] [Commented] (NUTCH-2433) Html Parser: keep htmltag where the outlinks are found

2017-09-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185784#comment-16185784 ] Hudson commented on NUTCH-2433: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3457 (See

[jira] [Commented] (NUTCH-2413) Parsing fetcher to respect property "parse.filter.urls"

2017-08-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142732#comment-16142732 ] Hudson commented on NUTCH-2413: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3452 (See

[jira] [Commented] (NUTCH-2437) gora mongodb mapping file error

2017-10-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191807#comment-16191807 ] Hudson commented on NUTCH-2437: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1593 (See

[jira] [Commented] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214828#comment-16214828 ] Hudson commented on NUTCH-2446: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3459 (See

[jira] [Commented] (NUTCH-2446) URLFiltersCheck fix

2017-10-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214825#comment-16214825 ] Hudson commented on NUTCH-2446: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1594 (See

[jira] [Commented] (NUTCH-2444) HostDB CSV dumper to emit field header by default

2017-10-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215161#comment-16215161 ] Hudson commented on NUTCH-2444: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3460 (See

[jira] [Commented] (NUTCH-2463) Enable sampling CrawlDB

2017-11-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268625#comment-16268625 ] Hudson commented on NUTCH-2463: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3469 (See

[jira] [Commented] (NUTCH-2458) TikaParser doesn't work with tika-config.xml set

2017-11-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268626#comment-16268626 ] Hudson commented on NUTCH-2458: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3469 (See

[jira] [Commented] (NUTCH-2465) Broken Eclipse project. Classpaths and interactiveselenium should be fixed.

2017-11-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273292#comment-16273292 ] Hudson commented on NUTCH-2465: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3470 (See

[jira] [Commented] (NUTCH-2464) Plugin headings: Headers That Contain HTML Elements Are Not Parsed

2017-11-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273291#comment-16273291 ] Hudson commented on NUTCH-2464: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3470 (See

[jira] [Commented] (NUTCH-2456) Allow to index pages/URLs not contained in CrawlDb

2017-12-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278316#comment-16278316 ] Hudson commented on NUTCH-2456: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3471 (See

[jira] [Commented] (NUTCH-2468) should filter out invalid URLs by default

2017-12-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278378#comment-16278378 ] Hudson commented on NUTCH-2468: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1596 (See

[jira] [Commented] (NUTCH-2468) should filter out invalid URLs by default

2017-12-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278393#comment-16278393 ] Hudson commented on NUTCH-2468: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3472 (See

[jira] [Commented] (NUTCH-2370) FileDumper: save JSON mapping file -> URL

2017-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294208#comment-16294208 ] Hudson commented on NUTCH-2370: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3486 (See

[jira] [Commented] (NUTCH-2034) CrawlDB filtered documents counter.

2017-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294209#comment-16294209 ] Hudson commented on NUTCH-2034: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3486 (See

[jira] [Commented] (NUTCH-2478) // is not a valid base URL

2017-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294112#comment-16294112 ] Hudson commented on NUTCH-2478: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3483 (See

[jira] [Commented] (NUTCH-2322) URL not available for Jexl operations

2017-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294186#comment-16294186 ] Hudson commented on NUTCH-2322: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3485 (See

[jira] [Commented] (NUTCH-2358) HostInjectorJob doesn't work

2017-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294244#comment-16294244 ] Hudson commented on NUTCH-2358: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1600 (See

[jira] [Commented] (NUTCH-2477) Refactor *Checker classes to use base class for common code

2017-12-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294159#comment-16294159 ] Hudson commented on NUTCH-2477: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3484 (See

[jira] [Commented] (NUTCH-2474) CrawlDbReader -stats fails with ClassCastException

2017-12-14 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291049#comment-16291049 ] Hudson commented on NUTCH-2474: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3477 (See

[jira] [Commented] (NUTCH-2034) CrawlDB filtered documents counter.

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295251#comment-16295251 ] Hudson commented on NUTCH-2034: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2216) db.ignore.*.links to optionally follow internal redirects

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295252#comment-16295252 ] Hudson commented on NUTCH-2216: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2478) // is not a valid base URL

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295248#comment-16295248 ] Hudson commented on NUTCH-2478: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2322) URL not available for Jexl operations

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295250#comment-16295250 ] Hudson commented on NUTCH-2322: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2362) Upgrade MaxMind GeoIP version in index-geoip

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295246#comment-16295246 ] Hudson commented on NUTCH-2362: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2380) indexer-elastic version upgrade to 5.3.0

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295253#comment-16295253 ] Hudson commented on NUTCH-2380: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2295) Nutch master docker container broken

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295240#comment-16295240 ] Hudson commented on NUTCH-2295: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2480) Upgrade crawler-commons dependency to 0.9

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295244#comment-16295244 ] Hudson commented on NUTCH-2480: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2477) Refactor *Checker classes to use base class for common code

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295249#comment-16295249 ] Hudson commented on NUTCH-2477: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2035) Regex filter using case sensitive rules.

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295247#comment-16295247 ] Hudson commented on NUTCH-2035: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2370) FileDumper: save JSON mapping file -> URL

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295241#comment-16295241 ] Hudson commented on NUTCH-2370: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2365) HTTP Redirects to SubDomains don't get crawled if db.ignore.external.links.mode == byDomain

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295239#comment-16295239 ] Hudson commented on NUTCH-2365: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2474) CrawlDbReader -stats fails with ClassCastException

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295242#comment-16295242 ] Hudson commented on NUTCH-2474: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2354) Upgrade Hadoop dependencies to 2.7.4

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295245#comment-16295245 ] Hudson commented on NUTCH-2354: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2439) Upgrade to Apache Tika 1.17

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295243#comment-16295243 ] Hudson commented on NUTCH-2439: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3487 (See

[jira] [Commented] (NUTCH-2483) Remove/replace indirect dependencies to org.json

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295332#comment-16295332 ] Hudson commented on NUTCH-2483: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3488 (See

[jira] [Commented] (NUTCH-2353) Create seed file with metadata using the REST API

2017-12-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295333#comment-16295333 ] Hudson commented on NUTCH-2353: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3488 (See

[jira] [Commented] (NUTCH-2486) Compiler Warning: Unchecked / unsafe operations in MimeTypeIndexingFilter

2017-12-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296994#comment-16296994 ] Hudson commented on NUTCH-2486: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3489 (See

[jira] [Commented] (NUTCH-2362) Upgrade MaxMind GeoIP version in index-geoip

2017-12-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293312#comment-16293312 ] Hudson commented on NUTCH-2362: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3481 (See

[jira] [Commented] (NUTCH-2035) Regex filter using case sensitive rules.

2017-12-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292898#comment-16292898 ] Hudson commented on NUTCH-2035: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1599 (See

[jira] [Commented] (NUTCH-2439) Upgrade to Apache Tika 1.17

2017-12-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292924#comment-16292924 ] Hudson commented on NUTCH-2439: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3479 (See

[jira] [Commented] (NUTCH-2035) Regex filter using case sensitive rules.

2017-12-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292802#comment-16292802 ] Hudson commented on NUTCH-2035: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3478 (See

[jira] [Commented] (NUTCH-2480) Upgrade crawler-commons dependency to 0.9

2017-12-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293115#comment-16293115 ] Hudson commented on NUTCH-2480: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3480 (See

[jira] [Commented] (NUTCH-2354) Upgrade Hadoop dependencies to 2.7.4

2017-12-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293116#comment-16293116 ] Hudson commented on NUTCH-2354: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3480 (See

[jira] [Commented] (NUTCH-2438) Upgrade Nutch 2.X to Gora 0.8

2017-12-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289976#comment-16289976 ] Hudson commented on NUTCH-2438: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1598 (See

[jira] [Commented] (NUTCH-2458) TikaParser doesn't work with tika-config.xml set

2017-11-10 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247521#comment-16247521 ] Hudson commented on NUTCH-2458: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3468 (See

[jira] [Commented] (NUTCH-2452) Problem retrieving encoded URLs via FTP?

2017-11-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239741#comment-16239741 ] Hudson commented on NUTCH-2452: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3465 (See

[jira] [Commented] (NUTCH-2443) Extract links from the video tag with the parse-html plugin

2017-11-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239740#comment-16239740 ] Hudson commented on NUTCH-2443: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3465 (See

[jira] [Commented] (NUTCH-2420) Bug in variable generate.max.count and fetcher.server.delay

2017-11-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240651#comment-16240651 ] Hudson commented on NUTCH-2420: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3466 (See

[jira] [Commented] (NUTCH-2442) Injector to stop if job fails to avoid loss of CrawlDb

2017-11-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240970#comment-16240970 ] Hudson commented on NUTCH-2442: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3467 (See

[jira] [Commented] (NUTCH-2451) protocol-ftp to resolve relative URL when following redirects

2017-12-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278447#comment-16278447 ] Hudson commented on NUTCH-2451: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3473 (See

[jira] [Commented] (NUTCH-2470) CrawlDbReader -stats to show quantiles of score

2017-12-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278496#comment-16278496 ] Hudson commented on NUTCH-2470: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3474 (See

[jira] [Commented] (NUTCH-2469) Documents not committed to solr in Server mode

2017-12-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278445#comment-16278445 ] Hudson commented on NUTCH-2469: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1597 (See

[jira] [Commented] (NUTCH-2451) protocol-ftp to resolve relative URL when following redirects

2017-12-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278444#comment-16278444 ] Hudson commented on NUTCH-2451: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1597 (See

[jira] [Commented] (NUTCH-2399) indexer-elastic does not index multi-value fields (only the first value is indexed)

2017-12-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280135#comment-16280135 ] Hudson commented on NUTCH-2399: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3475 (See

[jira] [Commented] (NUTCH-2394) Possible bugs in the source code

2017-10-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219095#comment-16219095 ] Hudson commented on NUTCH-2394: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3464 (See

[jira] [Commented] (NUTCH-2448) Allow Sending an empty http.agent.version

2017-10-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217545#comment-16217545 ] Hudson commented on NUTCH-2448: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3462 (See

[jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses

2018-05-10 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471213#comment-16471213 ] Hudson commented on NUTCH-2575: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3524 (See

[jira] [Commented] (NUTCH-2513) ant eclipse target fails with "protocol switch unsafe"

2018-05-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467297#comment-16467297 ] Hudson commented on NUTCH-2513: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1608 (See

[jira] [Commented] (NUTCH-2513) ant eclipse target fails with "protocol switch unsafe"

2018-05-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467315#comment-16467315 ] Hudson commented on NUTCH-2513: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3523 (See

[jira] [Commented] (NUTCH-2500) Add pull-request template to github

2018-05-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488933#comment-16488933 ] Hudson commented on NUTCH-2500: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3526 (See

[jira] [Commented] (NUTCH-2500) Add pull-request template to github

2018-05-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488915#comment-16488915 ] Hudson commented on NUTCH-2500: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1609 (See

[jira] [Commented] (NUTCH-2577) protocol-selenium can't handle https

2018-05-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487631#comment-16487631 ] Hudson commented on NUTCH-2577: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3525 (See

[jira] [Commented] (NUTCH-2595) Upgrade crawler-commons dependency to 0.10

2018-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509785#comment-16509785 ] Hudson commented on NUTCH-2595: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3533 (See

[jira] [Commented] (NUTCH-2576) HTTP protocol plugin based on okhttp

2018-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509786#comment-16509786 ] Hudson commented on NUTCH-2576: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3533 (See

[jira] [Commented] (NUTCH-2581) Caching of redirected robots.txt may overwrite correct robots.txt rules

2018-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505905#comment-16505905 ] Hudson commented on NUTCH-2581: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1611 (See

[jira] [Commented] (NUTCH-2581) Caching of redirected robots.txt may overwrite correct robots.txt rules

2018-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505914#comment-16505914 ] Hudson commented on NUTCH-2581: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3532 (See

[jira] [Commented] (NUTCH-2530) Rename property db.max.anchor.length > linkdb.max.anchor.length

2018-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505915#comment-16505915 ] Hudson commented on NUTCH-2530: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3532 (See

[jira] [Commented] (NUTCH-2505) nutch does not delete the .locked file, when the generator partition got an exception

2018-06-07 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505176#comment-16505176 ] Hudson commented on NUTCH-2505: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3531 (See

[jira] [Commented] (NUTCH-2012) Merge parsechecker and indexchecker

2018-06-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510742#comment-16510742 ] Hudson commented on NUTCH-2012: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3535 (See

[jira] [Commented] (NUTCH-2579) Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)

2018-06-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510805#comment-16510805 ] Hudson commented on NUTCH-2579: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3536 (See

[jira] [Commented] (NUTCH-2578) Avoid lock by MimeUtil in constructor of protocol.Content

2018-06-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510806#comment-16510806 ] Hudson commented on NUTCH-2578: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3536 (See

[jira] [Commented] (NUTCH-2574) Generator: hostCount >= maxCount comparison wrong

2018-06-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510807#comment-16510807 ] Hudson commented on NUTCH-2574: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3536 (See

[jira] [Commented] (NUTCH-2040) Upgrade to recent version of Crawler-Commons

2018-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509848#comment-16509848 ] Hudson commented on NUTCH-2040: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1612 (See

[jira] [Commented] (NUTCH-2549) protocol-http does not behave the same as browsers

2018-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509864#comment-16509864 ] Hudson commented on NUTCH-2549: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3534 (See

[jira] [Commented] (NUTCH-2559) protocol-http cannot handle colons after the HTTP status code

2018-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509868#comment-16509868 ] Hudson commented on NUTCH-2559: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3534 (See

[jira] [Commented] (NUTCH-2563) HTTP header spellchecking issues

2018-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509870#comment-16509870 ] Hudson commented on NUTCH-2563: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3534 (See

[jira] [Commented] (NUTCH-2558) protocol-http cannot handle a missing HTTP status line

2018-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509869#comment-16509869 ] Hudson commented on NUTCH-2558: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3534 (See

[jira] [Commented] (NUTCH-2564) protocol-http throws an error when the content-length header is not a number

2018-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509867#comment-16509867 ] Hudson commented on NUTCH-2564: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3534 (See

[jira] [Commented] (NUTCH-2557) protocol-http fails to follow redirections when an HTTP response body is invalid

2018-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509871#comment-16509871 ] Hudson commented on NUTCH-2557: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3534 (See

[jira] [Commented] (NUTCH-2560) protocol-http throws an error when an http header spans over multiple lines

2018-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509872#comment-16509872 ] Hudson commented on NUTCH-2560: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3534 (See
