This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from 961c725  NUTCH-2034 CrawlDB update job to count documents in CrawlDb 
rejected by URL filters (patch contributed by Luis Lopez)
     add 62f6d9f  Add a new IndexingFilter that uses JEXL to decide whether to 
index a document.
     add 36bfac1  Some improvements based on revewier's feedback.
     add d72591a  Better tests.
     add c7c795a  Merge branch 'master' of https://github.com/apache/nutch into 
index-jexl-filter
     add a985e30  Fixed per reviewers' comments. Changed the package name to be 
more specific, added package-info.java, added to more build targets.
     add bea8621  doclint does not like self-closing tags.
     new 34236ff  fix for NUTCH-2370 contributed by [email protected]
     new d758a31  NUTCH-2474 CrawlDbReader -stats fails with ClassCastException 
- replace CrawlDbStatCombiner by CrawlDbStatReducer and ensure that data is 
properly processed independently whether and how often combiner is called - 
simplify calculation of minimum and maximum
     new 26669eb  - filter out NaN scores which break the quantile calculation
     new 194fc37  Extend indexer-elastic-rest to support languages
     new 153525c  fix formatting
     new 5ccebc9  add languages to default config
     new 9fcc2a4  fix delete
     new 42bdc65  NUTCH-2439 Upgrade Apache Tika dependency to 1.17
     new 2be2052  Add tika-config.xml to suppress Tika warnings on stderr
     new e0326de  make fully configurable
     new e7b077e  NUTCH-2480 Upgrade crawler-commons dependency to 0.9
     new 52a1c50  fix indentation
     new 67dc52c  scope variables
     new 416c457  NUTCH-2354 Upgrade Hadoop dependencies to 2.7.4
     new e7d5c13  NUTCH-2362 Upgrade MaxMind GeoIP version in index-geoip
     new e0e06f5  NUTCH-2035 urlfilter-regex case insensitive rules
     new 35193c2  NUTCH-2478 HTML parser should resolve base URL <base 
href=...> - fix parse-html and parse-tika - add unit test for parse-html
     new 8f692d1  NUTCH-2478 HTML parser should resolve base URL <base 
href=...> - finally fix parse-tika: - href attribute of base element dropped 
in DOM - need to call tikamd.get("Content-Location") - port HTML parser test 
from parse-html to parse-tika - add method to DomUtil which prints 
DocumentFragment
     new 4da6b19  fix for NUTCH-2477 (refactor checker classes) contributed by 
Jurian Broertjes
     new 9fb5777  Improve command-line help for URL filter and normalizer 
checker
     new 22fc7f0  NUTCH-2322 URL not available for Jexl operations - apply 
patch contributed by Markus Jelsma
     new e0a27c7  NUTCH-2034 CrawlDB update job to count documents in CrawlDb 
rejected by URL filters (patch contributed by Luis Lopez)
     new fc89e4f  NUTCH-2415 Create a JEXL based IndexingFilter Merge branch 
'pipldev-index-jexl-filter', closes #219

The 23 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 build.xml                                          |   4 +
 conf/nutch-default.xml                             |  18 +++
 default.properties                                 |   1 +
 src/plugin/build.xml                               |   2 +
 .../{headings => index-jexl-filter}/build.xml      |   6 +-
 .../ivy.xml                                        |   0
 .../plugin.xml                                     |  14 +--
 .../nutch/indexer/jexl/JexlIndexingFilter.java     | 131 +++++++++++++++++++++
 .../apache/nutch/indexer/jexl}/package-info.java   |  16 ++-
 .../nutch/indexer/jexl/TestJexlIndexingFilter.java | 124 +++++++++++++++++++
 10 files changed, 301 insertions(+), 15 deletions(-)
 copy src/plugin/{headings => index-jexl-filter}/build.xml (88%)
 copy src/plugin/{urlnormalizer-slash => index-jexl-filter}/ivy.xml (100%)
 copy src/plugin/{mimetype-filter => index-jexl-filter}/plugin.xml (74%)
 create mode 100644 src/plugin/index-jexl-filter/src/java/org/apache/nutch/indexer/jexl/JexlIndexingFilter.java
 copy src/plugin/{scoring-similarity/src/java/org/apache/nutch/scoring/similarity/util => index-jexl-filter/src/java/org/apache/nutch/indexer/jexl}/package-info.java (51%)
 create mode 100644 src/plugin/index-jexl-filter/src/test/org/apache/nutch/indexer/jexl/TestJexlIndexingFilter.java

-- 
To stop receiving notification emails like this one, please contact
['"[email protected]" <[email protected]>'].