This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from da64358  Merge pull request #227 from kpm1985/NUTCH-2436
     add 7db1173  NUTCH-2433 / Html Parser: keep htmltag where the outlinks are found
     add ca59744  New configuration parameter: 'parser.html.outlinks.htmlnode_metadata_name' set empty value as default.
     add bfd47db  Small adjustment: Keep a reference to the last outlink to set metadata.
     add 3067753  Apply new parameter "parser.html.outlinks.ignore_tags" to the tika parser, as well. Some extra [eclipse-codeformat.xml] formatting changes applied as well.
     new 777e759  Merge pull request #224 from maborec/NUTCH-2433

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 conf/nutch-default.xml                             |  7 ++++
 .../apache/nutch/parse/html/DOMContentUtils.java   | 24 ++++++++++++--
 .../apache/nutch/parse/tika/DOMContentUtils.java   | 37 +++++++++++++++++-----
 3 files changed, 58 insertions(+), 10 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
['"[email protected]" <[email protected]>'].
