This is an automated email from the ASF dual-hosted git repository.
snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.
from da64358 Merge pull request #227 from kpm1985/NUTCH-2436
add 7db1173 NUTCH-2433 / Html Parser: keep htmltag where the outlinks are found
add ca59744 New configuration parameter: 'parser.html.outlinks.htmlnode_metadata_name' set empty value as default.
add bfd47db Small adjustment: Keep a reference to the last outlink to set metadata.
add 3067753 Apply new parameter "parser.html.outlinks.ignore_tags" to the tika parser, as well. Some extra [eclipse-codeformat.xml] formatting changes applied as well.
new 777e759 Merge pull request #224 from maborec/NUTCH-2433
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
conf/nutch-default.xml                           |  7 ++++
.../apache/nutch/parse/html/DOMContentUtils.java | 24 ++++++++++++--
.../apache/nutch/parse/tika/DOMContentUtils.java | 37 +++++++++++++++++-----
3 files changed, 58 insertions(+), 10 deletions(-)
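The commits in this change introduce two outlink settings: 'parser.html.outlinks.htmlnode_metadata_name' (empty by default) and "parser.html.outlinks.ignore_tags" (now also applied to the Tika parser). The Java sketch below is a hypothetical illustration of how the two might interact during outlink extraction. It is not the code from this change. It assumes that ignore_tags is a comma-separated list of tag names and that a non-empty metadata name serves as the key under which the originating HTML tag is stored on each outlink. The class SketchOutlink, the methods parseIgnoreTags and collect, and the key "htmltag" are all invented for illustration and are not Nutch APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (NOT Nutch's DOMContentUtils) of how an "ignore tags"
 * list and an "HTML node metadata name" setting could shape outlink extraction.
 */
public class OutlinkTagSketch {

  /** Hypothetical stand-in for an outlink: target URL plus string metadata. */
  static final class SketchOutlink {
    final String url;
    final Map<String, String> metadata = new HashMap<>();

    SketchOutlink(String url) {
      this.url = url;
    }
  }

  /** Parses an assumed comma-separated tag list such as "img,script,link". */
  static Set<String> parseIgnoreTags(String value) {
    Set<String> tags = new HashSet<>();
    if (value == null) {
      return tags;
    }
    for (String t : value.split(",")) {
      String trimmed = t.trim().toLowerCase(Locale.ROOT);
      if (!trimmed.isEmpty()) {
        tags.add(trimmed);
      }
    }
    return tags;
  }

  /**
   * Builds outlinks from (tag, url) pairs, skipping ignored tags. When
   * metadataName is non-empty, the tag each outlink came from is recorded.
   */
  static List<SketchOutlink> collect(List<String[]> tagUrlPairs,
      Set<String> ignoreTags, String metadataName) {
    List<SketchOutlink> outlinks = new ArrayList<>();
    for (String[] pair : tagUrlPairs) {
      String tag = pair[0].toLowerCase(Locale.ROOT);
      if (ignoreTags.contains(tag)) {
        continue; // tag is on the ignore list: no outlink is created
      }
      // Keep a reference to the outlink just added so metadata can be set on it.
      SketchOutlink link = new SketchOutlink(pair[1]);
      outlinks.add(link);
      // An empty metadata name (assumed default) means nothing is stored.
      if (metadataName != null && !metadataName.isEmpty()) {
        link.metadata.put(metadataName, tag);
      }
    }
    return outlinks;
  }

  public static void main(String[] args) {
    List<String[]> found = Arrays.asList(
        new String[] {"a", "http://example.com/page"},
        new String[] {"img", "http://example.com/logo.png"},
        new String[] {"script", "http://example.com/app.js"});
    List<SketchOutlink> links =
        collect(found, parseIgnoreTags("img, script"), "htmltag");
    for (SketchOutlink l : links) {
      System.out.println(l.url + " " + l.metadata);
    }
  }
}
```

Running main prints only the `a` link, with its tag stored under the hypothetical `htmltag` key. Passing an empty metadata name instead leaves every outlink's metadata untouched, which mirrors the empty default described in ca59744.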