[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851719#action_12851719
]
Hudson commented on NUTCH-779:
--
Integrated in Nutch-trunk #1112 (See
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850915#action_12850915
]
Julien Nioche commented on NUTCH-779:
-
Could anyone please review this issue? I would
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850939#action_12850939
]
Andrzej Bialecki commented on NUTCH-779:
-
CrawlDbReducer, the cramped line {{if
I'd like to use Julien's approach because I found the scoring filter hard
to understand.
My use case is the following:
1. During scoring after parsing, I want to tag the pages that interest me,
say meta=HIT.
2. In the next step (to be created), I would like to prune NON-HIT content
from the segment.
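The two steps above can be sketched in plain Java as follows. This is only an illustration of the tag-then-prune idea, not Nutch code: `PageSketch`, `SegmentPruneSketch`, `tagHits`, and `pruneNonHits` are hypothetical names, the segment is modelled as an in-memory list, and the keyword match is an assumed stand-in for whatever criterion marks a page as interesting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parsed page with its metadata; not a Nutch class.
class PageSketch {
    final String url;
    final String text;
    final Map<String, String> meta = new HashMap<>();

    PageSketch(String url, String text) {
        this.url = url;
        this.text = text;
    }
}

class SegmentPruneSketch {
    // Step 1: tag interesting pages with meta=HIT.
    // The keyword test is an assumed criterion, used only for illustration.
    static void tagHits(List<PageSketch> segment, String keyword) {
        for (PageSketch page : segment) {
            if (page.text.contains(keyword)) {
                page.meta.put("meta", "HIT");
            }
        }
    }

    // Step 2: return a new list holding only the pages tagged HIT,
    // leaving the original list untouched.
    static List<PageSketch> pruneNonHits(List<PageSketch> segment) {
        List<PageSketch> kept = new ArrayList<>();
        for (PageSketch page : segment) {
            if ("HIT".equals(page.meta.get("meta"))) {
                kept.add(page);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<PageSketch> segment = new ArrayList<>();
        segment.add(new PageSketch("http://example.com/a", "apache nutch crawler"));
        segment.add(new PageSketch("http://example.com/b", "unrelated page"));

        tagHits(segment, "nutch");
        for (PageSketch page : pruneNonHits(segment)) {
            System.out.println(page.url);
        }
    }
}
```

In a real crawl the tag would have to travel with the page metadata between the parsing step and the pruning step, which is the part this sketch leaves out.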
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12802172#action_12802172
]
Julien Nioche commented on NUTCH-779:
-
The property needs some documentation in
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12802175#action_12802175
]
Andrzej Bialecki commented on NUTCH-779:
-
Personally I would use ScoringFilters
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801875#action_12801875
]
Andrzej Bialecki commented on NUTCH-779:
-
You can already achieve this with