[ https://issues.apache.org/jira/browse/NUTCH-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223104#comment-15223104 ]
ASF GitHub Bot commented on NUTCH-2245:
---------------------------------------
Github user sujen1412 commented on a diff in the pull request:
https://github.com/apache/nutch/pull/101#discussion_r58303167
--- Diff: src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/cosine/Model.java ---
    @@ -115,6 +126,7 @@ public static DocVector createDocVector(String content) {
           tStream.reset();
           while(tStream.incrementToken()) {
             String term = charTermAttribute.toString();
    +        LOG.info(term);
--- End diff --
This looks like it is used for debugging; please change it to LOG.debug().
Logging every term at INFO level floods the log, and keeping it at debug level
helps keep the log clean.
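
To illustrate the point, here is a minimal, self-contained sketch. It is not the plugin's code: the class TermLoggingSketch and its methods are hypothetical, whitespace splitting stands in for the token stream, and java.util.logging is used only so the example runs without dependencies. Its FINE level stands in for the debug level that LOG.debug() would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch, not part of Nutch.
public class TermLoggingSketch {
  private static final Logger LOG =
      Logger.getLogger(TermLoggingSketch.class.getName());

  /** Stand-in for the token loop: logs each term at FINE (debug-like) level. */
  public static int logTerms(String content) {
    int count = 0;
    for (String term : content.trim().split("\\s+")) {
      if (term.isEmpty()) {
        continue; // empty input yields one empty token; skip it
      }
      LOG.fine(term); // debug-level: dropped unless the logger allows FINE
      count++;
    }
    return count;
  }

  /** Counts the records that reach a handler when the logger runs at 'level'. */
  public static int emittedAt(Level level, String content) {
    final List<LogRecord> seen = new ArrayList<>();
    Handler capture = new Handler() {
      @Override public void publish(LogRecord record) {
        if (isLoggable(record)) {
          seen.add(record);
        }
      }
      @Override public void flush() {}
      @Override public void close() {}
    };
    capture.setLevel(Level.ALL);
    LOG.setUseParentHandlers(false); // keep the console quiet for the demo
    LOG.setLevel(level);
    LOG.addHandler(capture);
    try {
      logTerms(content);
    } finally {
      LOG.removeHandler(capture);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    // With the logger at INFO, the per-term messages never reach the log.
    System.out.println(emittedAt(Level.INFO, "apache nutch crawler"));
    // With the logger at FINE, one message per term is emitted.
    System.out.println(emittedAt(Level.FINE, "apache nutch crawler"));
  }
}
```

With the logger at its usual INFO threshold the per-term messages are dropped, and they can still be switched on by lowering the level when someone actually needs to inspect the tokens.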
Thanks!
> Developed the NGram Model on the existing Unigram Cosine Similarity Model
> -------------------------------------------------------------------------
>
> Key: NUTCH-2245
> URL: https://issues.apache.org/jira/browse/NUTCH-2245
> Project: Nutch
> Issue Type: New Feature
> Components: plugin, scoring
> Reporter: Bhavya Sanghavi
> Assignee: Sujen Shah
> Priority: Minor
> Labels: memex
>
> Built on the existing unigram cosine similarity model by adding the Ngram
> model, thus providing flexibility to the user to choose the window size for
> scoring the similarity between webpages and the gold standard.