Sure ... The frequency count is maintained in the index to enable relevance scoring. You can pull it out using a TermDocs, which enumerates the documents containing a given term along with the term's frequency in each one. Sorry, I don't have example code handy for this.
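
To make the idea concrete without tying it to a particular Lucene version, here is a rough plain-Java model of what the index keeps: for each term, the documents that contain it and how many times it occurs in each. This is not Lucene code. PostingsSketch, addDocument and freq are names made up for illustration, and the whitespace tokenization is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of per-document term frequencies (not Lucene code).
public class PostingsSketch {
    // term -> (docId -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new LinkedHashMap<>();

    // Record every whitespace-separated, lowercased token of a document.
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty token for leading whitespace
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Frequency of a term in one document, or 0 if it does not occur there.
    public int freq(String term, int docId) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.getOrDefault(docId, 0);
    }

    public static void main(String[] args) {
        PostingsSketch index = new PostingsSketch();
        index.addDocument(0, "the cat saw the dog");
        index.addDocument(1, "a dog barked");
        System.out.println(index.freq("the", 0)); // prints 2
        System.out.println(index.freq("dog", 1)); // prints 1
    }
}
```

In the real index, enumerating a term with TermDocs plays the role of walking the inner map here: one (document, frequency) pair at a time.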

-Mike


On 1/1/2013 4:24 PM, Itai Peleg wrote:
That worked great :) thanks a lot for the quick reply!

I have another question - after I "flagged" all my special tokens (in my
case, the ones that are entities) is there an elegant way of counting how
many of them I have in a document? I found an ugly way to do that, but I'm
sure there's a better one.

Thanks in advance,
Itai


2012/12/31 Michael Sokolov <soko...@ifactory.com>

On 12/31/2012 11:39 AM, Itai Peleg wrote:

Hi all,

Can someone please post a simple example showing how to add additional
attributes to a token in a TokenStream (inside incrementToken, for example)?

I'm working on entity extraction and want to flag specific tokens as
entities, but I'm having problems.

Thanks in advance,
Itai

Here's a simple example of a filter that adds an attribute saying
whether a token is "the":

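import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

final class YourTokenStream extends TokenFilter {
   private final YourAttribute att;
   private final CharTermAttribute term;

   public YourTokenStream (TokenStream upstream) {
      // TokenFilter keeps the wrapped stream in its protected field "input"
      super (upstream);
      att = addAttribute (YourAttribute.class);
      term = addAttribute (CharTermAttribute.class);
   }

   @Override
   public boolean incrementToken () throws IOException {
     if (input.incrementToken()) {
       // term.toString() covers only the current token's characters;
       // set the flag on every token so a stale "true" is never carried over
       att.setIsAnEnglishArticle ("the".equals (term.toString()));
       return true;
     }
     return false;
   }

}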
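
One detail worth checking in any filter that compares term text: the char[] returned by CharTermAttribute.buffer() is a reusable array that can be longer than the current token, and only the first length() characters are valid. The plain-Java sketch below shows why building a String from the whole array fails. BufferPitfall, termEquals and the 16-character buffer size are made up for illustration and are not part of Lucene.

```java
// Illustration of comparing text held in an oversized, reusable char buffer.
public class BufferPitfall {
    // Compare only the first `length` chars of `buffer` with `expected`.
    static boolean termEquals(char[] buffer, int length, String expected) {
        return expected.equals(new String(buffer, 0, length));
    }

    public static void main(String[] args) {
        char[] buffer = new char[16];     // larger than the token, like a reused term buffer
        "the".getChars(0, 3, buffer, 0);  // the token occupies only the first 3 chars
        int length = 3;

        // The whole array includes 13 trailing '\0' chars, so this is not "the".
        System.out.println("the".equals(new String(buffer)));  // prints false
        // Restricting the comparison to the valid prefix gives the expected answer.
        System.out.println(termEquals(buffer, length, "the")); // prints true
    }
}
```

With a real CharTermAttribute, term.toString() or new String(term.buffer(), 0, term.length()) restricts the comparison to the valid prefix in the same way.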




