On Feb 17, 2004, at 9:58 AM, [EMAIL PROTECTED] wrote:
On Tuesday 17 February 2004 15:18, Erik Hatcher wrote:
You would do them separately.  I'm not clear on what you are trying to
do.  The Analyzer does all this during indexing automatically for you,
but it sounds like you are just trying to emulate what an Analyzer
already does to extract words from text?

I am still doing this:


TokenStream in = analyzer.tokenStream("contents", new
StringReader(reader.document(i).getField("contents").stringValue()));

And I want to extract all words from all Fields.


The "words" (or "terms") are already in the index ready to be read very rapidly and accurately. IndexReader is what you want to investigate if your fields are indexed.

Look into IndexReader and pull the terms directly rather than re-"analyzing" the text. Provided "contents" was an indexed field, you could do something like this (taken from a mini-project I'm tinkering with right now):

  public String[] wordsThatStartWith(char c) throws IOException {
    String letter = new String("" + c).toLowerCase();

    ArrayList words = new ArrayList();
    if (reader == null) {
      reader = IndexReader.open(indexPath);
    }

    TermEnum terms = reader.terms(new Term("word", letter));
    while ("word".equals(terms.term().field())) {

      String word = terms.term().text();
      if (word.startsWith(letter)) {
        words.add(word);
      } else {
        break;
      }

      if (!terms.next()) {
        break;
      }
    }

Collections.sort(words);

String[] sortedWords = (String[]) words.toArray(new String[0]);

    return sortedWords;
  }

You'll need to do some adapting of this code to your environment and field(s), as what is here is designed to pull all the "word"'s that start with a specified letter.

Erik


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to