Hi John,
The source code is available from CVS, so you can make the class non-final and do what
you need to do. Of course, you may have a hard time finding help later if your solution
doesn't work and you aren't running the same code as everyone else... :-)
If I understand correctly what you are trying to do, you already know all of the
answers for indexing and just want Lucene to handle the retrieval side of the coin,
correct? I suppose a crazy idea might be to write a program that takes your info and
outputs it in the Lucene file format, but that seems like overkill.
-Grant
>>> [EMAIL PROTECTED] 07/07/04 07:37PM >>>
Hi Doug:
Thanks for the response!
The solution you proposed is still a variation on creating a
dummy document stream. Taking the same example, java (5), lucene (6),
VectorTokenStream would create a total of 11 Tokens, whereas only 2 are
necessary.
Given many documents with many terms and high frequencies, it would
create many extra Token instances.
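To put numbers on that, here is a rough standalone sketch in plain Java. It does not use any Lucene classes; `TokenCountSketch` and `expand` are hypothetical names made up for illustration, and the method only mimics the one-token-per-occurrence behaviour of the proposed stream.

```java
import java.util.ArrayList;
import java.util.List;

class TokenCountSketch {
    // Hypothetical helper (not a Lucene API): emits one entry per occurrence,
    // the same way a stream that repeats each term freq times would.
    static List<String> expand(String[] terms, int[] freqs) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int n = 0; n < freqs[i]; n++) {
                tokens.add(terms[i]);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] terms = {"java", "lucene"};
        int[] freqs = {5, 6};
        List<String> tokens = expand(terms, freqs);
        // prints "11 tokens for 2 distinct terms"
        System.out.println(tokens.size() + " tokens for " + terms.length + " distinct terms");
    }
}
```

The number of objects grows with the sum of the frequencies rather than with the number of distinct terms, which is the overhead I would like to avoid.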
The reason I was looking at deriving from the Field class is that I
could then directly manipulate the FieldInfo by setting the frequency. But
the class is final...
Any other suggestions?
Thanks
-John
On Wed, 07 Jul 2004 14:20:24 -0700, Doug Cutting <[EMAIL PROTECTED]> wrote:
> John Wang wrote:
> > While lucene tokenizes the words in the document, it counts the
> > frequency and figures out the position, we are trying to bypass this
> > stage: For each document, I have a set of words with a known frequency,
> > e.g. java (5), lucene (6) etc. (I don't care about the position, so it
> > can always be 0.)
> >
> > What I can do now is to create a dummy document, e.g. "java java
> > java java java lucene lucene lucene lucene lucene lucene" and pass it to
> > lucene.
> >
> > This seems hacky and cumbersome. Is there a better alternative? I
> > browsed around in the source code, but couldn't find anything.
>
> Write an analyzer that returns terms with the appropriate distribution.
>
> For example:
>
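> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenStream;
>
> public class VectorTokenStream extends TokenStream {
>   private final String[] terms;
>   private final int[] freqs;
>   private int term = -1;  // index of the current term; -1 until the first call
>   private int freq = 0;   // tokens still to emit for the current term
>
>   public VectorTokenStream(String[] terms, int[] freqs) {
>     this.terms = terms;
>     this.freqs = freqs;
>   }
>
>   public Token next() {
>     // Advance to the next term, skipping any with a zero frequency.
>     while (freq == 0) {
>       term++;
>       if (term >= terms.length)
>         return null;
>       freq = freqs[term];
>     }
>     freq--;
>     return new Token(terms[term], 0, 0);  // offsets are unused, so always 0
>   }
> }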
>
> Document doc = new Document();
> doc.add(Field.Text("content", ""));
> indexWriter.addDocument(doc, new Analyzer() {
>   public TokenStream tokenStream(String field, Reader reader) {
>     return new VectorTokenStream(new String[] {"java","lucene"},
>                                  new int[] {5,6});
>   }
> });
>
> > Too bad the Field class is final, otherwise I could derive from it
> > and do something along those lines...
>
> Extending Field would not help. That's why it's final.
>
> Doug
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>