Anton,
I think there are at least a couple of ways of doing this. I assume you
have a program that does sentence detection already, as Lucene does not
provide this. If not, I am sure a search of the web will find one that
has high accuracy.
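If you just want something to experiment with, here is a minimal sketch that uses the JDK's own java.text.BreakIterator. It is not part of Lucene, the class name SentenceSplitter is made up for this example, and a rule-based splitter like this may mis-split on abbreviations, so treat it as a starting point rather than a high-accuracy detector.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not a Lucene class): splits text into sentences
// with the JDK's rule-based sentence BreakIterator.
class SentenceSplitter {
    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

Calling SentenceSplitter.split(fileContents) gives you one String per sentence, which you can then feed to whichever of the approaches below you pick.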
You can:
1. Index each sentence as a separate Document. You will need a field on
the Document relating it back to the overall file so you can reconstruct it.
2. As you index, insert sentence/paragraph boundary markers into your
index and then use the SpanQuery functionality. Search this mail
archive for sentence boundary detection and Span Query (try the dev list
too). I think there was a discussion between me, Doug and Hoss on how
to do this.
3. Search as you do now, then post-process each hit to figure out which
sentence it came from. This will be inefficient, but I don't know what
your performance requirements are, so it may work for you.
There are probably other ways too.
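To make the post-processing idea (option 3) a bit more concrete, here is a rough sketch using only the JDK. It assumes you have the original text of the matching file and the character offset of the hit. How you get that offset depends on your setup and is not shown. SentenceLookup and sentenceAt are names I made up, not Lucene API.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper (not a Lucene class): given the full text and the
// character offset of a hit, returns the sentence that contains the hit.
class SentenceLookup {
    static String sentenceAt(String text, int offset) {
        if (offset < 0 || offset >= text.length()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int end = it.following(offset); // first sentence boundary after the hit
        int start = it.previous();      // the boundary just before that one
        return text.substring(start, end).trim();
    }
}
```

For example, sentenceAt(text, text.indexOf("nearby")) returns the whole sentence around the first occurrence of "nearby". Re-reading the text for every hit is where the inefficiency comes from.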
anton feldmann wrote:
I want to search for a word or a word pair within a sentence or a
paragraph, and have the whole sentence returned as the result. My
question is how to extend Lucene to make this possible. Where do I
start? I have no idea how I would have to change the index file, or
whether such a change would be acceptable to the Lucene team.
cheers
anton feldmann
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886