Hello,

To be more precise, the basic entity I am using is a document, and each document contains paragraphs, possibly up to a few thousand of them. I need proximity search within a paragraph, and I also want the paragraph number returned as part of the search result. Maybe defining each paragraph as a separate field is the best way. What do you think?

Thanks in advance,
Reuven Ivgi

-----Original Message-----
From: Chuck Williams [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 03, 2006 10:58 AM
To: java-dev@lucene.apache.org
Subject: Re: Define end-of-paragraph

Reuven Ivgi wrote on 10/02/2006 09:32 PM:
> I want to divide a document to paragraphs, still having proximity search
> within each paragraph
>
> How can I do that?
>
Is your issue that you want the paragraphs to be in a single document, but you want to limit proximity search to find matches only within a single paragraph? If so, you could parse your document into paragraphs and, when generating tokens for it, place large gaps at the paragraph boundaries. Each Token in Lucene has a startOffset and endOffset that you can set as you generate Tokens inside TokenStream.next() for the TokenStream returned by your Analyzer. Those classes and methods are all in org.apache.lucene.analysis.

Or alternatively, you could make each paragraph a separate field value and use Analyzer.getPositionIncrementGap() to achieve essentially the same thing (except that your Documents could get unwieldy if you have many paragraphs).

If this is not what you are trying to do, then please explain your objectives precisely.

Good luck,

Chuck

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
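
For illustration, here is a minimal sketch of the "large gap at paragraph boundaries" idea from the message above. It is plain Java and does not use Lucene; the class ParagraphGapIndex, the method findNear, and the gap value of 1000 are all hypothetical names and numbers chosen for this example. One point worth keeping in mind: in Lucene, phrase and proximity matching is driven by token positions (see Token.setPositionIncrement() and Analyzer.getPositionIncrementGap()), while startOffset and endOffset are character offsets, so the sketch models the gap as a jump in token position. It also stores the paragraph number next to each position, which is one possible way to report the paragraph in the result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Toy model of position gaps between paragraphs.
 * NOT Lucene code: every name here is hypothetical.
 */
class ParagraphGapIndex {
    /** Assumed gap; it must be larger than any distance used in a query. */
    static final int PARAGRAPH_GAP = 1000;

    /** term -> postings, each posting is {tokenPosition, paragraphNumber}. */
    private final Map<String, List<int[]>> postings = new HashMap<>();

    ParagraphGapIndex(List<String> paragraphs) {
        int position = 0;
        for (int p = 0; p < paragraphs.size(); p++) {
            for (String term : paragraphs.get(p).toLowerCase().split("\\W+")) {
                if (term.isEmpty()) {
                    continue; // split() can yield a leading empty string
                }
                postings.computeIfAbsent(term, k -> new ArrayList<>())
                        .add(new int[] {position, p});
                position++;
            }
            // Jump ahead so that no proximity match can span two paragraphs.
            position += PARAGRAPH_GAP;
        }
    }

    /**
     * Returns the 0-based numbers of paragraphs in which terms a and b
     * occur at most maxDistance positions apart (maxDistance < PARAGRAPH_GAP).
     */
    List<Integer> findNear(String a, String b, int maxDistance) {
        List<int[]> as = postings.getOrDefault(a.toLowerCase(), Collections.<int[]>emptyList());
        List<int[]> bs = postings.getOrDefault(b.toLowerCase(), Collections.<int[]>emptyList());
        TreeSet<Integer> hits = new TreeSet<>();
        for (int[] pa : as) {
            for (int[] pb : bs) {
                if (Math.abs(pa[0] - pb[0]) <= maxDistance) {
                    hits.add(pa[1]); // both postings lie in the same paragraph
                }
            }
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        List<String> paragraphs = new ArrayList<>();
        paragraphs.add("The quick brown fox");
        paragraphs.add("jumps over the lazy dog");
        ParagraphGapIndex index = new ParagraphGapIndex(paragraphs);
        System.out.println(index.findNear("quick", "fox", 3));
        System.out.println(index.findNear("fox", "jumps", 5));
    }
}
```

In this toy model "fox" and "jumps" are adjacent in the text but sit in different paragraphs, so the gap keeps them from matching. With a few thousand paragraphs per document, the gap only needs to exceed the largest distance (slop) you expect to query with.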