First, _you_ define a "paragraph". It's one of those tricky concepts that's totally obvious to a human but surprisingly hard to implement in code. What's a paragraph in Chinese? Hebrew? Even in English it's tricky. How does a PDF signal a paragraph? Is that consistent with Word? OpenOffice? How about an HTML page? The <p> tag isn't used consistently.
So no, Lucene has no knowledge of paragraphs, and there's nothing built in that even tries to detect such an abstract concept. As Ahmet suggests, there are tools out there that will attempt to detect where the paragraphs are in your documents.

From there, I'd suggest that you index the paragraphs with a large position offset for the first word of each one; then you can search for phrases with a "slop" smaller than that gap, so a phrase match can never span two paragraphs.

Best,
Erick

On Mon, Sep 12, 2016 at 7:25 AM, szzoli <reg9sz...@freemail.hu> wrote:
> Hi,
>
> thanks for the hint.
>
> My question exactly is:
>
> Can I use a paragraph of a document as a term to search for in the index?
> Does Lucene create an index only on the word level, or can it be set to index
> on the phrase or paragraph level? Is searching for several words a question
> of indexing or of searching?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-it-possible-to-search-for-a-paragraph-in-Lucene-tp4295705p4295779.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
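
P.S. In case it helps, here is a rough sketch of the position-gap idea in plain Java. It deliberately does not use the Lucene API: the class name, the gap of 1000, and the simplified "near" check are all assumptions made up for illustration, and the check is only a stand-in for real phrase slop. In Lucene itself I believe the usual route is to add each paragraph as a separate value of the same field and have the analyzer return a large value from getPositionIncrementGap, but treat that as a pointer to check against the docs rather than a recipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParagraphGapDemo {
    // Hypothetical gap; it must be larger than any slop used at query time.
    static final int PARAGRAPH_GAP = 1000;

    /** Maps each lowercased token to the list of positions where it occurs. */
    static Map<String, List<Integer>> index(List<String> paragraphs) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = 0;
        for (String paragraph : paragraphs) {
            for (String token : paragraph.toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split() can yield a leading empty string
                }
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(position++);
            }
            // Jump ahead so the next paragraph starts far from this one.
            position += PARAGRAPH_GAP;
        }
        return postings;
    }

    /** True if some occurrence of a and some occurrence of b are at most maxDistance apart. */
    static boolean near(Map<String, List<Integer>> postings, String a, String b, int maxDistance) {
        List<Integer> none = Collections.emptyList();
        for (int pa : postings.getOrDefault(a, none)) {
            for (int pb : postings.getOrDefault(b, none)) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = index(Arrays.asList(
                "Lucene indexes terms and positions.",
                "Paragraph detection is hard."));
        // Same paragraph: positions 0 and 4.
        System.out.println(near(postings, "lucene", "positions", 10));    // prints true
        // Different paragraphs: positions 4 and 1005, so the gap blocks the match.
        System.out.println(near(postings, "positions", "paragraph", 10)); // prints false
    }
}
```

The point is only that as long as the allowed distance stays below the gap, two words from different paragraphs can never be close enough to match.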