[
https://issues.apache.org/jira/browse/LUCENE-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martin Schoenmakers updated LUCENE-5697:
----------------------------------------
Description:
In DocFetcher, which uses Lucene v3.5.0, we stumbled on a bug. The lead of
DocFetcher has investigated and found that the problem seems to be in Lucene. I
do not know if this bug has been fixed in a later Lucene version.
Issue:
We use "proximity search": search on multiple words in a directory with about
300 PDF files.
E.g. search for "wordA wordB wordC"~50, i.e. three words within 50 words'
distance of each other. The resulting documents are correct, but the
highlighted text in the document is often missing.
If the words are in the SAME order as in the search AND on the SAME page, then
the highlighting works correctly. But if the order of the words differs from
the search (like "wordA wordC wordB") OR the words are not on the same page,
then that text is not highlighted.
As we often use proximity search on multiple words, this severely degrades
usability.
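To make the expected behavior concrete, here is a minimal sketch of the match
we would expect to be highlighted: all query words occurring within a given
distance of each other, in any order. It is plain Java and does not use or
reproduce Lucene's slop algorithm or its highlighter; the class name
ProximitySketch and the method allWithin are hypothetical names chosen for
illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: NOT Lucene code and NOT Lucene's slop computation.
class ProximitySketch {

    /**
     * Returns true if every term in `terms` occurs in `tokens` such that the
     * first and last of those occurrences are at most `maxDistance` token
     * positions apart. The order of the terms does not matter.
     */
    static boolean allWithin(String[] tokens, Set<String> terms, int maxDistance) {
        // Most recent position at which each query term was seen.
        Map<String, Integer> lastSeen = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!terms.contains(tokens[i])) {
                continue;
            }
            lastSeen.put(tokens[i], i);
            if (lastSeen.size() == terms.size()) {
                // Tightest window ending at position i that holds all terms.
                int earliest = Collections.min(lastSeen.values());
                if (i - earliest <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("wordA", "wordB", "wordC"));
        // Same words as the query, but in a different order.
        String[] reordered = "wordA filler wordC filler wordB".split(" ");
        System.out.println(allWithin(reordered, terms, 50));
    }
}
```

In this sketch the reordered text counts as a match, which is the case where
we see the search succeed but the highlighting go missing.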
was:
In DocFetcher, which uses Lucene, we stumbled on a bug. The lead of DocFetcher
has investigated and found that the problem seems to be in Lucene.
Issue: we use "proximity search": search on multiple words in a directory with
about 300 PDF files.
E.g. search for "wordA wordB wordC"~50, so three words within 50 words'
distance of each other. The resulting documents are correct, but the
highlighted text in the document is often missing.
If the words are in the SAME order as in the search AND on the SAME page, then
the highlighting works correctly. But if the order of the words differs from
the search (like "wordA wordC wordB") OR the words are not on the same page,
then that text is not highlighted.
As we often use proximity search on multiple words, this severely degrades
usability.
> Preview issue
> -------------
>
> Key: LUCENE-5697
> URL: https://issues.apache.org/jira/browse/LUCENE-5697
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/highlighter
> Environment: DocFetcher 1.1.11 on Win 7(64) pro
> Reporter: Martin Schoenmakers