Re: Whitespace Analyzer not producing expected search results

lee . a . carroll Wed, 17 Nov 2004 07:36:50 -0800


Thanks for the suggestions, Erik. Displaying the query string is really
useful, and this is what I've found.


I issue a search using the search term

ResponseHelper.writeNoCachingHeaders\(response\);

The search is parsed using a query parser and produces the following query
string

+contents:ResponseHelper.writeNoCachingHeaders(response);

This looks good and finds two documents.

I then try a search using the term

ResponseHelper.writeNoCachingHeaders\(*\);

Now, I'm expecting this to be a broader search that should find at
least two docs, possibly more.

The query parser produces the query

+contents:responsehelper.writenocachingheaders(*);

Wow, the query has lost its case, and no docs are returned.

Why does the query parser do this? (My analyzer is the whitespace one
provided with Lucene.)

Any ideas on how to get around this?

Thanks Lee C
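
One possible explanation, offered as a sketch rather than a confirmed diagnosis: wildcard terms are not passed through the analyzer, and QueryParser in Lucene releases of this era lowercases them by default. Because the whitespace analyzer keeps each indexed token in its original case, the lowercased pattern can no longer match. If that is what is happening here, calling setLowercaseWildcardTerms(false) on a QueryParser instance before parsing should preserve the case (check the Javadoc for your Lucene version; later releases renamed the setter). The plain-Java program below does not use Lucene at all. Its class and method names are hypothetical, and it only imitates whitespace tokenisation and case-sensitive wildcard matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical demo class; it imitates the behaviour, it is not Lucene code.
final class WildcardCaseDemo {

    // Split on runs of whitespace only; case and punctuation are preserved.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Translate a wildcard term ('*' = any run of characters, '?' = exactly
    // one character) into a case-sensitive regular expression.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // True if at least one whitespace token of the text matches the wildcard.
    static boolean matchesAnyToken(String wildcard, String text) {
        Pattern pattern = wildcardToRegex(wildcard);
        for (String token : whitespaceTokens(text)) {
            if (pattern.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String indexed = "ResponseHelper.writeNoCachingHeaders(response);";
        String asTyped = "ResponseHelper.writeNoCachingHeaders(*);";
        String lowercased = asTyped.toLowerCase(Locale.ROOT);

        // Case kept: the pattern matches the indexed token (prints true).
        System.out.println(asTyped + " -> " + matchesAnyToken(asTyped, indexed));
        // Case lost: nothing matches (prints false).
        System.out.println(lowercased + " -> " + matchesAnyToken(lowercased, indexed));
    }
}
```

The second call mirrors the lowercased query shown above: the pattern is otherwise identical, but it finds nothing because the indexed token still contains upper-case letters.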


Try using a TermQuery instead of QueryParser to see if you get the
results you expect.  Exact case matters.

Also, when troubleshooting issues with QueryParser, it is helpful to
see what the actual Query returned is - try displaying its toString
output.

 Erik
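
To illustrate why exact case matters for a single-term lookup, here is a minimal sketch in plain Java. It is not Lucene's TermQuery, only a toy inverted index keyed by whitespace-separated tokens. The class name, method names and the two sample documents are made up for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index; it only shows that term lookup is an exact match.
final class ExactTermLookupDemo {

    // Map each whitespace-separated token to the ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Exact lookup: the term must equal an indexed token character for character.
    static Set<Integer> lookup(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        // Made-up sample documents.
        List<String> docs = Arrays.asList(
                "<% ResponseHelper.writeNoCachingHeaders(response); %>",
                "<% mytoken1 mytoken2 %>");
        Map<String, Set<Integer>> index = buildIndex(docs);

        // Same case as the indexed token: document 0 is found.
        System.out.println(lookup(index, "ResponseHelper.writeNoCachingHeaders(response);"));
        // Lowercased term: no document is found.
        System.out.println(lookup(index, "responsehelper.writenocachingheaders(response);"));
    }
}
```

With a real index, the equivalent check is to build the term by hand and compare its hits with what the parsed query returns.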

On Nov 16, 2004, at 6:25 AM, [EMAIL PROTECTED] wrote:

> Hi,
>
> We have indexed a set of web files (jsp, js, xslt, java properties
> and html) using the Lucene Whitespace Analyzer.
> The purpose is to allow developers to find where code / functions are
> used and defined across a large and disparate
> content management repository, hopefully to aid code re-use, easier
> refactoring and standards control.
>
> However, when a query parser search is made using a whitespace
> analyser with a string known to be in an indexed file, the search
> returns zero hits.
>
> For example, the string  <jsp\:include page
> =\"/path1/path2/path3/path4/file1.jsp\" /> is
> searched for using the query parser (escaping the meta-chars), and an
> indexed document which contains
> the following text should be found?
>
>  // include HTML head
> %>
>              <jsp:include page="/path1/path2/path3/path4/file1.jsp" />
>
>              <script language="JavaScript" src
> ="/path1/path2/path3/file1.js"></script>
>              <!-- <script>
>
>  I've taken a look at the FAQ advice regarding checking the effects of
> an analyser (in our case whitespace), but our test class returns the
> expected tokens for any given token stream. For example, the string
> "<% mytoken1 mytoken2 %>" is tokenised by the whitespace analyzer as
> [<%] [mytoken1] [mytoken2] [%>].
>
> I'm sure I've missed something, but I can't see what it is. If anyone
> could shed any light on possible reasons why we are getting zero hits
> for text strings which are in our indexed files, I'd be really
> grateful. See below for more info on the index and search setup.
>
> Thanks a lot Lee C
>
> File contents are in a tokenised, indexed, not stored field.
> The index uses the whitespace analyzer which comes with Lucene.
>
> Searches are performed using a boolean query. The boolean query is
> made up of a query parser query, which gets its search term from an
> HTML text box filled in by the user, and a prefix query, which is used
> to limit search scope by directory paths.
> The search uses a whitespace analyzer; no filtering takes place.
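
The tokenisation described in the quoted message can be reproduced with a few lines of plain Java. This is only an approximation of what a whitespace analyzer does (split on whitespace, change nothing else), not Lucene's implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that approximates whitespace-only tokenisation.
final class WhitespaceTokenDemo {

    // Split on runs of whitespace; keep case, quotes and punctuation untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [<%, mytoken1, mytoken2, %>]
        System.out.println(tokenize("<% mytoken1 mytoken2 %>"));

        // The JSP line from the quoted message becomes three tokens:
        // <jsp:include, page="/path1/path2/path3/path4/file1.jsp" and />
        for (String token : tokenize(
                "   <jsp:include page=\"/path1/path2/path3/path4/file1.jsp\" />")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Each token keeps its quotes and punctuation, so a query term only matches if it is character-for-character identical to one of these tokens.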

