Hi,

We have indexed a set of web files (JSP, JS, XSLT, Java properties and HTML) using the Lucene whitespace analyzer. The purpose is to let developers find where code and functions are used and defined across a large and disparate content management repository. We hope this will aid code reuse, easier refactoring and standards control.
However, when we run a query parser search using a whitespace analyzer, with a string known to be in an indexed file, the search returns zero hits. For example, this string is searched for using the query parser (with the meta-characters escaped):

    <jsp\:include page =\"/path1/path2/path3/path4/file1.jsp\" />

Shouldn't an indexed document which contains the following text be found?

    // include HTML head %>
    <jsp:include page="/path1/path2/path3/path4/file1.jsp" />
    <script language="JavaScript" src ="/path1/path2/path3/file1.js"></script>
    <!-- <script>

I've taken a look at the FAQ advice about checking the effects of an analyzer (in our case whitespace), but our test class returns the expected tokens for any given token stream. For example, the string "<% mytoken1 mytoken2 %>" is tokenised by the whitespace analyzer as [<%] [mytoken1] [mytoken2] [%>].

I'm sure I've missed something but I can't see what it is. If anyone could shed any light on possible reasons why we are getting zero hits for text strings which are in our indexed files, I'd be really grateful. See below for more information on the index and search setup.

Thanks a lot,
Lee C

File contents are in a tokenised, indexed, not stored field.
The index uses the whitespace analyzer which comes with Lucene.
Searches are performed using a boolean query. The boolean query combines a query parser query, whose search term comes from an HTML text box filled in by the user, with a prefix query which is used to limit search scope by directory path.
The search uses a whitespace analyzer; no filtering takes place.
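
The token check described above can be sketched in plain Java, so that the indexed text and the search string can be compared token by token. This is only an approximation under stated assumptions: it splits on whitespace with `String.split` rather than calling Lucene's whitespace analyzer, the class and method names (`WhitespaceTokenSketch`, `tokenize`) are hypothetical, and the search string is written without the query parser escapes. The two strings are copied exactly as they appear in this post, including the spacing around `=`.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceTokenSketch {

    // Splits text on runs of whitespace. This approximates a whitespace-only
    // tokenizer in plain Java; it does not call Lucene.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Text as it appears in the indexed document (copied from the post).
        String indexed = "<jsp:include page=\"/path1/path2/path3/path4/file1.jsp\" />";
        // Search string as quoted in the post, with the escapes removed.
        String searched = "<jsp:include page =\"/path1/path2/path3/path4/file1.jsp\" />";

        System.out.println(tokenize("<% mytoken1 mytoken2 %>"));
        System.out.println(tokenize(indexed));
        System.out.println(tokenize(searched));
        // Reports whether both strings yield the same token sequence.
        System.out.println(tokenize(indexed).equals(tokenize(searched)));
    }
}
```

Printing both token lists side by side makes any difference in whitespace between the typed search string and the indexed text visible, which may be worth ruling out before looking at the query construction itself.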