You need to let us know the analyzer you are using. -- Chris Lu ------------------------- Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!
On Thu, Jan 1, 2009 at 1:11 PM, Amin Mohammed-Coleman <ami...@gmail.com>wrote: > > >> Hi >> >> I have created a RTFHandler which takes a RTF file and creates a lucene >> Document which is indexed. The RTFHandler looks like something like this: >> >> if (bodyText != null) { >> Document document = new Document(); >> Field field = new >> Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(), Field.Store.YES, >> Field.Index.ANALYZED); >> document.add(field); >> >> >> } >> >> I am using Java Built in RTF text extraction. When I run my test to >> verify that the document contains text that I expect this works fine. I get >> the following when I print the document: >> >> Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf >> document that will be indexed. >> >> Amin Mohammed-Coleman> >> stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf> >> stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf> >> stored/uncompressed,indexed<type:RTF_INDEXER> >> stored/uncompressed,indexed<summary:This is a >> >> >> >> The problem is when I use the following to search I get no result: >> >> MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] >> {rtfIndexSearcher}); >> Term t = new Term("body", "Amin"); >> TermQuery termQuery = new TermQuery(t); >> TopDocs topDocs = multiSearcher.search(termQuery, >> 1); >> System.out.println(topDocs.totalHits); >> multiSearcher.close(); >> >> RftIndexSearcher is configured with the directory that holds rtf >> documents. I have used Luke to look at the document and what I am finding >> in the overview tab is the following for the document: >> >> 1 body test >> 1 id 1234 >> 1 name rtfDocumentToIndex.rtf >> 1 path rtfDocumentToIndex.rtf >> 1 summary This is a >> 1 type RTF_INDEXER >> 1 body rtf >> >> >> However on the Document tab I am getting (in the body field): >> >> This is a test rtf document that will be indexed. >> >> Amin Mohammed-Coleman >> >> >> I would expect to get a hit using "Amin" or even "document". I am not >> sure whether the >> line: >> TopDocs topDocs = multiSearcher.search(termQuery, 1); >> >> is incorrect as I am not too sure of the meaning of "Finds the top n hits >> for query." for search (Query query, int n) according to java docs. >> >> I would be grateful if someone may be able to advise on what I may be >> doing wrong. I am using Lucene 2.4.0 >> >> >> Cheers >> Amin >> >> >> >> >> > >