Re: Search Problem

Amin Mohammed-Coleman Fri, 02 Jan 2009 15:13:43 -0800

Hi Erick

Thanks for your reply.

I have used Luke to inspect the document and I am somewhat confused. For example, when I view the index using the Overview tab of Luke I get the following:


1       body    test
1       id      1234
1       name    rtfDocumentToIndex.rtf
1       path    rtfDocumentToIndex.rtf
1       summary This is a
1       type    RTF_INDEXER
1       body    rtf

However, when I view the document in the Document tab I get the full text that was extracted from the RTF document (field: body), which is:


This is a test rtf document that will be indexed.
Amin Mohammed-Coleman

I am using the StandardAnalyzer, so I wouldn't expect the words "document", "indexed", or "Amin Mohammed-Coleman" to be removed.

I have referenced the Lucene in Action book and I can't see what I may be doing wrong. I would be happy to provide a test case should it be required. When adding the body field to the document I am doing:


        Document document = new Document();
        Field field = new Field(FieldNameEnum.BODY.getDescription(), bodyText.trim(),
                Field.Store.YES, Field.Index.ANALYZED);
        document.add(field);

When I run the search code, the string "test" is the only word that returns a result (TopDocs), whereas the others do not (e.g. "amin", "document", "indexed").


Thanks again for your help and advice.


Cheers
Amin



On 2 Jan 2009, at 21:20, Erick Erickson wrote:

Casing is usually handled by the analyzer. Since you construct
the term query programmatically, it doesn't go through
any analyzers, thus is not converted into lower case for
searching as was done automatically for you when you
indexed using StandardAnalyzer.
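
To make that concrete, here is a toy model in plain Java. It is not Lucene code: the class ToyTermIndex and its analyze/lookupExact methods are made up for this sketch, and the analyzer stand-in only splits on non-alphanumeric characters and lowercases. It ignores stop words and everything else StandardAnalyzer does. It only illustrates why a term built by hand has to match the stored token exactly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only, NOT Lucene: terms are normalized when indexing,
// while a hand-built term is looked up verbatim.
class ToyTermIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Hypothetical stand-in for an analyzer: split on anything that is
    // not a letter or digit, then lowercase each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Index time: the text goes through analyze() before being stored.
    void add(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search time: exact lookup with no analysis, analogous to building
    // a term query programmatically.
    Set<Integer> lookupExact(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyTermIndex index = new ToyTermIndex();
        index.add(1, "This is a test rtf document that will be indexed.\nAmin Mohammed-Coleman");

        System.out.println(index.lookupExact("Amin")); // prints [] (stored token is lowercase)
        System.out.println(index.lookupExact("amin")); // prints [1]
    }
}
```

In this sketch the fix is either to lowercase the term yourself before the lookup or to run the query text through the same analyze() step that was used when indexing.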

As for why you aren't getting hits, it's unclear to me. But
what I'd do is get a copy of Luke and examine your index
to see what's *really* there. This will often give you clues,
usually pointing to some kind of analyzer behavior that you
weren't expecting.

Best
Erick

On Fri, Jan 2, 2009 at 6:39 AM, Amin Mohammed-Coleman <ami...@gmail.com> wrote:

Hi

I have tried this and it doesn't work. I don't understand why using "amin" instead of "Amin" would work; is it not case insensitive?

I tried "test" for field "body" and this works. Any other terms don't work, for example:

"document"
"indexed"

These are tokens that were extracted when creating the Lucene document.



Thanks for your reply.

Cheers

Amin


On 2 Jan 2009, at 10:36, Chris Lu wrote:

Basically, Lucene stores analyzed tokens and looks up matches based on those tokens.
"Amin" after StandardAnalyzer is "amin", so you need to search with
new Term("body", "amin") instead of new Term("body", "Amin").

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!

On Thu, Jan 1, 2009 at 11:30 PM, Amin Mohammed-Coleman <ami...@gmail.com> wrote:

Hi


Sorry I was using the StandardAnalyzer in this instance.

Cheers




On 2 Jan 2009, at 00:55, Chris Lu wrote:

You need to let us know the analyzer you are using.

-- Chris Lu

On Thu, Jan 1, 2009 at 1:11 PM, Amin Mohammed-Coleman <ami...@gmail.com> wrote:

Hi
I have created an RTFHandler which takes an RTF file and creates a Lucene
Document which is indexed. The RTFHandler looks something like this:

if (bodyText != null) {
    Document document = new Document();
    Field field = new Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(),
            Field.Store.YES, Field.Index.ANALYZED);
    document.add(field);
}
I am using Java's built-in RTF text extraction. When I run my test to verify that the document contains the text I expect, this works fine. I get the following when I print the document:
Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf document that will be indexed.

Amin Mohammed-Coleman>
stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf>
stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf>
stored/uncompressed,indexed<type:RTF_INDEXER>
stored/uncompressed,indexed<summary:This is a >>
The problem is that when I use the following to search, I get no results:
MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {rtfIndexSearcher});
Term t = new Term("body", "Amin");
TermQuery termQuery = new TermQuery(t);
TopDocs topDocs = multiSearcher.search(termQuery, 1);
System.out.println(topDocs.totalHits);
multiSearcher.close();

rtfIndexSearcher is configured with the directory that holds RTF
documents. I have used Luke to look at the document, and what I am
finding in the Overview tab is the following for the document:

1       body    test
1       id      1234
1       name    rtfDocumentToIndex.rtf
1       path    rtfDocumentToIndex.rtf
1       summary This is a
1       type    RTF_INDEXER
1       body    rtf


However on the Document tab I am getting (in the body field):

This is a test rtf document that will be indexed.

Amin Mohammed-Coleman
I would expect to get a hit using "Amin" or even "document". I am not
sure whether the line:
TopDocs topDocs = multiSearcher.search(termQuery, 1);
is incorrect, as I am not too sure of the meaning of "Finds the top n
hits for query." for search(Query query, int n) according to the javadocs.
I would be grateful if someone could advise on what I may be
doing wrong. I am using Lucene 2.4.0.


Cheers
Amin

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

