At its base, Lucene looks for exact term matches. However, between the query
string and the actual term matching there is an analyzer that may
manipulate the query. You may have to create a Devanagari (Hindi) analyzer
that correctly tokenizes the terms.
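
To illustrate what "exact matches on analyzed terms" means, here is a minimal
sketch in plain Java. It does not use Lucene at all; the class and method
names (WhitespaceTermsSketch, tokenize, matchesWholeTerm) are made up for
illustration, and splitting on ASCII whitespace is only an assumption about
what a suitable analyzer might do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: it mimics what a very simple
// whitespace-based analyzer would hand to the indexer.
class WhitespaceTermsSketch {

    // Split text into terms on ASCII whitespace; no stemming, no lowercasing.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // A query term matches only if it equals a whole indexed term.
    static boolean matchesWholeTerm(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String bharat = "\u092D\u093E\u0930\u0924";               // "India"
        String bharatiya = "\u092D\u093E\u0930\u0924\u0940\u092F"; // "Indian"

        // Only the longer word is indexed here.
        List<String> terms = tokenize(bharatiya + " \u0915\u0932\u093E");

        System.out.println(matchesWholeTerm(terms, bharat));    // false: only a word part
        System.out.println(matchesWholeTerm(terms, bharatiya)); // true: whole term
    }
}
```

If the terms are tokenized like this, a query for the shorter word cannot hit
the longer one. If partial matches still show up, the analyzer is probably
splitting the text differently from what you expect.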
Note that Lucene stores all terms as Unicode and compares them just as
Java compares Strings.
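
One consequence worth spelling out: a Java String comparison is exact, so two
strings that look the same on screen but use different code point sequences
are different terms. The sketch below shows this with one Devanagari letter,
assuming a JDK that has java.text.Normalizer. Normalizing to NFC is my own
choice here, not something Lucene does for you, and TermComparisonSketch is a
made-up name.

```java
import java.text.Normalizer;

// Hypothetical helper: shows that String.equals compares code units exactly.
class TermComparisonSketch {

    // Bring text into one canonical form (NFC is an assumption of this sketch).
    static String normalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "\u0958";       // DEVANAGARI LETTER QA as one code point
        String decomposed = "\u0915\u093C";  // KA followed by the NUKTA sign

        // Same letter on screen, different code points: not equal.
        System.out.println(precomposed.equals(decomposed)); // false

        // After normalizing both sides the same way, they compare equal.
        System.out.println(normalize(precomposed).equals(normalize(decomposed))); // true
    }
}
```

If you normalize, it has to be done the same way on both the indexed text and
the query, otherwise the terms still will not match.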
One other problem I have had, and have seen others have, is not importing the
data into a Java String correctly, so the analyzer and indexer never see the
correct terms.
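
A minimal sketch of that import problem, assuming the source data really is
UTF-8 (DecodeSketch and decodeUtf8 are names I made up): the same bytes give
a correct String only when the right charset is named explicitly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode raw bytes with an explicit charset.
class DecodeSketch {

    // Assumes the bytes are UTF-8; relying on the platform default is the usual trap.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u092D\u093E\u0930\u0924"; // four Devanagari characters
        byte[] raw = original.getBytes(StandardCharsets.UTF_8); // 12 bytes

        String right = decodeUtf8(raw);
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
        System.out.println(right.length());         // 4
        System.out.println(wrong.length());         // 12, one char per byte
    }
}
```

The same applies when reading files: wrap the stream in an InputStreamReader
with the charset named, rather than using a reader that picks up the default.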
One way I have debugged this kind of problem is to look at the terms that
the analyzer created and that were added to the index. I do this with this
hacked-up JSP page (see below).
I hope this helps.
--Peter
<%@page contentType="text/html"%>
<%@page import="org.apache.lucene.index.IndexReader"%>
<%@page import="org.apache.lucene.index.TermEnum"%>
<%@page import="org.apache.lucene.index.Term"%>
<html>
<head><title>View terms</title></head>
<body>
View Terms
<%
// Location of the index inside the web application; adjust to your setup.
String indexPath = application.getRealPath("/")+"data/XMLIndex.idx";
IndexReader ir = IndexReader.open(indexPath);
out.println("Total docs = "+ir.numDocs());
out.println("<TABLE><TR><TH>term</TH><TH>freq</TH></TR>");
// Walk every term in the index, in order.
TermEnum te = ir.terms();
while (te.next()){
Term term = te.term();
int docFreq = te.docFreq();
// Only show terms from the fields of interest.
if (term.field().equals("text") ||
term.field().equals("title")) {
out.println("<TR><TD>"+term.field()+":"+term.text()+"</TD><TD>"+docFreq+"</TD></TR>");
}
}
out.println("</TABLE>");
te.close();
ir.close();
%>
</body>
</html>
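
If a term in a listing like the one this page prints looks wrong, or looks
right but still does not match, it can help to see its code points instead
of its glyphs. This is a small sketch for that, assuming Java 8 or later;
CodePointDumpSketch and dump are made-up names, not Lucene API.

```java
// Hypothetical debugging helper: render a term as a list of code points.
class CodePointDumpSketch {

    // Returns e.g. "U+0041 U+0042" for "AB".
    static String dump(String term) {
        StringBuilder sb = new StringBuilder();
        term.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: U+092D U+093E U+0930 U+0924
        System.out.println(dump("\u092D\u093E\u0930\u0924"));
    }
}
```

Code points in the U+0900 to U+097F range are Devanagari. A run of values
below U+0100 where Devanagari was expected usually means the text was decoded
with the wrong charset before it reached the analyzer.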
On 6/4/02 1:48 AM, "Harpreet S Walia" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We are using Lucene to index and search Unicode (UTF-8) content in the
> Devanagari (Hindi) language.
>
> What we have observed is that our query fetches results which have partial
> word matches, i.e. if it were English, then a query "india" would return
> words like "indian", "southindia" and so on.
>
> Is there a way by which we can instruct Lucene to only search complete words
> and not word parts?
>
> TIA
>
> Regards
> harpreet
>
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
>
>