The thing is, StandardAnalyzer breaks on hyphens. You'll need to work around
this, either by extending StandardAnalyzer or by using a different Analyzer
for that field.

From StandardTokenizer's documentation (StandardTokenizer is what
StandardAnalyzer uses):

  "Splits words at hyphens, unless there's a number in the token, in which
  case the whole token is interpreted as a product number and is not split."

I've investigated StandardAnalyzer's tokenization, and that behavior doesn't
look simple to disable. What you can do is extend StandardAnalyzer and
override its tokenStream method to create a TokenStream of your own. If you
know your text is space-separated, you can use StringTokenizer to split the
text on spaces. If a token contains '-', don't break it; otherwise pass it
on to the TokenStream returned by StandardAnalyzer.
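
To make the splitting rule concrete, here is a minimal sketch in plain Java.
It does not use any Lucene classes. standardLikeTokens is a hypothetical
stand-in for the tokens StandardAnalyzer would produce (it only lower-cases
and splits on non-alphanumeric characters), and HyphenAwareSplitter is a name
I made up for illustration. In a real Analyzer, the same decision would live
inside the TokenStream you return from tokenStream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;
import java.util.function.Function;

// Sketch only: plain Java, no Lucene classes involved.
final class HyphenAwareSplitter {

    // Hypothetical stand-in for StandardAnalyzer's output: lower-cases the
    // chunk and splits it on anything that is not a letter or a digit.
    static List<String> standardLikeTokens(String chunk) {
        List<String> out = new ArrayList<>();
        for (String part : chunk.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    // Splits the text on whitespace. Chunks containing '-' are kept whole;
    // every other chunk is handed to the delegate tokenizer.
    static List<String> split(String text, Function<String, List<String>> delegate) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text); // default delimiters are whitespace
        while (st.hasMoreTokens()) {
            String chunk = st.nextToken();
            if (chunk.indexOf('-') >= 0) {
                // Assumption: lower-case the kept token so it is consistent
                // with the lower-cased tokens coming from the delegate.
                tokens.add(chunk.toLowerCase(Locale.ROOT));
            } else {
                tokens.addAll(delegate.apply(chunk));
            }
        }
        return tokens;
    }
}
```

Calling HyphenAwareSplitter.split("Buy Soft-Wash now!",
HyphenAwareSplitter::standardLikeTokens) keeps "soft-wash" as a single token.
Note that this sketch also keeps any punctuation attached to a hyphenated
chunk (for example a trailing comma), so you may want to strip that first.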

Maybe someone else has a better answer, but if you insist on using
StandardAnalyzer, I have a feeling it will be problematic.

On Nov 22, 2007 6:02 PM, Shakti_Sareen < [EMAIL PROTECTED]> wrote:

> Hi
>
> But the file I am indexing is very big and I don't know which word will
> contain the hyphen. The thing you suggest can be implemented only if
> there are some specific words in the file.
>
> Apart from StandardAnalyzer I have got no option.
>
> Thanks a lot for your reply.
>
> Please suggest how I can go ahead.
>
>
> SHAKTI SAREEN
> GE-GDC
> STC HYDERABAD
> 9948777794
>
> -----Original Message-----
> From: Shai Erera [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 22, 2007 9:25 PM
> To: java-user@lucene.apache.org
> Subject: Re: help required urgent!!!!!!!!!!!
>
> Hi
>
> You can simply create a PrefixQuery. However, if you're using
> StandardAnalyzer and the word is added as Index.TOKENIZED,
> soft-wa<something> will be broken into 'soft' and 'wa<something>'.
> Therefore you'll need to add the word as Index.UN_TOKENIZED, or use a
> different Analyzer when you index the data (for this field at least).
>
> Here's some sample code:
>
>        // Indexing: add the word as a single, untokenized term.
>        Document doc = new Document();
>        doc.add(new Field("field", "soft-wash", Store.NO, Index.UN_TOKENIZED));
>
>        // Search: match every term that starts with "soft-wa".
>        Query q = new PrefixQuery(new Term("field", "soft-wa"));
>
> Does that help?
>
> On Nov 22, 2007 5:46 PM, Shakti_Sareen < [EMAIL PROTECTED]> wrote:
>
> > Hi
> > I am using StandardAnalyzer to index the data,
> > but I want to do a LIKE-style search on a word containing a hyphen.
> > For example, I want to search for "soft-wa*".
> >
> > I am getting no hits for that. It is said that if a word contains a
> > hyphen, we should enclose it in double quotes ("). But enclosing the
> > word in double quotes means an exact-word search.
> >
> > How can I perform a LIKE-style search on a word containing a hyphen?
> >
> > Please help.
> >
> > Regards,
> > Shakti Sareen
> >
> >
> >
> >
> >
> > DISCLAIMER:
> > This email (including any attachments) is intended for the sole use of
> the
> > intended recipient/s and may contain material that is CONFIDENTIAL AND
> > PRIVATE COMPANY INFORMATION. Any review or reliance by others or
> copying or
> > distribution or forwarding of any or all of the contents in this
> message is
> > STRICTLY PROHIBITED. If you are not the intended recipient, please
> contact
> > the sender by email and delete all copies; your cooperation in this
> regard
> > is appreciated.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
>
> --
> Regards,
>
> Shai Erera
>



-- 
Regards,

Shai Erera
