Conclusion: Escaping Does not Work

Terry Steichen Wed, 27 Nov 2002 05:39:40 -0800

Since you have to design and write a custom Analyzer to implement escape
characters, the references to escape characters should be removed from the
documentation (as they are not a feature, but something that you could add -
if you can figure out how).


Terry

----- Original Message -----
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, November 26, 2002 11:57 PM
Subject: Re: Does Escaping Really Work?


> I think all you have to do is write your own Analyzer.
> You can copy one of the supplied ones, and remove the piece that calls
> isLetter(char) or some similar function.  That may be in
> StandardTokenizer, I can't look at the code now to confirm.
> If you want to treat certain fields differently (e.g. exception to the
> rule) you can see an example of such an Analyzer in jGuru's Lucene FAQ.
>
> Good luck,
> Otis
>
> --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > Yes, Otis - that does help.  But a little more advice would help even
> > more.
> >
> > For example, I'm currently using the standard Lucene code without any
> > customization.  That means I am using StandardAnalyzer.  Internally, what
> > StandardAnalyzer does is (1) create a StandardTokenizer, (2) StandardFilter,
> > (3) LowerCaseFilter, and (4) StopFilter.  StandardTokenizer is generated
> > from StandardTokenizer.jj, but when generated, it extends Tokenizer.
> >
> > Now WhitespaceAnalyzer (which you've mentioned several times) creates a
> > WhitespaceTokenizer (which in turn extends CharTokenizer, which extends
> > Tokenizer).
> >
> > This all makes me a bit dizzy, since I don't really understand (and hope I
> > don't have to learn) all the internal Lucene architecture.  It would help
> > enormously if you could tell me precisely what I have to do to make the
> > escape character work with all the functionality of StandardAnalyzer
> > retained.  The WhitespaceAnalyzer - should it be used in lieu of the
> > StandardTokenizer?  If so, would any functionality be lost?  (It seems like
> > it would lose a ton of functionality to me.)  Would it be better to modify
> > StandardTokenizer.jj, and if so, where/how?
> >
> > TIA,
> >
> > Terry
> >
> > ----- Original Message -----
> > From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > Sent: Tuesday, November 26, 2002 6:45 PM
> > Subject: Re: Does Escaping Really Work?
> >
> >
> > > Documentation is not detailed enough.
> > > Analyzers analyze their input (at indexing and searching time).
> > > They are just Java classes that do not know about QueryParser.jj, which
> > > is the only place where '\' is defined as an escape character (plus
> > > the .java files generated by running QueryParser.jj through JavaCC).
> > > Hence, I believe that if your Analyzer is not explicitly instructed to
> > > leave '\' alone you will think that escaping doesn't work.
> > > Whitespace analyzer I believe works because it doesn't throw out
> > > characters like '\', as I think it only splits tokens on spaces.
> > >
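
As a rough illustration of the point above, here is a toy model (not Lucene's
actual tokenizer code). A tokenizer that keeps only letters and digits drops
the '\' and the '-' and splits the term in two, while one that splits only on
whitespace leaves both characters inside the token. ToyTokenizers and its two
method names are hypothetical; StandardTokenizer is generated from a grammar
and is more involved than this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are NOT Lucene's tokenizers, just stand-ins
// for the two behaviors discussed in this thread.
final class ToyTokenizers {

    // Splits on whitespace only, so punctuation such as '\' and '-'
    // stays inside the token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Keeps only runs of letters and digits. Any other character ends
    // the current token and is itself discarded.
    static List<String> letterDigitTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "a55407\\-2002nov2"; // the text a55407\-2002nov2
        System.out.println(whitespaceTokens(input));  // prints [a55407\-2002nov2]
        System.out.println(letterDigitTokens(input)); // prints [a55407, 2002nov2]
    }
}
```

The second result has the same shape as the "space where the dash was"
behavior reported further down in this thread, though the toy does not claim
to reproduce the exact cause.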
> > > HTH.
> > > Otis
> > >
> > >
> > > --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > > > Dave,
> > > >
> > > > I would say you seem to be right.  But this is getting very frustrating.
> > > > Here is what the Lucene docs say:
> > > >
> > > > <docs quote>
> > > > Lucene supports escaping special characters that are part of the query
> > > > syntax. The current list special characters are
> > > >
> > > > + - && || ! ( ) { } [ ] ^ " ~ * ? : \
> > > >
> > > > To escape these character use the \ before the character. For example to
> > > > search for (1+1):2 use the query:
> > > >
> > > >  \(1\+1\)\:2
> > > >
> > > > </docs quote>
> > > >
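
The escaping rule in the docs quote can be sketched as a small helper that
puts a '\' before each listed character. This is only a sketch, assuming it
is acceptable to escape each '&' and '|' individually in order to cover &&
and ||. QuerySyntaxEscaper is a hypothetical name and is not part of Lucene
1.3. It only builds the query string; as the rest of this thread shows, the
Analyzer may still alter an escaped term afterwards.

```java
// Illustrative helper, not a Lucene class: the name and method are hypothetical.
final class QuerySyntaxEscaper {

    // Single characters taken from the documented special-character list.
    // Assumption: && and || are handled by escaping each '&' and '|'.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Returns the input with a backslash inserted before every special character.
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The example from the docs quote.
        System.out.println(escape("(1+1):2")); // prints \(1\+1\)\:2
    }
}
```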
> > > > Is the Lucene documentation in error?  Does it work but only using
> > > > something other than the standard configuration?  If so, precisely what
> > > > non-standard configuration is necessary?
> > > >
> > > > Why can't these questions be answered simply and clearly?
> > > >
> > > > Terry
> > > >
> > > >
> > > > ----- Original Message -----
> > > > From: "Spencer, Dave" <[EMAIL PROTECTED]>
> > > > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > > > Sent: Tuesday, November 26, 2002 5:02 PM
> > > > Subject: RE: Does Escaping Really Work?
> > > >
> > > >
> > > > My understanding is that "escaping may not work (as Terry and I believe)
> > > > however a workaround for most 'reasonable' cases is to use
> > > > WhitespaceAnalyzer when parsing a query".
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> > > > Sent: Tuesday, November 26, 2002 1:48 PM
> > > > To: Lucene Users List
> > > > Subject: Re: Does Escaping Really Work?
> > > >
> > > >
> > > > Well, pardon me for breathing, Otis.
> > > >
> > > > I didn't make the connection (partly 'cause you changed the subject line).
> > > > But anyway, I don't understand your rather oblique answer - does escaping
> > > > work or not?  Are you saying that, in order for it to work (the way the
> > > > docs say it does), I need to insert this module in the chain? Or what?
> > > >
> > > > Terry
> > > >
> > > > ----- Original Message -----
> > > > From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> > > > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > > > Sent: Tuesday, November 26, 2002 3:07 PM
> > > > Subject: Re: Does Escaping Really Work?
> > > >
> > > >
> > > > > Didn't I just answer this last night?
> > > > > WhitespaceAnalyzer?
> > > > >
> > > > > Otis
> > > > >
> > > > > --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > > > > > I'm confused about how to use escape characters in Lucene.  My Lucene
> > > > > > configuration is 1.3-dev1 and I use the StandardAnalyzer and
> > > > > > QueryParser.
> > > > > >
> > > > > > My documents have a field called 'path' with a value like
> > > > > > "1102/a55407-2002nov2.xml".  This field is indexed but not tokenized.
> > > > > > Here are the various queries I've tried and their results:
> > > > > >
> > > > > > 1) When a dash is included in the query, Lucene interprets this as a
> > > > > > space. ("path:1102/a55402-2002nov2.xml" is interpreted as
> > > > > > "path:1102/a55402 -body:2002nov2.xml")
> > > > > >
> > > > > > 2) When a backslash is inserted before the dash (and the query does
> > > > > > *not* contain a wildcard), Lucene interprets this by inserting a
> > > > > > space in lieu of the next character.
> > > > > > ('path:1102/a55402\-2002nov2.xml' interpreted as 'path:"1102/a55402
> > > > > > 2002nov2.xml" [note the space where the dash was]')
> > > > > >
> > > > > > 3) When a backslash is inserted before the dash (and the query *does*
> > > > > > contain a wildcard), Lucene interprets this literally, without any
> > > > > > conversion. ("path:1102/55407\-2002nov*" is interpreted literally).
> > > > > >
> > > > > > 4) When a backslash is inserted before the dash and immediately
> > > > > > followed by a wildcard, Lucene reports an error.
> > > > > > ('path:1102/a55407-*' causes lexical error: Encountered <EOF>
> > > > > > after :"")
> > > > > >
> > > > > > My overall observation is that it appears it is not possible to
> > > > > > escape a dash - is this true?
> > > > > >
> > > > > > A previous post (yesterday) suggests that it is also not possible to
> > > > > > escape a backslash.  If that's also true, what characters can be
> > > > > > escaped?
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Terry
> > > > > >


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
