Maybe there is a good reason for using WhitespaceAnalyzer in
TestQueryParser.java :). Try it.
    public void testEscaped() throws Exception {
        Analyzer a = new WhitespaceAnalyzer();
        assertQueryEquals("\\[brackets", a, "\\[brackets");
        assertQueryEquals("\\[brackets", null, "brackets");
        assertQueryEquals("\\\\", a, "\\\\");
        assertQueryEquals("\\+blah", a, "\\+blah");
        assertQueryEquals("\\(blah", a, "\\(blah");
    }
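
To see why the choice of analyzer matters for a string like the URL in
the quoted message, here is a rough sketch in plain Java (no Lucene
needed). Both methods are hypothetical stand-ins invented for
illustration, not the real WhitespaceAnalyzer or StandardAnalyzer: the
first splits on whitespace only, the second assumes an analyzer that
lowercases and drops everything except letters, digits and dots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizerSketch {

    // Hypothetical stand-in: split on runs of whitespace and nothing else.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Hypothetical stand-in: lowercase, then split on anything that is
    // not a letter, digit or dot. This is an assumption made to mimic
    // the "To :" output in the quoted message, not Lucene's grammar.
    static List<String> punctuationDroppingTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9.]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String url = "http://www.tropo.com/";
        System.out.println(whitespaceTokens(url));          // [http://www.tropo.com/]
        System.out.println(punctuationDroppingTokens(url)); // [http, www.tropo.com]
    }
}
```

The whitespace-only version hands the URL back as one token, which is
the behaviour an untokenized (Keyword) field would need at query time.
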
Otis
--- "Spencer, Dave" <[EMAIL PROTECTED]> wrote:
>
> I'm sure there's something that I'm missing here.
> Let's say we have an index of a web site with 2 fields,
> "body", and "url".
> Body is formed via Field.Text(...,Reader) and the url field by
> Field.Keyword(), thus the URL is not tokenized but is searchable.
>
> I use StandardAnalyzer and I want to find
> the Document with a matching URL, and I want
> to use QueryParser to parse the users queries.
>
> I'm using v1.2.
>
> It seems that, if I'm correct, one design problem is that the Analyzer
> does not have a reference to an index, so it doesn't know
> if a field has been tokenized. It probably should not tokenize
> queries against an untokenized field. AFAIK, queries against
> untokenized fields are always tokenized, and there is no way to tell
> the QueryParser not to tokenize a field.
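
The design idea described above (choose the tokenization per field, and
leave untokenized fields alone) can be sketched in plain Java. This is
only an illustration under stated assumptions: the class
PerFieldSketch, its methods, and the crude splitting rule are all
hypothetical and are not part of the Lucene 1.2 API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class PerFieldSketch {

    // Field name -> tokenizer to use for query text against that field.
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    void register(String field, Function<String, List<String>> tokenizer) {
        perField.put(field, tokenizer);
    }

    // Look up the tokenizer for the field, falling back to the default.
    List<String> tokens(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }

    // Untokenized ("keyword") behaviour: the whole string is one token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Crude, assumed splitter: lowercase, split on non-letter/digit/dot.
    static List<String> splitting(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9.]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PerFieldSketch s = new PerFieldSketch(PerFieldSketch::splitting);
        s.register("url", PerFieldSketch::keyword);
        System.out.println(s.tokens("url", "http://www.tropo.com/"));  // [http://www.tropo.com/]
        System.out.println(s.tokens("body", "http://www.tropo.com/")); // [http, www.tropo.com]
    }
}
```

Within Lucene itself, a similar effect for a single field can be had by
bypassing QueryParser and building a TermQuery on the raw string, since
a TermQuery is not run through an analyzer.
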
>
> I have attached a test program that shows the behavior and
> sample output.
> The "From:" lines are user queries.
> The "To:" lines are the result of calling QueryParser and then
> Query.toString().
>
> The 3rd and 4th From/To lines below are the key ones.
> The goal is to enter a query like url:http://www.tropo.com/
> or url:"http://www.tropo.com/" and not tokenize the
> 'http://www.tropo.com/'.
> I tried backslashes too, to no avail (url:http\://www.tropo.com/).
>
>
> ==========================================================================
> C:\proj\tropo_java>java com.tropo.lucene.KeywordProblem
> From: foo
> To : foo
>
> From: body:foo
> To : body:foo
>
> From: url:http://www.tropo.com/      <-- first attempt
> To : http                            <-- first problem, ok, we gotta quote
>
> From: url:"http://www.tropo.com/"    <-- second attempt
> To : "http www.tropo.com"            <-- second problem, colon and slashes missing
>
>
> ==========================================================================
> package com.tropo.lucene;
>
> import java.io.*;
> import java.util.*;
>
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.analysis.standard.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.queryParser.*;
>
> public class KeywordProblem
> {
>     /**
>      * Parses each sample query with StandardAnalyzer, using "url" as
>      * the default field, and prints the input ("From:") next to the
>      * parsed query ("To :").
>      */
>     public static void main(String[] args)
>         throws Throwable
>     {
>         String body = "body";
>         String url = "url";
>
>         String[] lines = new String[] {
>             "foo",
>             "body:foo",
>             "url:http://www.tropo.com/",
>             "url:\"http://www.tropo.com/\""
>         };
>
>         Analyzer a = new StandardAnalyzer();
>         for (int i = 0; i < lines.length; i++)
>         {
>             Query query = QueryParser.parse(lines[i], url, a);
>             o.println("From: " + lines[i]);
>             o.println("To : " + query.toString(url));
>             o.println();
>         }
>     }
>
>     private static PrintStream o = System.out;
> }
>
>
>
>
> -----Original Message-----
> From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 25, 2002 12:13 PM
> To: Lucene Users List
> Subject: Re: Slash Problem
>
>
> Dave,
>
> My recent testing suggests that when the field is not tokenized, it is
> not split as you suggest. When I search the "path" field using
> "path:1102/A*" I get precisely what I am looking for (though I discovered
> the lowercase mechanism isn't applied to this field and the query is
> case-sensitive - note the uppercase 'A' above).
>
> Regards,
>
> Terry
>
> ----- Original Message -----
> From: "Spencer, Dave" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Monday, November 25, 2002 2:58 PM
> Subject: RE: Slash Problem
>
>
> Funny, I have more or less the same question I've been meaning to post.
> I think the answer is going to be that the analyzer applies to all parts
> of a query, even to untokenized fields, which to me seems wrong.
>
> So I think if you have a query like
>
> body:foo uri:"/alpha/beta"
>
> With 'body' being tokenized and 'uri' not tokenized, I think that
> the analyzer applies to "/alpha/beta" and breaks it into "alpha beta"
> which is not desired...
>
>
> -----Original Message-----
> From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 25, 2002 9:26 AM
> To: Lucene Users List
> Subject: Re: Slash Problem
>
>
> Rob,
>
> I presume that means that you used backslashes (in the url) rather than
> forward slashes (in the path). I had planned to test that as a workaround
> and it's good to know that you've already tested that successfully.
>
> But why is this necessary? Why doesn't the escape ('\') allow the use
> of a backslash?
>
> Regards,
>
> Terry
>
> ----- Original Message -----
> From: "Rob Outar" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Monday, November 25, 2002 12:01 PM
> Subject: RE: Slash Problem
>
>
> > I don't know if this helps, but I had the exact same problem. I then
> > stored the URI instead of the path, and I was then able to search on
> > the URI.
> >
> > Thanks,
> >
> > Rob
> >
> >
> > -----Original Message-----
> > From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, November 25, 2002 11:53 AM
> > To: Lucene Users Group
> > Subject: Slash Problem
> >
> >
> > I've got a Text field (tokenized, indexed, stored) called 'path' which
> > contains a string in the form of '1102\A3345-12RT.XML'. When I submit a
> > query like "path:1102*" it works fine. But, when I try to be more specific
> > (such as "path:1102\a*" or "path:1102*a*") it fails. I've tried escaping
> > the slash ("path:1102\\a*") but that also fails.
> >
> > I'm using the StandardAnalyzer and the default QueryParser. Could anyone
> > suggest what's going wrong here?
> >
> > Regards,
> >
> > Terry
> >
> >
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> >
> >
>
>