I'm sure there's something I'm missing here. Let's say we have an index of a web site with two fields, "body" and "url". The body field is created via Field.Text(..., Reader) and the url field via Field.Keyword(), so the URL is indexed (and therefore searchable) but not tokenized.
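To make the tokenized/untokenized distinction concrete, here is a minimal plain-Java sketch. It uses no Lucene classes: the class name, the KEYWORD_FIELDS set, and both methods are hypothetical names made up for illustration, and tokenize() is a much cruder splitter than StandardAnalyzer. It only shows the idea that a keyword field holds its whole value as one term, so text that has been split into tokens can no longer equal that term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java illustration only; none of these names are Lucene APIs. */
public class KeywordFieldSketch {

    // Assumption: the application itself knows which fields were indexed untokenized.
    private static final Set<String> KEYWORD_FIELDS =
            new HashSet<String>(Collections.singletonList("url"));

    /** Crude stand-in for an analyzer: lowercase, then split on anything that is not a-z or 0-9. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<String>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    /** Field-aware variant: a keyword field yields exactly one term, the raw text. */
    static List<String> termsFor(String field, String text) {
        if (KEYWORD_FIELDS.contains(field)) {
            return Collections.singletonList(text);
        }
        return tokenize(text);
    }

    public static void main(String[] args) {
        String value = "http://www.tropo.com/";
        // Tokenized: several short terms, none equal to the stored URL.
        System.out.println(tokenize(value));
        // Keyword field: a single term identical to the stored URL.
        System.out.println(termsFor("url", value));
    }
}
```

In this sketch the decision is keyed on the field name, which is one possible way to keep text for an untokenized field in one piece; whether a parser can be taught to do that is a separate question.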
I use StandardAnalyzer, I want to find the Document with a matching URL, and I want to use QueryParser to parse the users' queries. I'm using v1.2.

If I'm correct, one design problem is that the Analyzer has no reference to an index, so it doesn't know whether a field has been tokenized. It probably should not tokenize queries against an untokenized field. AFAIK, queries against untokenized fields are always tokenized, and there is no way to tell the QueryParser not to tokenize a field.

I have attached a test program that shows the behavior, along with sample output. The "From:" lines are user queries. The "To:" lines are the result of calling QueryParser and then Query.toString(). The 3rd and 4th From/To pairs below are the key ones.

The goal is to enter a query like url:http://www.tropo.com/ or url:"http://www.tropo.com/" and not have the 'http://www.tropo.com/' tokenized. I tried backslashes too, to no avail (url:http\://www.tropo.com/).

==========================================================================

C:\proj\tropo_java>java com.tropo.lucene.KeywordProblem
From: foo
To : foo

From: body:foo
To : body:foo

From: url:http://www.tropo.com/ <-- first attempt
To : http <-- first problem, ok, we gotta quote

From: url:"http://www.tropo.com/" <-- second attempt
To : "http www.tropo.com" <-- second problem, colon and slashes missing

==========================================================================

package com.tropo.lucene;

import java.io.*;
import java.util.*;

import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.*;
import org.apache.lucene.search.*;
import org.apache.lucene.queryParser.*;

public class KeywordProblem {
    /**
     * Parses each sample query with QueryParser and prints the query
     * before ("From:") and after ("To :") parsing.
     */
    public static void main(String[] args) throws Throwable {
        String body = "body";
        String url = "url";
        String[] lines = new String[] {
            "foo",
            "body:foo",
            "url:http://www.tropo.com/",
            "url:\"http://www.tropo.com/\""
        };
        Analyzer a = new StandardAnalyzer();
        for (int i = 0; i < lines.length; i++) {
            // "url" is passed as the default field for unqualified terms.
            Query query = QueryParser.parse(lines[i], url, a);
            o.println("From: " + lines[i]);
            o.println("To : " + query.toString(url));
            o.println();
        }
    }

    private static PrintStream o = System.out;
}

-----Original Message-----
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 12:13 PM
To: Lucene Users List
Subject: Re: Slash Problem

Dave,

My recent testing suggests that when the field is not tokenized, it is not split as you suggest. When I search the "path" field using "path:1102/A*" I get precisely what I am looking for (though I discovered the lowercase mechanism isn't applied to this field and the query is case-sensitive - note the uppercase 'A' above).

Regards,

Terry

----- Original Message -----
From: "Spencer, Dave" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, November 25, 2002 2:58 PM
Subject: RE: Slash Problem

Funny, I have more or less the same question I've been meaning to post.

I think the answer is going to be that the analyzer applies to all parts of a query, even to untokenized fields, which to me seems wrong.

So I think if you have a query like

    body:foo uri:"/alpha/beta"

with 'body' being tokenized and 'uri' not tokenized, the analyzer applies to "/alpha/beta" and breaks it into "alpha beta", which is not desired...

-----Original Message-----
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 9:26 AM
To: Lucene Users List
Subject: Re: Slash Problem

Rob,

I presume that means that you used backslashes (in the url) rather than forward slashes (in the path). I had planned to test that as a workaround and it's good to know that you've already tested that successfully.

But why is this necessary? Why doesn't the escape ('\') allow the use of a backslash?
Regards,

Terry

----- Original Message -----
From: "Rob Outar" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, November 25, 2002 12:01 PM
Subject: RE: Slash Problem

> I don't know if this helps but I had exact same problem, I then stored the
> URI instead of the path, I was then able to search on the URI.
>
> Thanks,
>
> Rob
>
>
> -----Original Message-----
> From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 25, 2002 11:53 AM
> To: Lucene Users Group
> Subject: Slash Problem
>
>
> I've got a Text field (tokenized, indexed, stored) called 'path' which
> contains a string in the form of '1102\A3345-12RT.XML'. When I submit a
> query like "path:1102*" it works fine. But, when I try to be more specific
> (such as "path:1102\a*" or "path:1102*a*") it fails. I've tried escaping
> the slash ("path:1102\\a*") but that also fails.
>
> I'm using the StandardAnalyzer and the default QueryParser. Could anyone
> suggest what's going wrong here?
>
> Regards,
>
> Terry
>
>
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
