Good point, though I thought the rule was that you are supposed to use the same Analyzer on your Query as the one you built the index with.
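For what it's worth, here is a minimal plain-Java sketch of why that rule gets awkward for an untokenized field. It does not use Lucene at all: `whitespaceTokens`, `punctuationTokens`, and `matchesKeyword` are hypothetical helpers made up for illustration, and `punctuationTokens` is only a crude stand-in for what a punctuation-aware analyzer does to a URL, not how StandardAnalyzer is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only. These helpers are hypothetical stand-ins,
// not Lucene's WhitespaceAnalyzer, StandardAnalyzer, or QueryParser.
class KeywordQuerySketch {

    // Split on runs of whitespace only, keeping punctuation inside each token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Crude stand-in for a punctuation-aware analyzer (assumption):
    // break on whitespace, ':' and '/', and drop the separators.
    static List<String> punctuationTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[\\s:/]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // An untokenized field is indexed as exactly one term: the whole string.
    // A query matches only if analyzing the query text yields that same single term.
    static boolean matchesKeyword(String indexedValue, List<String> queryTokens) {
        return queryTokens.size() == 1 && queryTokens.get(0).equals(indexedValue);
    }

    public static void main(String[] args) {
        String url = "http://www.tropo.com/";
        String spaced = "my file.xml";

        // Punctuation-aware splitting loses the colon and slashes, so no match.
        System.out.println(punctuationTokens(url) + " -> " + matchesKeyword(url, punctuationTokens(url)));
        // Whitespace-only splitting leaves the URL intact, so it matches.
        System.out.println(whitespaceTokens(url) + " -> " + matchesKeyword(url, whitespaceTokens(url)));
        // A keyword containing a space is split in two, so it cannot match.
        System.out.println(whitespaceTokens(spaced) + " -> " + matchesKeyword(spaced, whitespaceTokens(spaced)));
    }
}
```

The third case is the one to watch: once the keyword value itself contains a space, whitespace splitting produces two tokens and the single indexed term can no longer be matched.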
Of course, I suspect that this will break down if the Field.Keyword text has spaces in it. But it gets past all reasonable uri/url/filename cases, so thanks.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 7:23 PM
To: Lucene Users List
Subject: Re: test case - RE: Slash Problem

Maybe there is a good reason for using WhitespaceAnalyzer in TestQueryParser.java :). Try it.

    public void testEscaped() throws Exception {
        Analyzer a = new WhitespaceAnalyzer();
        assertQueryEquals("\\[brackets", a, "\\[brackets");
        assertQueryEquals("\\[brackets", null, "brackets");
        assertQueryEquals("\\\\", a, "\\\\");
        assertQueryEquals("\\+blah", a, "\\+blah");
        assertQueryEquals("\\(blah", a, "\\(blah");
    }

Otis

--- "Spencer, Dave" <[EMAIL PROTECTED]> wrote:
>
> I'm sure there's something that I'm missing here.
> Let's say we have an index of a web site with 2 fields,
> "body", and "url".
> Body is formed via Field.Text(...,Reader) and the url field by
> Field.Keyword(), thus the URL is not tokenized but is searchable.
>
> I use StandardAnalyzer and I want to find
> the Document with a matching URL, and I want
> to use QueryParser to parse the user's queries.
>
> I'm using v1.2.
>
> It seems that, if I'm correct, one design problem is that the Analyzer
> does not have a reference to an index, so it doesn't know
> if a field has been tokenized. It probably should not tokenize
> queries against an untokenized field. AFAIK the queries against
> untokenized fields are always tokenized and there is no way to tell
> the QueryParser to not tokenize a field.
>
> I have attached a test program that shows the behavior and
> sample output.
> The "From:" lines are user queries.
> The "To:" lines are the result of calling QueryParser and then
> Query.toString().
>
> The 3rd and 4th From/To lines below are the key ones.
> The goal is to enter a query like url:http://www.tropo.com/
> or url:"http://www.tropo.com/" and not tokenize the
> 'http://www.tropo.com/'.
> I tried backslashes too, to no avail (url:http\://www.tropo.com/)
>
> ==========================================================================
> C:\proj\tropo_java>java com.tropo.lucene.KeywordProblem
> From: foo
> To : foo
>
> From: body:foo
> To : body:foo
>
> From: url:http://www.tropo.com/       <-- first attempt
> To : http                             <-- first problem, ok, we gotta quote
>
> From: url:"http://www.tropo.com/"     <-- second attempt
> To : "http www.tropo.com"             <-- second problem, colon and slashes missing
>
> ==========================================================================
> package com.tropo.lucene;
>
> import java.io.*;
> import java.util.*;
>
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.analysis.standard.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.queryParser.*;
>
> public class KeywordProblem
> {
>     /**
>      *
>      */
>     public static void main(String[] args)
>         throws Throwable
>     {
>         String body = "body";
>         String url = "url";
>
>         String[] lines = new String[] {
>             "foo",
>             "body:foo",
>             "url:http://www.tropo.com/",
>             "url:\"http://www.tropo.com/\""
>         };
>
>         Analyzer a = new StandardAnalyzer();
>         for ( int i = 0; i < lines.length; i++)
>         {
>             Query query = QueryParser.parse( lines[i], url, a);
>             o.println( "From: " + lines[i]);
>             o.println( "To : " + query.toString( url));
>             o.println();
>         }
>     }
>     private static PrintStream o = System.out;
> }
>
> -----Original Message-----
> From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 25, 2002 12:13 PM
> To: Lucene Users List
> Subject: Re: Slash Problem
>
> Dave,
>
> My recent testing suggests that when the field is not tokenized, it is not
> split as you suggest.
> When I search the "path" field using "path:1102/A*" I
> get precisely what I am looking for (though I discovered the lowercase
> mechanism isn't applied to this field and the query is case-sensitive -
> note the uppercase 'A' above.)
>
> Regards,
>
> Terry
>
> ----- Original Message -----
> From: "Spencer, Dave" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Monday, November 25, 2002 2:58 PM
> Subject: RE: Slash Problem
>
> Funny, I have more or less the same question I've been meaning to post.
> I think the answer is going to be that the analyzer applies to all parts of
> a query, even to untokenized fields, which to me seems wrong.
>
> So I think if you have a query like
>
> body:foo uri:"/alpha/beta"
>
> With 'body' being tokenized and 'uri' not tokenized, I think that
> the analyzer applies to "/alpha/beta" and breaks it into "alpha beta"
> which is not desired...
>
> -----Original Message-----
> From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 25, 2002 9:26 AM
> To: Lucene Users List
> Subject: Re: Slash Problem
>
> Rob,
>
> I presume that means that you used backslashes (in the url) rather than
> forward slashes (in the path). I had planned to test that as a workaround
> and it's good to know that you've already tested that successfully.
>
> But why is this necessary? Why doesn't the escape ('\') allow the use of a
> backslash?
>
> Regards,
>
> Terry
>
> ----- Original Message -----
> From: "Rob Outar" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Monday, November 25, 2002 12:01 PM
> Subject: RE: Slash Problem
>
> > I don't know if this helps but I had exact same problem, I then stored the
> > URI instead of the path, I was then able to search on the URI.
> >
> > Thanks,
> >
> > Rob
> >
> > -----Original Message-----
> > From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, November 25, 2002 11:53 AM
> > To: Lucene Users Group
> > Subject: Slash Problem
> >
> > I've got a Text field (tokenized, indexed, stored) called 'path' which
> > contains a string in the form of '1102\A3345-12RT.XML'. When I submit a
> > query like "path:1102*" it works fine. But, when I try to be more specific
> > (such as "path:1102\a*" or "path:1102*a*") it fails. I've tried escaping
> > the slash ("path:1102\\a*") but that also fails.
> >
> > I'm using the StandardAnalyzer and the default QueryParser. Could anyone
> > suggest what's going wrong here?
> >
> > Regards,
> >
> > Terry
> >
> > --
> > To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
