RE: test case - RE: Slash Problem

Spencer, Dave Mon, 25 Nov 2002 19:44:09 -0800

Good point, though I thought the rule was that you were supposed
to use the same Analyzer on your Query as you built the
index with.


Of course I suspect that this will break down if the
Field.Keyword text has spaces in it.

But: it gets past all reasonable uri/url/filename cases so thanks.
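
To make the caveat about spaces concrete, here is a minimal sketch that emulates whitespace tokenization with plain Java string splitting. It does not call Lucene; the class name, the method name, and the sample Windows path are all hypothetical and chosen only for illustration. Assuming WhitespaceAnalyzer splits on whitespace and nothing else, a URL survives as one term, while a keyword value containing a space would be cut in two and no longer match the single untokenized term in the index.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceSplitSketch {

    // Emulates whitespace-only tokenization with plain Java.
    // This is an illustration, not Lucene's WhitespaceAnalyzer.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // No whitespace in the URL, so it stays a single token.
        System.out.println(splitOnWhitespace("http://www.tropo.com/"));

        // Hypothetical keyword value with a space: it splits into two tokens.
        System.out.println(splitOnWhitespace("C:\\My Documents\\report.xml"));
    }
}
```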


-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 7:23 PM
To: Lucene Users List
Subject: Re: test case - RE: Slash Problem


Maybe there is a good reason for using WhitespaceAnalyzer in
TestQueryParser.java :).  Try it.

    public void testEscaped() throws Exception {
        Analyzer a = new WhitespaceAnalyzer();
        assertQueryEquals("\\[brackets", a, "\\[brackets");
        assertQueryEquals("\\[brackets", null, "brackets");
        assertQueryEquals("\\\\", a, "\\\\");
        assertQueryEquals("\\+blah", a, "\\+blah");
        assertQueryEquals("\\(blah", a, "\\(blah");
    }

Otis
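
The contrast between the first two assertions in testEscaped can be sketched without Lucene. The code below is only an emulation under stated assumptions: splitOnWhitespace stands in for WhitespaceAnalyzer, and letterRuns stands in for a letter-based, lowercasing tokenizer, which is what the null analyzer case is assumed to fall back to. Both method names and the class name are hypothetical. StandardAnalyzer behaves differently again (the sample output later in this thread shows it keeping www.tropo.com together), so this is not a model of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizerContrastSketch {

    // Stand-in for whitespace-only tokenization (not Lucene code).
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for a letter-based tokenizer: keeps runs of letters,
    // lowercases them, and drops every other character (not Lucene code).
    static List<String> letterRuns(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\p{L}+").matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String escaped = "\\[brackets";          // the characters \[brackets
        String url = "http://www.tropo.com/";

        // Whitespace splitting leaves punctuation alone.
        System.out.println(splitOnWhitespace(escaped));
        System.out.println(splitOnWhitespace(url));

        // Letter runs discard the backslash, bracket, colon, slashes and dots.
        System.out.println(letterRuns(escaped));
        System.out.println(letterRuns(url));
    }
}
```

Under these assumptions the punctuation in a URL survives only when the query side does no more than split on whitespace, which matches the expectations in the test above.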

--- "Spencer, Dave" <[EMAIL PROTECTED]> wrote:
> 
> I'm sure there's something that I'm missing here.
> Let's say we have an index of a web site with 2 fields,
> "body", and "url".
> Body is formed via Field.Text(...,Reader) and the url field by 
> Field.Keyword(), thus the URL is not tokenized but is searchable.
> 
> I use StandardAnalyzer and I want to find
> the Document with a matching URL, and I want
> to use QueryParser to parse the users queries.
> 
> I'm using v1.2.
> 
> It seems that, if I'm correct, one design problem is that the
> Analyzer 
> does not have a reference to an index, so it doesn't know
> if a field has been tokenized. It probably should not tokenize
> queries against an untokenized field. AFAIK the queries against
> untokenized fields are always tokenized and there is no way to tell
> the QueryParser to not tokenize a field.
> 
> I have attached a test program that shows the behavior and
> sample output.
> The "From:" lines are user queries.
> The "To:" lines are the result of calling QueryParser and then
> Query.toString().
> 
> The 3rd and 4th From/To lines below are the key ones.
> The goal is to enter a query like url:http://www.tropo.com/
> or url:"http://www.tropo.com/" and not tokenize the
> 'http://www.tropo.com/'.
> I tried backslashes too to no avail (url:http\://www.tropo.com/)
> 
>       
> 
> ==========================================================================
> C:\proj\tropo_java>java com.tropo.lucene.KeywordProblem
> From: foo
> To  : foo
> 
> From: body:foo
> To  : body:foo
> 
> From: url:http://www.tropo.com/       <-- first attempt
> To  : http                            <-- first problem, ok, we gotta quote
> 
> From: url:"http://www.tropo.com/"     <-- second attempt
> To  : "http www.tropo.com"            <-- second problem, colon and slashes missing
> 
> 
> ==========================================================================
> package com.tropo.lucene;
> 
> import java.io.*;
> import java.util.*;
> 
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.analysis.standard.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.queryParser.*;
> 
> public class KeywordProblem
> {
>       /**
>        *
>        */
>       public static void main(String[] args)
>               throws Throwable
>       {
>               String body = "body";
>               String url = "url";
> 
>               String[] lines = new String[] {
>                       "foo",
>                       "body:foo",
>                       "url:http://www.tropo.com/",
>                       "url:\"http://www.tropo.com/\""
>               };
> 
>               Analyzer a = new StandardAnalyzer();
>               for ( int i = 0; i < lines.length; i++)
>               {
>                       Query query = QueryParser.parse( lines[i], url, a);
>                       o.println( "From: " + lines[i]);
>                       o.println( "To  : " + query.toString( url));
>                       o.println();
>               }
>       }
>       private static PrintStream o = System.out;
> }
> 
> 
> 
> 
> -----Original Message-----
> From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 25, 2002 12:13 PM
> To: Lucene Users List
> Subject: Re: Slash Problem
> 
> 
> Dave,
> 
> My recent testing suggests that when the field is not tokenized, it is not
> split as you suggest.  When I search the "path" field using "path:1102/A*" I
> get precisely what I am looking for (though I discovered the lowercase
> mechanism isn't applied to this field and the query is case-sensitive -
> note the uppercase 'A' above.)
> 
> Regards,
> 
> Terry
> 
> ----- Original Message -----
> From: "Spencer, Dave" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Monday, November 25, 2002 2:58 PM
> Subject: RE: Slash Problem
> 
> 
> Funny, I have more or less the same question I've been meaning to post.
> I think the answer is going to be that the analyzer applies to all parts of
> a query, even to untokenized fields, which to me seems wrong.
> 
> So I think if you have a query like
> 
> body:foo uri:"/alpha/beta"
> 
> With 'body' being tokenized and 'uri' not tokenized, I think that
> the analyzer applies to "/alpha/beta" and breaks it into "alpha beta"
> which is not desired...
> 
> 
> -----Original Message-----
> From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 25, 2002 9:26 AM
> To: Lucene Users List
> Subject: Re: Slash Problem
> 
> 
> Rob,
> 
> I presume that means that you used backslashes (in the url) rather than
> forward slashes (in the path).  I had planned to test that as a workaround
> and it's good to know that you've already tested that successfully.
> 
> But why is this necessary?  Why doesn't the escape ('\') allow the use of a
> backslash?
> 
> Regards,
> 
> Terry
> 
> ----- Original Message -----
> From: "Rob Outar" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Monday, November 25, 2002 12:01 PM
> Subject: RE: Slash Problem
> 
> 
> > I don't know if this helps but I had the exact same problem. I then stored
> > the URI instead of the path, and I was then able to search on the URI.
> >
> > Thanks,
> >
> > Rob
> >
> >
> > -----Original Message-----
> > From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, November 25, 2002 11:53 AM
> > To: Lucene Users Group
> > Subject: Slash Problem
> >
> >
> > I've got a Text field (tokenized, indexed, stored) called 'path' which
> > contains a string in the form of '1102\A3345-12RT.XML'.  When I submit a
> > query like "path:1102*" it works fine.  But, when I try to be more specific
> > (such as "path:1102\a*" or "path:1102*a*") it fails.  I've tried escaping
> > the slash ("path:1102\\a*") but that also fails.
> >
> > I'm using the StandardAnalyzer and the default QueryParser.  Could anyone
> > suggest what's going wrong here?
> >
> > Regards,
> >
> > Terry
> >
> >
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> >
> >

