----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, October 15, 2003 10:01 AM
Subject: Re: Dash Confusion in QueryParser - Bug? Feature?

> On Saturday, October 11, 2003, at 09:44 AM, Michael Giles wrote:
> > He is probably using the StandardAnalyzer. I was about to write the
> > exact same email (but using Wal-Mart as an example on this page -
> > http://www.benchmark.com/cgi-bin/suid/~bcmlp/newsletter.cgi?mode=show&year=2003&date=2003-10-07).
> > I index and search with the same analyzer (Standard), but when I
> > search for Wal-Mart, I don't find a match. I DO find a match if I
> > search for "Wal-Mart" or Wal Mart (no hyphen). This seems like a bug.
>
> Sorry for the delay. I've been meaning to reply to this.
>
> When you index using StandardAnalyzer, you are indexing it as two terms,
> "wal" and "mart" (without the quotes). QueryParser does its own
> (weird?) stuff to strings passed to it. Here's how it breaks down:
>
>     String[] queries = {"Wal-Mart", "\"Wal-Mart\"", "Wal Mart"};
>     for (int i = 0; i < queries.length; i++) {
>       String query = queries[i];
>       Query q = QueryParser.parse(query, "contents", new StandardAnalyzer());
>       System.out.println(query + " = " + q);
>     }
>
>     Wal-Mart = contents:wal -contents:mart
>     "Wal-Mart" = contents:"wal mart"
>     Wal Mart = contents:wal contents:mart
>
> Notice all three are completely different queries. The Wal-Mart one is
> excluding "mart", making it miss documents you expect. The second one
> is a phrase query, which is basically what you're after. The third one
> is matching any documents with "wal" or "mart" in them, regardless of
> whether they are side-by-side.
>
> Is this a bug? Nah... just the nature of the QueryParser beast. It
> would be a non-backwards-compatible change to change how QueryParser
> deals with a dash. That is the main issue here: it interprets the dash
> as a NOT operator. But it seems logical to me that it shouldn't do so
> when it's mashed against a word like this, and should leave it to the
> analyzer to deal with.

I believe this is the same problem that I had the other day.
If you search the mailing list for "t-shirt", you should find some threads discussing this problem. In fact, why don't I give one here:

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&by=thread&from=317960

Cheers,
victor
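
One possible workaround for the behavior described in the quoted message, without changing QueryParser itself, is to wrap hyphenated words in double quotes before the string is parsed, so that Wal-Mart is handled like the phrase query "Wal-Mart". The sketch below is plain Java with no Lucene dependency. The class name HyphenQuoter and the method quoteHyphenated are hypothetical, made up for illustration, and are not part of Lucene. It assumes that only a hyphen with word characters on both sides should be protected, and it leaves a leading hyphen (an intentional NOT) and already-quoted phrases alone. It does not try to handle unbalanced quotes or range queries.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): pre-processes a raw query string
// so that words with an interior hyphen are turned into quoted phrases.
final class HyphenQuoter {

    // Alternative 1: an already-quoted phrase, matched so it can be skipped.
    // Alternative 2 (group 1): a bare word with one or more interior hyphens.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|(\\w+(?:-\\w+)+)");

    static String quoteHyphenated(String query) {
        Matcher m = TOKEN.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Quote bare hyphenated words; copy quoted phrases through unchanged.
            String replacement = (m.group(1) != null)
                    ? "\"" + m.group(1) + "\""
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] queries = {"Wal-Mart", "\"Wal-Mart\"", "Wal Mart", "cheap -mart"};
        for (int i = 0; i < queries.length; i++) {
            System.out.println(queries[i] + " -> " + quoteHyphenated(queries[i]));
        }
    }
}
```

The returned string would then be handed to QueryParser.parse in place of the raw user input. With this sketch, Wal-Mart becomes "Wal-Mart", while "Wal-Mart", Wal Mart, and cheap -mart pass through unchanged, so an explicit exclusion still works as before.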