Unfortunately I spoke too soon. While the original example seems to have been fixed, I'm still getting some unexpected results.
As per your suggestion, I modified the Analyzer to:

    @Override
    protected TokenStreamComponents createComponents(String field, Reader in) {
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add("/", " "); // Transform all forward slashes into whitespace
        Reader mappingFilter = new MappingCharFilter(builder.build(), in);

        Tokenizer tokenizer = new WhitespaceTokenizer(version, mappingFilter);
        return new TokenStreamComponents(tokenizer);
    }

When I try this:

    QueryParser parser = new QueryParser(Version.LUCENE_48, "f",
        new MyAnalyzer(Version.LUCENE_48));
    System.err.println(parser.parse(QueryParser.escape("one/two")));

I get "f:one f:two" as expected. However, if I change the text to "hello
one/two", I get:

    f:hello f:one/two

I can't figure out what's going on. My custom tokenizer seems to work well,
but I'd rather use Lucene's built-ins.

Thank you,

Luis

On Wed, Jun 18, 2014 at 3:38 PM, Luis Pureza <pur...@gmail.com> wrote:
> Thanks, that did work.
>
>
> On Tue, Jun 17, 2014 at 8:49 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>
>> Yeah, this is kind of tricky and confusing! Here's what happens:
>>
>> 1. The query parser "parses" the input string into individual source
>> terms, each delimited by whitespace. The escape is removed in this
>> process, but... no analyzer has been called at this stage.
>>
>> 2. The query parser (generator) calls the analyzer for each source term.
>> Your analyzer is called at this stage, but... the escape is already gone,
>> so... the <backslash><slash> mapping rule is not triggered, leaving the
>> slash recorded in the source term from step 1.
>>
>> You do need the backslash in your original query, because a slash
>> introduces a regex query term. It is added by the escape method you call,
>> but the escaping will be gone by the time your analyzer is called.
>>
>> So, just try a simple, unescaped slash in your char mapping table.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Luis Pureza
>> Sent: Tuesday, June 17, 2014 1:43 PM
>> To: java-user@lucene.apache.org
>> Subject: Lucene QueryParser/Analyzer inconsistency
>>
>> Hi,
>>
>> I'm experiencing puzzling behaviour with the QueryParser and was hoping
>> someone around here could help me.
>>
>> I have a very simple Analyzer that tries to replace forward slashes (/)
>> with spaces. Because QueryParser forces me to escape strings with slashes
>> before parsing, I added a MappingCharFilter to the analyzer that replaces
>> "\/" with a single space. The analyzer is defined as follows:
>>
>> @Override
>> protected TokenStreamComponents createComponents(String field, Reader in) {
>>     NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
>>     builder.add("\\/", " ");
>>     Reader mappingFilter = new MappingCharFilter(builder.build(), in);
>>
>>     Tokenizer tokenizer = new WhitespaceTokenizer(version, mappingFilter);
>>     return new TokenStreamComponents(tokenizer);
>> }
>>
>> Then I use this analyzer in the QueryParser to parse a string with slashes:
>>
>> String text = QueryParser.escape("one/two");
>> QueryParser parser = new QueryParser(Version.LUCENE_48, "f",
>>     new MyAnalyzer(Version.LUCENE_48));
>> System.err.println(parser.parse(text));
>>
>> The expected output would be:
>>
>> f:one f:two
>>
>> However, I get:
>>
>> f:one/two
>>
>> The puzzling thing is that when I debug the analyzer, it tokenizes the
>> input string correctly, returning two tokens instead of one.
>>
>> What is going on?
>>
>> Many thanks,
>>
>> Luís Pureza
>>
>> P.S.: I was able to fix this issue temporarily by creating my own
>> tokenizer that tokenizes on whitespace and slashes. However, I still
>> don't understand what's going on.
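The workaround described in the P.S. above, a tokenizer that breaks tokens on both whitespace and slashes, can be sketched by subclassing Lucene's CharTokenizer and overriding isTokenChar. This is a minimal sketch against the Lucene 4.8 API; the class name is made up for illustration:

```java
import java.io.Reader;

import org.apache.lucene.analysis.util.CharTokenizer;
import org.apache.lucene.util.Version;

// Sketch: a tokenizer that splits on whitespace *and* forward slashes,
// so "hello one/two" yields the tokens "hello", "one", "two".
public final class WhitespaceSlashTokenizer extends CharTokenizer {

    public WhitespaceSlashTokenizer(Version version, Reader in) {
        super(version, in);
    }

    @Override
    protected boolean isTokenChar(int c) {
        // Everything except whitespace and '/' is part of a token;
        // CharTokenizer discards the non-token characters as separators.
        return !Character.isWhitespace(c) && c != '/';
    }
}
```

Because the slash is handled by the tokenizer itself, no MappingCharFilter (and no escaping gymnastics) is needed on the analysis side.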
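The "when I debug the analyzer, it tokenizes the input string correctly" check above can be reproduced by consuming a TokenStream directly, which bypasses the query parser's term splitting and escape removal entirely. A sketch against the Lucene 4.8 API, assuming MyAnalyzer is the analyzer from the thread with the unescaped "/" mapping:

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class DebugAnalyzer {
    public static void main(String[] args) throws IOException {
        // MyAnalyzer is assumed to be the analyzer defined earlier in the thread
        Analyzer analyzer = new MyAnalyzer(Version.LUCENE_48);

        // Feed raw, unescaped text straight to the analyzer -- no QueryParser,
        // so none of the two-stage parse/analyze behavior gets in the way.
        TokenStream ts = analyzer.tokenStream("f", "hello one/two");
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            // With "/" mapped to " " and a WhitespaceTokenizer, this should
            // print three tokens: hello, one, two
            System.out.println(term.toString());
        }
        ts.end();
        ts.close();
    }
}
```

Seeing three tokens here while the parsed query still contains "one/two" confirms Jack's point: the analyzer is fine, and the difference comes from what the query parser does before the analyzer ever runs.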