Do you mean StandardTokenizer.jj (org.apache.lucene.analysis.standard)? I'm not seeing StandardAnalyzer.jj in the Lucene source download.
Mark Miller-5 wrote:
>
> Philip Brown wrote:
>> Hi,
>>
>> After running some tests using the StandardAnalyzer, and getting 0 results
>> from the search, I believe I need a special Tokenizer/Analyzer. Does
>> anybody have something that parses like the following:
>>
>> - doesn't parse apart phrases (in quotes)
>> - doesn't parse/separate hyphenated or underscored words
>> other normal stuff like
>> - parses on whitespace
>> - removes periods in acronyms
>> - lowercases everything (even in quotes? -- maybe)
>>
>> I basically have a set of terms, some of which are multi-worded phrases, but
>> none should ever be broken apart -- not when adding the documents, not when
>> querying the search results, etc. I'm creating the field in the documents
>> as UN_TOKENIZED and using a StandardAnalyzer and basic Query object to get
>> the results. Any suggestions and/or existing code that I could re-use to
>> fit this purpose?
>>
>> Thanks.
>>
> Here is what I would do. Pull the Standard Analyzer out of Lucene.
> Modify StandardAnalyzer.jj. This is a JavaCC file. In it, there is some
> regex that defines tokens for parsing. Now try some steps similar to
> this: add '_' and '-' to the definition of a letter. Add a new token
> type that eats quoted phrases...look at queryparser.jj for an example,
> prob about halfway down the file <QUOTED>. Now run JavaCC on the
> StandardAnalyzer.jj. Search the mailing list when you find out that a
> ParseException is screwing up compilation (I really wish someone would
> update that for the latest JavaCC if indeed that is the problem. It's
> really annoying, and excluding it from compilation doesn't seem to fix
> it anymore).
>
> - Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

-- 
View this message in context: http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6098930
Sent from the Lucene - Java Users forum at Nabble.com.
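For comparison with the JavaCC route, here is a minimal hand-written sketch of the splitting rules listed above, in plain Java rather than as a grammar or a Lucene Tokenizer subclass. Everything in it is hypothetical: the class name `PhraseTokenizerSketch`, the `tokenize`/`normalize` helpers and the acronym regex are illustrations, not part of Lucene. It assumes "acronym" means letter-period pairs such as "U.S.A.", lowercases quoted phrases too, and leaves other punctuation (commas and so on) attached to words. To use it in an index it would still have to be wrapped in whatever Tokenizer/Analyzer API your Lucene version provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch, not Lucene's API: whitespace splitting, quoted phrases kept
// whole, '-' and '_' kept inside words, periods dropped from acronyms, lowercasing.
class PhraseTokenizerSketch {

    // Assumed acronym shape: one or more "letter + period" pairs, optionally
    // followed by a final letter ("U.S.A.", "U.S.A", "e.g."). "3.14" does not match.
    private static final Pattern ACRONYM = Pattern.compile("(\\p{L}\\.)+\\p{L}?");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (c == '"') {
                // Quoted phrase: everything up to the closing quote becomes one token.
                int end = text.indexOf('"', i + 1);
                if (end < 0) {
                    end = n; // unterminated quote: take the rest of the input
                }
                String phrase = text.substring(i + 1, end).trim();
                if (!phrase.isEmpty()) {
                    tokens.add(normalize(phrase));
                }
                i = end + 1;
            } else {
                // Bare word: runs until whitespace or a quote, so '-' and '_' stay inside it.
                int start = i;
                while (i < n && !Character.isWhitespace(text.charAt(i)) && text.charAt(i) != '"') {
                    i++;
                }
                tokens.add(normalize(text.substring(start, i)));
            }
        }
        return tokens;
    }

    static String normalize(String token) {
        String t = token;
        if (ACRONYM.matcher(t).matches()) {
            t = t.replace(".", ""); // "U.S.A." -> "USA"
        }
        return t.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints [the, usa, new york city, well-known, foo_bar]
        System.out.println(tokenize("The U.S.A. \"New York City\" well-known foo_bar"));
    }
}
```

A hand-written scanner like this avoids regenerating the JavaCC grammar, at the cost of not reusing StandardTokenizer's handling of e-mail addresses, hostnames and so on. Whichever approach you take, the same rules have to run both when adding documents and when building the query, or the terms will not line up.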