Do you mean StandardTokenizer.jj (org.apache.lucene.analysis.standard)? I'm
not seeing StandardAnalyzer.jj in the Lucene source download.
Mark Miller-5 wrote:
>
> Philip Brown wrote:
>> Hi,
>>
>> After running some tests using the StandardAnalyzer, and getting 0 results
>> from the search, I believe I need a special Tokenizer/Analyzer. Does
>> anybody have something that parses like the following:
>>
>> - doesn't parse apart phrases (in quotes)
>> - doesn't parse/separate hyphenated or underscored words
>> other normal stuff like
>> - parses on whitespace
>> - removes periods in acronyms
>> - lowercases everything (even in quotes? -- maybe)
>>
>> I basically have a set of terms, some of which are multi-worded phrases, but
>> none should ever be broken apart -- not when adding the documents, not when
>> querying the search results, etc. I'm creating the field in the documents
>> as UN_TOKENIZED and using a StandardAnalyzer and basic Query object to get
>> the results. Any suggestions and/or existing code that I could re-use to
>> fit this purpose?
>>
>> Thanks.
>>
> Here is what I would do. Pull the Standard Analyzer out of Lucene.
> Modify StandardAnalyzer.jj. This is a JavaCC file. In it, there are some
> regular expressions that define the tokens for parsing. Now try steps
> similar to these: add '_' and '-' to the definition of a letter. Add a new
> token type that eats quoted phrases; look at QueryParser.jj for an example,
> probably about halfway down the file (<QUOTED>). Now run JavaCC on
> StandardAnalyzer.jj. Search the mailing list when you find out that a
> ParseException is screwing up compilation. (I really wish someone would
> update that for the latest JavaCC, if indeed that is the problem. It's
> really annoying, and excluding it from compilation doesn't seem to fix
> it anymore.)
>
> - Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
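
For reference while comparing grammar changes, here is a rough plain-Java
sketch of the tokenization rules listed in the original question. It is not
Lucene code and not the JavaCC change Mark describes; it only illustrates the
intended behaviour. The class name PhraseAwareTokenizerSketch, the tokenize
method, and the acronym pattern are all made up for this example, and it
assumes an unclosed quote simply runs to the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Lucene.
class PhraseAwareTokenizerSketch {

    // Assumed acronym shape: single letters separated by periods, e.g. "u.s.a."
    private static final Pattern ACRONYM = Pattern.compile("\\p{L}(\\.\\p{L})+\\.?");

    // Splits on whitespace outside quotes, keeps each "quoted phrase" as one
    // token, leaves '-' and '_' inside words untouched, and lowercases everything.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // A quote opens or closes a phrase and ends the token in progress.
                flush(current, tokens);
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                flush(current, tokens);
            } else {
                current.append(Character.toLowerCase(c));
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Emits the buffered token, removing periods only when the whole token
    // looks like an acronym; empty tokens are skipped.
    private static void flush(StringBuilder current, List<String> tokens) {
        String token = current.toString().trim();
        current.setLength(0);
        if (token.isEmpty()) {
            return;
        }
        if (ACRONYM.matcher(token).matches()) {
            token = token.replace(".", "");
        }
        tokens.add(token);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Find \"New York\" state-of-the-art U.S.A. my_field"));
    }
}
```

Whatever rules end up in the grammar, the same analysis would need to be
applied both when indexing and when querying, otherwise the terms will not
match.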
--
View this message in context:
http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6098930
Sent from the Lucene - Java Users forum at Nabble.com.