Hi,

You can write your own token filter that splits each token on specific characters (comma, |, etc.) and then build an analyzer from WhitespaceTokenizer, LowerCaseFilter and your CustomTokenFilter.
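
To make the intended result concrete, here is a rough plain-Java sketch of what that chain should emit for your sample input. It does not use the Lucene or Lucene.Net API at all; the class name TokenPipelineSketch, the analyze method and the delimiter set (comma and pipe) are made up for illustration, so treat it as a description of the expected tokens rather than a working filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenPipelineSketch {
    // Hypothetical delimiter set: comma and pipe. Inside a character class
    // the pipe is a literal character, not alternation.
    private static final String DELIMITERS = "[,|]";

    /**
     * Mimics the chain: whitespace tokenizer -> lowercase filter ->
     * custom delimiter-splitting filter. Plain Java, not Lucene code.
     */
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Step 1: split on runs of whitespace.
        for (String raw : text.trim().split("\\s+")) {
            // Step 2: lowercase each token.
            String lower = raw.toLowerCase(Locale.ROOT);
            // Step 3: split the token on the delimiter characters,
            // dropping empty pieces (e.g. from a trailing comma).
            for (String part : lower.split(DELIMITERS)) {
                if (!part.isEmpty()) {
                    tokens.add(part);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [oracle, .net, c#, java]
        System.out.println(analyze("oracle,.net,C#,java"));
    }
}
```

The point is that ".net" and "c#" survive as whole tokens because nothing in the chain strips the dot or the hash; only whitespace, commas and pipes act as separators. The same chain has to be applied to the query text at search time, otherwise the terms will not match.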

See http://stackoverflow.com/questions/9015348/lucene-custom-analyzer/9015658#9015658

On Mon, Feb 25, 2013 at 11:24 AM, kumar <x10...@gmail.com> wrote:

> Hello all
>
> I am a lucene novice and trying to setup lucene in a .net app using
> lucene.net for searching through documents
> So far it has been fantastic, however given that the users expectations
> are for "google"-like search,
> running into issues searching for .net and c#
>
> Initially tried the StandardAnalyzer which of course does not work for
> searching - .net & c#
> Changed that to a custom analyzer using WhitespaceTokenizer and
> LowerCaseFilter and it works
> however some of the documents have the keywords as
>
> oracle,.net,C#,java etc. ( i.e. separated by commas without any space )
>
> and this custom analyzer fails here
>
> Looking for suggestions on how this might work as i'm sure it's possible
> considering both lucene and .net/c# have been around for a long long while
>
> It looks like PatternAnalyzer might be of some use in this case, however
> i'm not quite sure how to use it and have found scant references to it
>
>
> Any help is appreciated
>
> Thanks
> kumar

--
Regards
Naresh