I have to process the input with the whitespace and word delimiter steps first. I tried many combinations, and it seems inevitable that the term will be split into two :(
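To make the goal concrete, here is a minimal sketch in plain Java of the merge step, applied to tokens that have already been split on whitespace. It is not Lucene code: the class `TokenJoiner`, the method `joinPairs`, and the `shouldJoin` rule are all hypothetical names made up for illustration, and the rule that decides which pair to join (here, exactly `t2` followed by `t2a`) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical helper, not part of Lucene: merges adjacent token pairs.
public class TokenJoiner {

    // Walks the tokens with one token of lookahead. When shouldJoin accepts
    // (current, next), the two are emitted as a single concatenated token.
    static List<String> joinPairs(List<String> tokens,
                                  BiPredicate<String, String> shouldJoin) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String current = tokens.get(i);
            boolean hasNext = i + 1 < tokens.size();
            if (hasNext && shouldJoin.test(current, tokens.get(i + 1))) {
                out.add(current + tokens.get(i + 1)); // merge the pair
                i += 2;                               // skip both inputs
            } else {
                out.add(current);                     // pass through unchanged
                i += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Input already tokenized by whitespace.
        List<String> tokens = Arrays.asList("t1 t2 t2a t3".split("\\s+"));
        // Assumed join rule: only the pair (t2, t2a) is merged.
        List<String> joined =
            joinPairs(tokens, (a, b) -> a.equals("t2") && b.equals("t2a"));
        System.out.println(String.join(" ", joined)); // prints "t1 t2t2a t3"
    }
}
```

Inside a real Lucene TokenFilter the same one-token lookahead would have to live in `incrementToken()`, which this sketch does not attempt.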
I think developing my own filter is the only solution... but I cannot find a guide that explains what I need to do to implement a TokenFilter.

On Fri, Dec 21, 2012 at 4:03 PM, Danil ŢORIN <torin...@gmail.com> wrote:

> Easiest way would be to pre-process your input and join those 2 tokens
> before splitting them by white space.
>
> But from given context I might miss some details...still worth a shot.
>
> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen <davidshe...@gmail.com> wrote:
>
> > Hi,
> >
> > I am looking for a token filter that can combine 2 terms into 1? E.g.
> >
> > the input has been tokenized by white space:
> >
> > t1 t2 t2a t3
> >
> > I want a filter that output:
> >
> > t1 t2t2a t3
> >
> > I know it is a very special case, and I am thinking about develop a filter
> > of my own. But I cannot figure out which API I should use to look for terms
> > in a Token Stream.
> >
> > --
> > Regards,
> > David Shen
> >
> > http://about.me/davidshen
> > https://twitter.com/#!/davidshen84

--
Regards,
David Shen
http://about.me/davidshen
https://twitter.com/#!/davidshen84