Have a look at ShingleFilter: http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/shingle/ShingleFilter.html
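
As a rough illustration of what shingling would give you on the example from the thread, here is a small plain-Java model. It is only a sketch and does not use any Lucene classes: `ShingleSketch`, `shingles` and `mergePair` are hypothetical names made up for this example. It assumes bigram shingles with unigrams kept and an empty token separator, and it assumes the pair to join is known in advance, which the thread does not actually specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the token handling; it does not use any Lucene classes.
final class ShingleSketch {

    // Models bigram shingling with unigrams kept: each token is emitted,
    // followed by that token joined to its successor with `separator`.
    static List<String> shingles(List<String> tokens, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + separator + tokens.get(i + 1));
            }
        }
        return out;
    }

    // Hypothetical merge rule: join `first` and `second` only when they are
    // adjacent, using a one-token lookahead; all other tokens pass through.
    static List<String> mergePair(List<String> tokens, String first, String second) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean hasNext = i + 1 < tokens.size();
            if (hasNext && tokens.get(i).equals(first) && tokens.get(i + 1).equals(second)) {
                out.add(first + second);
                i += 2; // consume both tokens of the merged pair
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("t1", "t2", "t2a", "t3");
        System.out.println(shingles(input, ""));           // every adjacent pair
        System.out.println(mergePair(input, "t2", "t2a")); // only the wanted pair
    }
}
```

Note that shingling produces every adjacent pair (t1t2, t2t2a, t2at3) alongside the original tokens, so you would probably still need a step that keeps only the combined term you want. The `mergePair` method shows the one-token lookahead a custom filter would need for that.
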

On 21 Dec 2012, at 08:42, Xi Shen wrote:

> I have to use the white space and word delimiter to process the input
> first. I tried many combinations, and it seems to me that it is inevitable
> that the term will be split into two :(
>
> I think developing my own filter is the only resolution...but I just cannot
> find a guide to help me understand what I need to do to implement a
> TokenFilter.
>
>
> On Fri, Dec 21, 2012 at 4:03 PM, Danil ŢORIN <torin...@gmail.com> wrote:
>
>> Easiest way would be to pre-process your input and join those 2 tokens
>> before splitting them by white space.
>>
>> But from the given context I might be missing some details...still worth a shot.
>>
>> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen <davidshe...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am looking for a token filter that can combine 2 terms into 1. E.g.
>>>
>>> the input has been tokenized by white space:
>>>
>>> t1 t2 t2a t3
>>>
>>> I want a filter that outputs:
>>>
>>> t1 t2t2a t3
>>>
>>> I know it is a very special case, and I am thinking about developing a
>>> filter of my own. But I cannot figure out which API I should use to look
>>> for terms in a Token Stream.
>>>
>>> --
>>> Regards,
>>> David Shen
>>>
>>> http://about.me/davidshen
>>> https://twitter.com/#!/davidshen84
>
> --
> Regards,
> David Shen
>
> http://about.me/davidshen
> https://twitter.com/#!/davidshen84