Unfortunately, no... I am not combining every two terms into one; I am combining one specific pair.
E.g. the token stream:

t1 t2 t2a t3

should be rewritten into:

t1 t2t2a t3

But the token stream:

t1 t2 t3 t2a

should not be rewritten, because it is already correct.


On Fri, Dec 21, 2012 at 5:00 PM, Alan Woodward <
alan.woodw...@romseysoftware.co.uk> wrote:

> Have a look at ShingleFilter:
> http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/shingle/ShingleFilter.html
>
> On 21 Dec 2012, at 08:42, Xi Shen wrote:
>
> > I have to use the white space and word delimiter to process the input
> > first. I tried many combinations, and it seems to me that it is
> > inevitable that the term will be split into two :(
> >
> > I think developing my own filter is the only solution... but I just
> > cannot find a guide to help me understand what I need to do to
> > implement a TokenFilter.
> >
> >
> > On Fri, Dec 21, 2012 at 4:03 PM, Danil ŢORIN <torin...@gmail.com> wrote:
> >
> >> Easiest way would be to pre-process your input and join those 2 tokens
> >> before splitting them by white space.
> >>
> >> But from given context I might miss some details...still worth a shot.
> >>
> >> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen <davidshe...@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am looking for a token filter that can combine 2 terms into 1. E.g.
> >>>
> >>> the input has been tokenized by white space:
> >>>
> >>> t1 t2 t2a t3
> >>>
> >>> I want a filter that outputs:
> >>>
> >>> t1 t2t2a t3
> >>>
> >>> I know it is a very special case, and I am thinking about developing
> >>> a filter of my own. But I cannot figure out which API I should use to
> >>> look for terms in a Token Stream.

--
Regards,
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84
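
P.S. To make the rule concrete, here is a minimal sketch of the merge logic in plain Java. It works on a list of strings, not on a Lucene TokenStream, and the names PairJoiner and joinPair are made up for illustration; they are not part of any Lucene API. I assume a real TokenFilter would do the same one-token look-ahead inside incrementToken(), but that part is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the merge rule; this is NOT a Lucene TokenFilter.
// PairJoiner and joinPair are hypothetical names used only for this sketch.
final class PairJoiner {
    private PairJoiner() {}

    /**
     * Joins `first` and `second` into a single token only where `second`
     * immediately follows `first`. Every other token passes through unchanged.
     */
    static List<String> joinPair(List<String> tokens, String first, String second) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String current = tokens.get(i);
            boolean hasNext = i + 1 < tokens.size();
            if (hasNext && current.equals(first) && tokens.get(i + 1).equals(second)) {
                // The specific pair is adjacent: emit one combined token
                // and skip over both inputs.
                out.add(first + second);
                i += 2;
            } else {
                // Anything else, including `first` or `second` on its own,
                // is emitted as-is.
                out.add(current);
                i += 1;
            }
        }
        return out;
    }
}
```

With this sketch, PairJoiner.joinPair(Arrays.asList("t1", "t2", "t2a", "t3"), "t2", "t2a") gives [t1, t2t2a, t3], while the stream t1 t2 t3 t2a comes back unchanged because t2a does not directly follow t2.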