If it's a fixed list and not excessively long, would synonyms work? But if there's some kind of logic you need to apply, I don't think you're going to find anything out of the box. The problem is that by the time a token filter gets called, the tokens have already been split up, so you'll probably have to write a custom filter that manages that logic.
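
To make the "custom filter" idea a bit more concrete, here is a rough sketch of the look-ahead logic such a filter would need. It is plain Java over a list of strings, not Lucene code: the class `PairMerger` and the method `mergeAdjacent` are made-up names for illustration only. In a real TokenFilter the same buffering would presumably live in `incrementToken()`, which I have not written out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Lucene: it models the one-token
// look-ahead that a custom filter would have to implement.
public class PairMerger {

    // Joins `first` and `second` into a single token only when they are
    // adjacent and in that order; all other tokens pass through unchanged.
    public static List<String> mergeAdjacent(List<String> tokens, String first, String second) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = tokens.iterator();
        String pending = null; // token read ahead but not yet emitted
        while (pending != null || it.hasNext()) {
            String current = (pending != null) ? pending : it.next();
            pending = null;
            if (current.equals(first) && it.hasNext()) {
                String next = it.next();
                if (next.equals(second)) {
                    out.add(current + next); // the specific pair: emit one joined token
                    continue;
                }
                pending = next; // not the pair: keep the look-ahead for the next round
            }
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        // Adjacent pair is merged.
        System.out.println(mergeAdjacent(Arrays.asList("t1", "t2", "t2a", "t3"), "t2", "t2a"));
        // Non-adjacent pair is left alone.
        System.out.println(mergeAdjacent(Arrays.asList("t1", "t2", "t3", "t2a"), "t2", "t2a"));
    }
}
```

The point of the `pending` variable is that the filter has to read one token ahead before it knows whether to emit the current one as-is, so it needs somewhere to park the token it peeked at. A real filter would also have to decide what to do with offsets and position increments for the merged token, which this sketch ignores.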

Best
Erick

On Fri, Dec 21, 2012 at 4:16 AM, Xi Shen <davidshe...@gmail.com> wrote:

> Unfortunately, no...I am not combine every two term into one. I am
> combining a specific pair.
>
> E.g. the Token Stream: t1 t2 t2a t3
> should be rewritten into t1 t2t2a t3
>
> But the TS: t1 t2 t3 t2a
> should not be rewritten, and it is already correct
>
>
> On Fri, Dec 21, 2012 at 5:00 PM, Alan Woodward <
> alan.woodw...@romseysoftware.co.uk> wrote:
>
> > Have a look at ShingleFilter:
> >
> > http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/shingle/ShingleFilter.html
> >
> > On 21 Dec 2012, at 08:42, Xi Shen wrote:
> >
> > > I have to use the white space and word delimiter to process the input
> > > first. I tried many combination, and it seems to me that it is inevitable
> > > the term will be split into two :(
> > >
> > > I think developing my own filter is the only resolution...but I just cannot
> > > find a guide to help me understand what I need to do to implement a
> > > TokenFilter.
> > >
> > >
> > > On Fri, Dec 21, 2012 at 4:03 PM, Danil ŢORIN <torin...@gmail.com> wrote:
> > >
> > >> Easiest way would be to pre-process your input and join those 2 tokens
> > >> before splitting them by white space.
> > >>
> > >> But from given context I might miss some details...still worth a shot.
> > >>
> > >> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen <davidshe...@gmail.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> I am looking for a token filter that can combine 2 terms into 1? E.g.
> > >>>
> > >>> the input has been tokenized by white space:
> > >>>
> > >>> t1 t2 t2a t3
> > >>>
> > >>> I want a filter that output:
> > >>>
> > >>> t1 t2t2a t3
> > >>>
> > >>> I know it is a very special case, and I am thinking about develop a filter
> > >>> of my own. But I cannot figure out which API I should use to look for terms
> > >>> in a Token Stream.
> > >>>
> > >>> --
> > >>> Regards,
> > >>> David Shen
> > >>>
> > >>> http://about.me/davidshen
> > >>> https://twitter.com/#!/davidshen84