Hi Steve,

This is a language-dependent case. Basically, I will use a whitespace tokenizer to process the input. But some of the inputs should be one term instead of being split into 2 terms. I am thinking of developing a special filter to fix these terms.
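To make the idea concrete, here is a minimal sketch of the joining logic in plain Java, outside of Lucene. The class name TermPairJoiner and the "first second" pair format are my own assumptions for illustration, not an existing Lucene API. In a real Lucene filter I assume the same one-token lookahead would live in a TokenFilter subclass's incrementToken(), reading terms through CharTermAttribute and buffering the lookahead token with captureState()/restoreState().

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Lucene): merges two adjacent
// whitespace-separated tokens when the pair is in a known set.
final class TermPairJoiner {

    // Each entry is "first second", i.e. the two tokens joined by one space.
    private final Set<String> pairs;

    TermPairJoiner(Set<String> pairs) {
        this.pairs = new HashSet<>(pairs);
    }

    List<String> join(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String current = tokens.get(i);
            // Look ahead exactly one token and check whether the pair is listed.
            if (i + 1 < tokens.size()
                    && pairs.contains(current + " " + tokens.get(i + 1))) {
                // Emit the two tokens as a single term and skip both.
                out.add(current + tokens.get(i + 1));
                i += 2;
            } else {
                out.add(current);
                i += 1;
            }
        }
        return out;
    }
}
```

With the pair "t2 t2a" in the set, the tokens t1, t2, t2a, t3 come out as t1, t2t2a, t3. Any pair that is not listed passes through unchanged, which is why I think a lookup table fits a language-dependent case better than a general-purpose combining filter.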
On Fri, Dec 21, 2012 at 3:34 PM, Steve Rowe <sar...@gmail.com> wrote:

> Hi David,
>
> Not very many people read this mailing list - I suggest you switch to the
> java-user list - see <http://lucene.apache.org/core/discussion.html>.
>
> ShingleFilter and CommonGramsFilter combine terms, though the conditions
> under which they do so don't appear to be the same as what you want.
>
> Why are only the second two terms combined?
>
> Steve
>
> On Dec 21, 2012, at 2:27 AM, Xi Shen <davidshe...@gmail.com> wrote:
>
> > Hi,
> >
> > I am looking for a token filter that can combine 2 terms into 1. E.g.
> >
> > the input has been tokenized by white space:
> >
> > t1 t2 t2a t3
> >
> > I want a filter that outputs:
> >
> > t1 t2t2a t3
> >
> > I know it is a very special case, and I am thinking about developing a
> > filter of my own. But I cannot figure out which API I should use to look
> > for terms in a Token Stream.
> >
> > --
> > Regards,
> > David Shen
> >
> > http://about.me/davidshen
> > https://twitter.com/#!/davidshen84

--
Regards,
David Shen
http://about.me/davidshen
https://twitter.com/#!/davidshen84