Here is the ElisionFilter of Lucene: https://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html
This one only works with apostrophe elisions (' and U+2019), so it may not
apply to Tibetan. But it should inspire you.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Monday, April 28, 2014 10:36 PM
> To: java-user@lucene.apache.org
> Cc: 'Chris Tomlinson'
> Subject: RE: What is the proper use of stop words in Lucene?
>
> Hi,
> >
> > What you intend to do is not a "stopword" use case. You want to "ignore"
> > some words - Lucene has no support for this, because in native
> > language processing this makes no sense.
> >
> > Thank you for the information. I was unaware that ignoring some words
> > "makes no sense". I thought I gave a reasonable example of exactly
> > this situation in the native processing of Tibetan. Perhaps I am still
> > not understanding.
>
> Elisions are a bit different than stopwords (although I don't know about them
> in the Tibetan language). The Tokenizer should *not* split elisions from the
> terms (initially the term is the full word including the elision). In most
> languages those are separated by (for example) an apostrophe (e.g. French:
> le + arbre → l’arbre). The Tokenizer would keep those parts together
> (l’arbre). A later TokenFilter would then edit the token and remove the
> elision (if needed): arbre. This is how the French Analyzer in Lucene works.
>
> Lucene currently does not have a Tibetan Analyzer, so you have to write
> your own (I think this is what you tried to do). You should carefully
> choose the Tokenizer and add something like a TibetanElisionFilter that
> removes the unwanted parts from the tokens.
>
> Uwe
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
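
The two-step idea from this thread (the tokenizer keeps l’arbre as one token, and a later filter step strips the elided article) can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Lucene's Tokenizer or TokenFilter classes, and the class ElisionSketch, its methods, and the short French article list are hypothetical names made up for this example. A Tibetan variant would need its own rule for what counts as the unwanted part of a token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java sketch of "tokenize first, strip elisions later".
// No Lucene classes are used; all names here are hypothetical.
final class ElisionSketch {

    // Assumed, deliberately short list of elided French articles.
    static final Set<String> FRENCH_ARTICLES =
            new HashSet<>(Arrays.asList("l", "d", "j", "m", "t", "s", "n", "c", "qu"));

    // Step 1 (tokenizer stand-in): split on whitespace only,
    // so a word such as "l'arbre" stays together as one token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 2 (filter stand-in): find the first apostrophe (' or U+2019).
    // If the text before it is a known article, drop the article and the
    // apostrophe; otherwise return the token unchanged.
    static String stripElision(String token, Set<String> articles) {
        for (int i = 0; i < token.length(); i++) {
            char ch = token.charAt(i);
            if (ch == '\'' || ch == '\u2019') {
                String prefix = token.substring(0, i).toLowerCase(Locale.ROOT);
                return articles.contains(prefix) ? token.substring(i + 1) : token;
            }
        }
        return token;
    }

    // Chain both steps, the way an analyzer chains a tokenizer and a filter.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            out.add(stripElision(token, FRENCH_ARTICLES));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [arbre, et, eau]
        System.out.println(analyze("l'arbre et l\u2019eau"));
    }
}
```

Note that a token such as aujourd'hui is left untouched, because "aujourd" is not in the article list. Checking the prefix against a list, rather than cutting at every apostrophe, is the design choice that keeps such words intact.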