Hi Michal,

Please repost on the lucene-user list. general@l.a.o has fewer subscribers, and it’s not focussed on Lucene usage questions.
More info: <http://lucene.apache.org/core/discussion.html#java-user-list-java-userlucene>

--
Steve
www.lucidworks.com

> On May 31, 2016, at 9:58 AM, Michal Krajňanský <michal.krajnan...@gmail.com> wrote:
>
> Dear Lucene users,
>
> I have implemented a custom tokenizer (derived from TokenStream).
>
> In addition to the standard Lucene attributes (PositionIncrementAttribute, OffsetAttribute), I need to pass an attribute that represents the word's position in the tokenized sentence counted in words, not in characters as one usually passes through OffsetAttribute. (I need both.)
>
> Is there a way of achieving this?
>
> I tried to implement my own Attribute class (a new interface plus an implementing class). The code compiles fine, but at runtime I get an exception about class casting.
>
> Thank you a lot in advance,
>
> MK
>
> FYI the code looks like this:
>
> /**
>  *
>  */
> package com.newstin.nlp.analysis;
>
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
>
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
> import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
> import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
>
> /**
>  * @author michal
>  */
> public class TermsListTokenizer extends TokenStream
> {
>     private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
>     private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
>     private final PositionIncrementAttribute posIncrAtt =
>         addAttribute(PositionIncrementAttribute.class);
>
>     // Term is a custom class (not org.apache.lucene.index.Term, which has no getWordIndex())
>     private final Iterator<Term> termIterator;
>     private int lastTermPos;
>
>     public TermsListTokenizer(List<Term> terms)
>     {
>         termIterator = terms.iterator();
>         lastTermPos = -1;
>     }
>
>     @Override
>     public boolean incrementToken() throws IOException
>     {
>         clearAttributes();
>
>         // TODO: check: compute the positions right for term variants !!!
>         if (termIterator.hasNext()) {
>             Term term = termIterator.next();
>
>             termAtt.append(term.getTerm());
>             // need to also save position in the number of words
>             offsetAtt.setOffset(term.getStart(), term.getEnd());
>             posIncrAtt.setPositionIncrement(term.getWordIndex() - lastTermPos);
>             lastTermPos = term.getWordIndex();
>             return true;
>         }
>
>         return false;
>     }
> }
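The position-increment arithmetic in `incrementToken()` can be checked on its own, without Lucene. The stdlib-only sketch below is a hypothetical helper (`PositionIncrements`, `increments` and `positions` are illustrative names, not part of Lucene or of MK's code). It assumes word indexes never decrease, since a negative increment would be rejected. Under that assumption, a running sum of the increments that starts from -1 (the same initial value as `lastTermPos`) reproduces each term's word index. A repeated word index, such as two variants of the same word, yields an increment of 0.

```java
import java.util.Arrays;

/**
 * Hypothetical helper, not part of Lucene or of TermsListTokenizer:
 * mirrors the increment arithmetic used in incrementToken().
 */
public class PositionIncrements {

    /** increment = word index minus the previous term's word index; "previous" starts at -1 like lastTermPos. */
    static int[] increments(int[] wordIndexes) {
        int[] result = new int[wordIndexes.length];
        int last = -1; // same initial value as lastTermPos in the constructor
        for (int i = 0; i < wordIndexes.length; i++) {
            result[i] = wordIndexes[i] - last;
            last = wordIndexes[i];
        }
        return result;
    }

    /** Running sum of increments, starting from -1, gives back the word indexes. */
    static int[] positions(int[] increments) {
        int[] result = new int[increments.length];
        int pos = -1;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            result[i] = pos;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical word indexes: the third term is a variant of word 1
        int[] words = {0, 1, 1, 3};
        int[] incs = increments(words);
        System.out.println(Arrays.toString(incs));            // [1, 1, 0, 2]
        System.out.println(Arrays.toString(positions(incs))); // [0, 1, 1, 3]
    }
}
```

Keeping the two directions as separate methods makes the round trip easy to test: `positions(increments(w))` should equal `w` for any non-decreasing `w` whose first entry is at least 0.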