Hi,

I'm trying to migrate some Analyzers from the Lucene 3.6 API to 6.2, and I'm not sure I've got the right approach. I'm using PyLucene, so let's treat the code below as pseudo-code.

In 3.x (and up to 4), I had access to the StringReader containing the data in the overridden tokenStream(fieldName, reader):

class TokenStream3(PythonTokenStream):
    def __init__(self, reader):
        self.data = DATA_FROM_READER(reader)
        self.i = 0
        # prepare termAtt/offsetAtt/posIncrAtt and other helpers

    def incrementToken(self):
        if self.i == len(self.data):
            return False
        # stuff from self.data into termAtt/offsetAtt/posIncrAtt
        self.i += 1
        return True

class Analyzer3(PythonAnalyzer):
    def tokenStream(self, fieldName, reader):
        return TokenStream3(reader)
-----

In 5.x/6.x, the only approach I've found needs some ugly indirection: capture the active reader in Analyzer.initReader() and access it via a callback in the Tokenizer.

class Tokenizer6(PythonTokenizer):
    def __init__(self, getReader):
        # callable for retrieving current reader
        self.getReader = getReader
        self.i = 0
        self.data = None

    def incrementToken(self):
        if self.i == 0:
            self.data = DATA_FROM_READER(self.getReader())
        if self.i == len(self.data):
            # we are reused - reset
            self.i = 0
            return False
        # stuff from self.data into termAtt/offsetAtt/posIncrAtt
        self.i += 1
        return True

class Analyzer6(PythonAnalyzer):
    def createComponents(self, fieldName):
        return Analyzer.TokenStreamComponents(Tokenizer6(lambda: self._reader))

    def initReader(self, fieldName, reader):
        # capture reader
        self._reader = reader
        return reader
-----

Is this sane?
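
For comparison, here is a minimal sketch of the reuse lifecycle that a Lucene 5.x/6.x Tokenizer goes through: the consumer calls setReader(), then reset(), then incrementToken() until it returns False, then end() and close(). In Java, Tokenizer exposes the current reader as the protected field `input` once reset() has run, so the data could be read in reset() rather than through a callback. The sketch is plain Python and does not use PyLucene: SketchTokenizer and analyze() are hypothetical stand-ins, whitespace splitting stands in for DATA_FROM_READER, and whether `self.input` is reachable from a PythonTokenizer subclass is an assumption that would need checking.

```python
import io


class SketchTokenizer:
    """Hypothetical stand-in for a reusable Tokenizer (not the PyLucene API)."""

    def __init__(self):
        self._pending = None   # reader handed over by setReader()
        self.input = None      # reader currently being consumed
        self.data = []
        self.i = 0
        self.term = None       # stands in for termAtt

    def setReader(self, reader):
        # Lucene's Tokenizer keeps the new reader pending until reset()
        self._pending = reader

    def reset(self):
        # Called once before each pass: switch to the new reader, rewind state
        self.input = self._pending
        self._pending = None
        # Assumption: whitespace split stands in for DATA_FROM_READER
        self.data = self.input.read().split()
        self.i = 0

    def incrementToken(self):
        if self.i == len(self.data):
            return False
        self.term = self.data[self.i]
        self.i += 1
        return True

    def end(self):
        # Nothing to finalize in this sketch
        pass

    def close(self):
        self.input = None


def analyze(tokenizer, text):
    """Drive one pass the way a consumer drives a reused tokenizer."""
    tokenizer.setReader(io.StringIO(text))
    tokenizer.reset()
    terms = []
    while tokenizer.incrementToken():
        terms.append(tokenizer.term)
    tokenizer.end()
    tokenizer.close()
    return terms


tokenizer = SketchTokenizer()
first = analyze(tokenizer, "quick brown fox")
second = analyze(tokenizer, "lazy dog")  # same instance, new reader
```

With this shape, the per-pass state is rewound in reset() instead of inside incrementToken(), and the analyzer would not need to capture the reader in initReader().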

--dirk
