On Mon, Jul 25, 2011 at 9:54 AM, Dan Kennedy <danielk1...@gmail.com> wrote:
> On 07/24/2011 08:16 PM, Abhinav Upadhyay wrote:
>> Hi,
>>
>> I am trying to write my own custom tokenizer to filter stopwords apart
>> from doing normalization and stemming. I have gone through the
>> comments in fts3_tokenizer.h and also read the implementation of the
>> simple tokenizer. While overall I am able to understand what I need to
>> do to implement this tokenizer, but I still cannot visualize how the
>> FTS engine calls the tokenizer and what data in what form it passes to
>> it.
>>
>> Does the FTS engine pass the complete document data to the tokenizer
>> or it passes some chunks of data, or individual words ? I need to
>> understand this part because the next function needs to set the
>> offsets accordingly. By just going through the code of the simple
>> tokenizer I could not completely comprehend it (it would have been
>> better if I could debug it).
>>
>> By the next function I mean this:
>>
>>   int (*xNext)(
>>     sqlite3_tokenizer_cursor *pCursor,   /* Tokenizer cursor */
>>     const char **ppToken, int *pnBytes,  /* OUT: Normalized text for token */
>>     int *piStartOffset,  /* OUT: Byte offset of token in input buffer */
>>     int *piEndOffset,    /* OUT: Byte offset of end of token in input buffer */
>>     int *piPosition      /* OUT: Number of tokens returned before this one */
>>   );
>> };
>>
>> It would be better if you could explain what is the role of these
>> parameters: piEndOffset , piStartOffset ?
>
> Each time xNext() returns SQLITE_OK to return a new token, xNext()
> should set:
>
>   *piStartOffset to the number of bytes in the input buffer before
>   start of the token being returned,
>
>   *piEndOffset to *piStartOffset plus the number of bytes in the
>   token text, and
>
>   *piPosition to the number of tokens that occur in the input buffer
>   before the token being returned.
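
To make the three outputs concrete, here is a minimal sketch of the
bookkeeping described above. It is not the real FTS3 interface: DemoCursor,
demo_open(), demo_next(), DEMO_OK and DEMO_DONE are hypothetical stand-ins
so that the example compiles without the SQLite headers, and it assumes
ASCII input where a token is simply a run of alphanumeric bytes. The only
normalization is lowercasing. Note that the offsets always refer to the
original input buffer, while *ppToken points at the normalized copy.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define DEMO_OK   0    /* stand-in for SQLITE_OK */
#define DEMO_DONE 101  /* stand-in for SQLITE_DONE */

/* Hypothetical cursor, NOT the real sqlite3_tokenizer_cursor. */
typedef struct DemoCursor {
  const char *zInput;  /* Input buffer being tokenized */
  int nInput;          /* Size of the input buffer in bytes */
  int iOffset;         /* Current byte offset into zInput */
  int iToken;          /* Number of tokens returned so far */
  char zToken[64];     /* Normalized text of the current token */
} DemoCursor;

/* Hypothetical helper: start tokenizing a nul-terminated string. */
void demo_open(DemoCursor *c, const char *zInput){
  assert( c!=0 && zInput!=0 );
  memset(c, 0, sizeof(*c));
  c->zInput = zInput;
  c->nInput = (int)strlen(zInput);
}

/* Return the next token, using the same OUT parameters as xNext(). */
int demo_next(DemoCursor *c, const char **ppToken, int *pnBytes,
              int *piStartOffset, int *piEndOffset, int *piPosition){
  int iStart, n, i;
  assert( c!=0 );

  /* Skip delimiter bytes that precede the next token. */
  while( c->iOffset<c->nInput
      && !isalnum((unsigned char)c->zInput[c->iOffset]) ){
    c->iOffset++;
  }
  if( c->iOffset>=c->nInput ) return DEMO_DONE;

  /* Bytes in the input buffer before the start of this token. */
  iStart = c->iOffset;
  while( c->iOffset<c->nInput
      && isalnum((unsigned char)c->zInput[c->iOffset]) ){
    c->iOffset++;
  }

  /* Normalize into the cursor's own buffer (long tokens are truncated
  ** here purely to keep the sketch short). */
  n = c->iOffset - iStart;
  if( n>(int)sizeof(c->zToken)-1 ) n = (int)sizeof(c->zToken)-1;
  for(i=0; i<n; i++){
    c->zToken[i] = (char)tolower((unsigned char)c->zInput[iStart+i]);
  }
  c->zToken[n] = '\0';

  *ppToken = c->zToken;
  *pnBytes = n;
  *piStartOffset = iStart;      /* offset of first byte of the token */
  *piEndOffset = c->iOffset;    /* offset just past its last byte */
  *piPosition = c->iToken++;    /* tokens returned before this one */
  return DEMO_OK;
}
```

Calling demo_open() on "Hello, FTS world" and then demo_next() until it
returns DEMO_DONE gives "hello" with offsets 0 and 5 at position 0, "fts"
with offsets 7 and 10 at position 1, and "world" with offsets 11 and 16 at
position 2.
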

Thanks for the explanation. I was able to correct my implementation :-)