On Mon, Jul 25, 2011 at 9:54 AM, Dan Kennedy <danielk1...@gmail.com> wrote:
> On 07/24/2011 08:16 PM, Abhinav Upadhyay wrote:
>> Hi,
>>
>> I am trying to write my own custom tokenizer that filters stopwords in
>> addition to doing normalization and stemming. I have gone through the
>> comments in fts3_tokenizer.h and also read the implementation of the
>> simple tokenizer. Overall I understand what I need to do to implement
>> this tokenizer, but I still cannot visualize how the FTS engine calls
>> the tokenizer, or what data it passes to it and in what form.
>>
>> Does the FTS engine pass the complete document data to the tokenizer,
>> or does it pass chunks of data, or individual words? I need to
>> understand this because the xNext function has to set the offsets
>> accordingly. Just by reading the code of the simple tokenizer I could
>> not fully understand it (it would have been easier if I could debug
>> it).
>>
>> By the xNext function I mean this member of sqlite3_tokenizer_module:
>>
>>    int (*xNext)(
>>      sqlite3_tokenizer_cursor *pCursor,   /* Tokenizer cursor */
>>      const char **ppToken, int *pnBytes,  /* OUT: Normalized text for token */
>>      int *piStartOffset,  /* OUT: Byte offset of token in input buffer */
>>      int *piEndOffset,    /* OUT: Byte offset of end of token in input buffer */
>>      int *piPosition      /* OUT: Number of tokens returned before this one */
>>    );
>> };
>>
>> Could you explain the role of the piStartOffset and piEndOffset
>> parameters?
>
> Each time xNext() returns SQLITE_OK to return a new token, xNext()
> should set:
>
>   *piStartOffset to the number of bytes in the input buffer before
>   the start of the token being returned,
>
>   *piEndOffset to *piStartOffset plus the number of bytes in the
>   token text, and
>
>   *piPosition to the number of tokens that occur in the input buffer
>   before the token being returned.
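
A minimal sketch of those three rules, assuming a toy tokenizer that splits on non-alphanumeric ASCII bytes and lowercases each token. WsCursor, ws_open, ws_next, TOK_OK and TOK_DONE are hypothetical names used only for illustration; a real tokenizer would embed sqlite3_tokenizer_cursor and return SQLITE_OK / SQLITE_DONE. The cursor is opened once over the whole input buffer and each ws_next call reports the token's offsets within that buffer.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-ins for SQLITE_OK / SQLITE_DONE (0 and 101 in sqlite3.h). */
#define TOK_OK   0
#define TOK_DONE 101

/* Hypothetical cursor; a real one would start with sqlite3_tokenizer_cursor. */
typedef struct {
  const char *pInput;  /* whole input buffer passed to the open call */
  int nInput;          /* size of the input buffer in bytes */
  int iOffset;         /* current scan position within pInput */
  int iToken;          /* number of tokens returned so far */
  char zBuf[64];       /* normalized (lowercased) copy of the current token */
} WsCursor;

/* A negative nInput means pInput is nul-terminated. */
static void ws_open(WsCursor *c, const char *pInput, int nInput) {
  c->pInput = pInput;
  c->nInput = nInput < 0 ? (int)strlen(pInput) : nInput;
  c->iOffset = 0;
  c->iToken = 0;
}

static int ws_next(WsCursor *c, const char **ppToken, int *pnBytes,
                   int *piStartOffset, int *piEndOffset, int *piPosition) {
  const char *z = c->pInput;

  /* Skip separator bytes (anything that is not ASCII alphanumeric). */
  while (c->iOffset < c->nInput && !isalnum((unsigned char)z[c->iOffset]))
    c->iOffset++;
  if (c->iOffset >= c->nInput) return TOK_DONE;

  /* Scan to the end of the token in the input buffer. */
  int iStart = c->iOffset;
  while (c->iOffset < c->nInput && isalnum((unsigned char)z[c->iOffset]))
    c->iOffset++;
  int n = c->iOffset - iStart;

  /* Normalize into zBuf (truncated if longer than the buffer in this sketch). */
  int nOut = n < (int)sizeof(c->zBuf) ? n : (int)sizeof(c->zBuf) - 1;
  for (int i = 0; i < nOut; i++)
    c->zBuf[i] = (char)tolower((unsigned char)z[iStart + i]);
  c->zBuf[nOut] = '\0';

  *ppToken = c->zBuf;
  *pnBytes = nOut;
  *piStartOffset = iStart;      /* bytes in the input before the token */
  *piEndOffset = iStart + n;    /* start offset plus the token's length in the input */
  *piPosition = c->iToken++;    /* tokens returned before this one */
  return TOK_OK;
}
```

Opening the cursor on "Hello, FTS world" and calling ws_next until it returns TOK_DONE should yield "hello" (bytes 0 to 5, position 0), "fts" (bytes 7 to 10, position 1) and "world" (bytes 11 to 16, position 2). Note that the end offset is measured in the original input, so it can differ from *pnBytes once stemming changes the token's length.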

Thanks for the explanation. I was able to correct my implementation :-)
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
