My suggestion would be to spin up a debugger, set a breakpoint on
that loop, and check whether it's really doing what you think it is.
If it is, break in the code that does the actual tokenization and see
whether it's being called as expected.  Or scatter some printf() calls
in there.  It's embarrassing how many times one writes really simple
code that doesn't work, and it turns out to be something silly: you
didn't actually compile what you thought you did, or you modified code
in the wrong file, or something like that.

I don't think you want isspace() here, though: it would let
non-whitespace control characters through as token characters instead
of treating them as delimiters.  Most of your inputs probably won't
have those, but it's easy to plan ahead.  You might consider isgraph()
instead, negated the same way isalnum() was being used before.

-scott


On Mon, Apr 6, 2009 at 8:51 AM, Andy Roberts <the_...@hotmail.com> wrote:
>
> Hi,
>
> I downloaded the amalgamation sources in order to create a build of sqlite
> with FTS3 enabled. The problem for me is that the default "simple" tokenizer
> is not behaving precisely how I want. In fact, I'd prefer that it not
> count punctuation as a delimiter and stick purely to whitespace.
>
> In the simpleCreate() function there's some code that initializes an array
> recording which characters are delimiters and which are not:
>
> for(i=1; i<0x80; i++){
>    t->delim[i] = !isalnum(i);
> }
>
> I thought that if I made a simple edit to use the isspace() function then
> I'd achieve what I was after, i.e.,
>
> for(i=1; i<0x80; i++){
>    t->delim[i] = isspace(i);
> }
>
> However, when I build this version, create my fts virtual tables and then
> query them I get zero results. When I revert to !isalnum I get results,
> but I'm seeing words split where I don't want them to be.
>
> I must admit my C experience isn't great, but I've been trying for far too
> many hours now with little gain. I'd really appreciate some pointers!
>
> Thanks in advance,
> Andy
> --
> View this message in context: 
> http://www.nabble.com/Simple-Tokenizer-in-FTS3-tp22911635p22911635.html
> Sent from the SQLite mailing list archive at Nabble.com.
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>