Can somebody please clarify the bug reporting process for sqlite? My 
understanding is that it's not possible to file bug reports directly, and that 
the advice is to write to the user list first. I've done that (below) but have 
no response so far, and am concerned that this means the bug report will just be 
forgotten by others, as well as by me.

How does this bug move from a message on a list to a ticket (and ultimately a 
patch, we hope) in the system?

James

On Feb 22, 2010, at 2:51 PM, James Berry wrote:

> I'm writing to report a bug in the porter-stemmer algorithm supplied as part 
> of the FTS3 implementation.
> 
> The stemmer has an inverted logic error that prevents it from properly 
> stemming words of the following form:
> 
>       dry -> dri
>       cry -> cri
> 
> This means, for instance, that the following words don't stem the same:
> 
>       dried -> dri   -doesn't match-   dry
>       cried -> cri   -doesn't match-   cry
> 
> The bug seems to have been introduced as a simple logic error by whoever 
> wrote the stemmer code. The original description of step 1c is here: 
> http://snowball.tartarus.org/algorithms/english/stemmer.html
> 
>       Step 1c:
>               replace suffix y or Y by i if preceded by a non-vowel which is 
> not the first letter of the word (so cry -> cri, by -> by, say -> say)
>       
> But the code in sqlite reads like this:
> 
>  /* Step 1c */
>  if( z[0]=='y' && hasVowel(z+1) ){
>    z[0] = 'i';
>  }
> 
> In other words, sqlite turns the y into an i only if it is preceded by a 
> vowel (say -> sai), while the algorithm intends this to be done if it is 
> _not_ preceded by a vowel.
> 
> But there are two other problems in that same line of code:
> 
>       (1) hasVowel checks whether a vowel exists anywhere in the string, not 
> just in the next character, which is incorrect, and goes against the step 1c 
> directions above. (amplify would not be properly stemmed to amplifi, for 
> instance)
> 
>       (2) The check for the first letter is not performed (for words like 
> "by", etc)
> 
> I've fixed both of those errors in the patch below:
> 
>   /* Step 1c */
> -  if( z[0]=='y' && hasVowel(z+1) ){
> +  if( z[0]=='y' && isConsonant(z+1) && z[2] ){
>     z[0] = 'i';
>   }
> 
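For reference, here is a minimal standalone sketch of step 1c as the quoted
description words it. It is not the sqlite code: it works on an ordinary
forward, lowercase, NUL-terminated string, and the names is_vowel_simple and
step1c_sketch are hypothetical. It also assumes that only a, e, i, o, u count
as vowels, which simplifies the way a real stemmer treats y.

```c
#include <string.h>

/* Hypothetical helper: a simplified vowel test (assumes lowercase input
** and ignores the special handling of 'y' that a full stemmer needs). */
static int is_vowel_simple(char c){
  return c=='a' || c=='e' || c=='i' || c=='o' || c=='u';
}

/* Sketch of step 1c on a forward string, modified in place:
** replace a final y or Y by i if the letter before it is a non-vowel
** and that letter is not the first letter of the word. */
static void step1c_sketch(char *word){
  size_t n = strlen(word);
  if( n>=3                                      /* preceding letter is not the first */
   && (word[n-1]=='y' || word[n-1]=='Y')        /* suffix y or Y */
   && !is_vowel_simple(word[n-2])               /* preceded by a non-vowel */
  ){
    word[n-1] = 'i';
  }
}
```

Calling step1c_sketch on writable copies of "cry" and "dry" should give "cri"
and "dri", while "by" and "say" should come back unchanged, which matches the
examples in the step 1c description.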
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
