At 17:47 12.01.00 -0600, you wrote:
>At 6:49 PM +0100 1/12/00, Marc Pohl wrote:
>>i reviewed the sourcecode for htdig-3.2.0b1-dev-010900 this weekend
>>and discovered that there could be similar errors in
>>htword/WordType.cc because of signed char to int casts. The exactly
>>same error cannot happen because the iscntrl() is in the else branch
>>of IsStrictChar() in 3.2.
>
>Could you also post your original patch to 3.1.4 with diff -c as
>well? I'd like to have it on the [EMAIL PROTECTED] lists because I
>think it will help some of these recent questions about indexing and
>searching foreign characters.
>
>>My proposed patch is the following snippet, introducing two new
>>member functions to WordType, instead of calling isdigit() and
>>iscntrl() directly.
>
>This looks fine to me. Since it's a bug-fix, unless I hear screams of
>protest, it's going in sometime tomorrow.
>
>-Geoff
>
Hello Geoff,
Yesterday I found a small potential problem in the patched code.
At the beginning of the initialisation of WordType is the line
chrtypes[0] = 0;
Because we never call iscntrl(0), NUL would never be flagged as a control character, so this line must be
chrtypes[0] = WORD_TYPE_CONTROL;
In my tests this makes no difference, but I think that is only because I don't have any
unwanted NUL (#0) characters in my HTML docs.
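To make the point concrete, here is a minimal sketch of how I picture such a table initialisation. It is not the actual WordType.cc code: only chrtypes and WORD_TYPE_CONTROL are names from the source, while the flag values, WORD_TYPE_DIGIT, init_chrtypes() and is_control() are assumptions made up for illustration, as is the loop starting at 1.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical flag values; only the name WORD_TYPE_CONTROL comes from the patch.
enum { WORD_TYPE_DIGIT = 0x01, WORD_TYPE_CONTROL = 0x02 };

static int chrtypes[256];

// Fill the table once, so later lookups never hand a signed char to <cctype>.
void init_chrtypes()
{
    // iscntrl() is never called for 0 in this sketch, so NUL is flagged by hand.
    chrtypes[0] = WORD_TYPE_CONTROL;
    for (int i = 1; i < 256; i++)
    {
        chrtypes[i] = 0;
        if (std::isdigit(i)) chrtypes[i] |= WORD_TYPE_DIGIT;
        if (std::iscntrl(i)) chrtypes[i] |= WORD_TYPE_CONTROL;
    }
    // Invariant from the mail: NUL must count as a control character.
    assert(chrtypes[0] == WORD_TYPE_CONTROL);
}

// Lookup helper: indexing with unsigned char keeps the index in 0..255.
bool is_control(char c)
{
    return (chrtypes[(unsigned char)c] & WORD_TYPE_CONTROL) != 0;
}
```

With chrtypes[0] = 0 instead, is_control('\0') would return false in this sketch, which is the case I am worried about.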
Marc
And here is my patch against version 3.1.4:
*** WordList.cc.orig Fri Dec 10 01:28:44 1999
--- WordList.cc Thu Jan 13 20:23:29 2000
***************
*** 108,125 ****
while (word && *word)
{
! if (HtIsStrictWordChar((unsigned char)*word) && !isdigit(*word))
{
alpha = 1;
// break; /* Can't stop here, there may still be control chars! */
}
! else if (allow_numbers && isdigit(*word))
{
alpha = 1;
// break; /* Can't stop here, there may still be control chars! */
}
// if (*word >= 0 && *word < ' ')
! else if (iscntrl(*word))
{
control = 1;
break;
--- 108,125 ----
while (word && *word)
{
! if (HtIsStrictWordChar((unsigned char)*word) && !isdigit((unsigned char)*word))
{
alpha = 1;
// break; /* Can't stop here, there may still be control chars! */
}
! else if (allow_numbers && isdigit((unsigned char)*word))
{
alpha = 1;
// break; /* Can't stop here, there may still be control chars! */
}
// if (*word >= 0 && *word < ' ')
! else if (iscntrl((unsigned char)*word))
{
control = 1;
break;
I hope that my email program will not mangle that ;-)
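For readers wondering why the casts matter: the <cctype> functions take an int whose value must be EOF or representable as unsigned char, and on platforms where plain char is signed a byte such as 0xF6 (o-umlaut in ISO 8859-1) is negative. The following is only a rough sketch of the idea, not code from ht://Dig; safe_isdigit(), safe_iscntrl(), has_control_char() and self_check() are hypothetical helpers, and the expected results assume the default "C" locale.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical wrappers: cast to unsigned char before calling <cctype>,
// so a negative plain char is never passed to isdigit() or iscntrl().
inline bool safe_isdigit(char c) { return std::isdigit((unsigned char)c) != 0; }
inline bool safe_iscntrl(char c) { return std::iscntrl((unsigned char)c) != 0; }

// Returns true if the word contains a control character,
// loosely mirroring the loop condition used in the patch.
bool has_control_char(const char *word)
{
    for (; word && *word; word++)
        if (safe_iscntrl(*word))
            return true;
    return false;
}

// Small self-check showing the intended behaviour in the "C" locale.
void self_check()
{
    assert(has_control_char("a\tb"));      // TAB is a control character
    assert(!has_control_char("K\xF6ln"));  // high-bit byte is not
}
```

Wrapping the cast in one place is roughly what the two new WordType member functions do for 3.2, instead of repeating the cast at every call site.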
-----------------------------------------------------
Marc Pohl
Westdeutscher Rundfunk
Tel.: +49 221 220 8618 OSC/Videotextredaktion
FAX: +49 221 220 3882 D-50600 Koeln
Email: [EMAIL PROTECTED]