On September 25, 1998 at 18:21, "Christian de la Salle" wrote:
> Let me put it in other words: sometimes MHonArc misses the subject-based
> threading and I can't find the reason why. A couple of examples follow, but I
> must confess my analysis does not yield a solution: for some failing thread
> configurations I have other situations where they work.
>
> Three failing examples :
> I- Herebelow, MHonArc missed that 3 is a reply to 1
> 1- Chaîne d'Info (Active Channel), Christian de la Salle - 24/07/98
> 2- Mise en place d'archives, Christian de la Salle - 07/08/98
> 3- Re : Chaîne d'Info (Active Channel), Christian de la Salle - 23/08/
-------^
SUBJECTREPLYRXP fails to match here, presumably because of the space
between "Re" and the colon.
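
To make that failure concrete, here is a rough sketch in Python (not MHonArc's Perl). Both patterns are assumptions written in the spirit of a SUBJECTREPLYRXP setting, not copied from any real resource file, and `base_subject` is a hypothetical helper. The strict pattern wants the colon right after the prefix; the lenient one tolerates whitespace before it.

```python
import re

# Assumed patterns for illustration only; check your own SUBJECTREPLYRXP.
STRICT = re.compile(r"^\s*(re|tr|fwd|fw)[\[\]\d]*[:>-]+\s*", re.IGNORECASE)
LENIENT = re.compile(r"^\s*(re|tr|fwd|fw)[\[\]\d]*\s*[:>-]+\s*", re.IGNORECASE)


def base_subject(subject: str, rxp: re.Pattern) -> str:
    """Strip leading reply prefixes until none is left, then normalize case."""
    previous = None
    while previous != subject:
        previous = subject
        subject = rxp.sub("", subject, count=1)
    return subject.strip().lower()


first = "Chaîne d'Info (Active Channel)"
reply = "Re : Chaîne d'Info (Active Channel)"

# Strict: "Re :" is not recognized, so the two base subjects differ.
print(base_subject(reply, STRICT) == base_subject(first, STRICT))    # False
# Lenient: optional whitespace before the colon lets the prefix be stripped.
print(base_subject(reply, LENIENT) == base_subject(first, LENIENT))  # True
```

If a list sees many "Re :" style replies, loosening the pattern along these lines in the resource file is one possible workaround.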
> II- Herebelow, MHonArc missed that 4 is a reply to 2
> 1- Mailing lists, Christian de la Salle - 23/06/98
> 2- Securité / Stats, Christian de la Salle - 02/07/98
> 3- Majordomo : archives, Christian de la Salle - 23/07/98
> 4- Re: Securité / Stats, Christian de la Salle - 23/07/98
> 5- Redirect Problem and Majordomo Archiving Question, Christian de la
> Salle - 28/07/98
> Notes: Same author too. They feature special characters (accents), and when I
> look at the mbx file they are both encoded the same way:
> Subject: =?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?=
> Subject: =?iso-8859-1?Q?Securit=E9_/_Stats?=
MHonArc does not decode subject text when checking for subject-based
threads (less overhead). The reply prefix "Tr:" sits inside the encoded
text, so SUBJECTREPLYRXP cannot strip it, the "base" subject texts do
not match, and no subject thread is detected. If the "Tr:" were not
part of the encoded text (i.e., it came before it), then a match would
have been made.
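
A small Python sketch of that comparison, using the two Subject lines quoted above. The pattern is a hypothetical stand-in for a SUBJECTREPLYRXP value that includes "tr", and `raw_base` is an illustrative helper; the third header, with the prefix outside the encoded word, is constructed for the example.

```python
import re

# Hypothetical reply-prefix pattern standing in for a SUBJECTREPLYRXP value.
REPLY_RXP = re.compile(r"^\s*(re|tr|fwd|fw)[\[\]\d]*[:>-]+\s*", re.IGNORECASE)


def raw_base(subject: str) -> str:
    """Strip one leading reply prefix from the undecoded header text."""
    return REPLY_RXP.sub("", subject, count=1)


original = "=?iso-8859-1?Q?Securit=E9_/_Stats?="
prefix_inside = "=?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?="    # as in the mbx file
prefix_outside = "Tr: =?iso-8859-1?Q?Securit=E9_/_Stats?="   # constructed variant

# The header starts with "=?", so the prefix is never seen and nothing is stripped.
print(raw_base(prefix_inside) == original)   # False
# With the prefix in front of the encoded word, stripping leaves identical text.
print(raw_base(prefix_outside) == original)  # True
```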
> III- Herebelow, MHonArc missed that 3 is a reply to 1
> 1- Bonjour à tous!, Pierre / JP Derrier - 02/09/98
> 2- un humble avis sur les questions-réponses, Anne GUILLIEN - 03/09/98
> 3- Re: Bonjour à tous!, Christian de la Salle - 03/09/98
> Notes: Different authors. They feature special characters (accents) and when
> I look at the mbx file they are encoded a different way
> Subject: =?iso-8859-1?Q?Bonjour_=E0_tous!?=
> Subject: Re: Bonjour à tous!
Same issue as previous example.
> On the other hand the following examples work OK:
>
> I- Herebelow, MHonArc found that 3 is a reply to 2
> 1- afa, Support Technique - 02/03/98
> 2- Accès Magic On Line, Christian de la Salle - 13/03/98
> 3- Re: Accès Magic On Line, Arnaud Pignard - 13/03/98
> Notes: Different authors. They feature special characters
> (accents) and when I look at the mbx file they are not encoded :
> Subject: Accès Magic On Line
> Subject: Re: Accès Magic On Line
Here, the "base" subjects match after SUBJECTREPLYRXP is applied, i.e.,
there are no encoding variations to mess things up.
> II- Herebelow, MHonArc found that 4 and 5 are replies to 1
> 1- Actions prévues, Christian de la Salle - 07/09/98
> 2- HELP ! (www.afa.asso.fr), Christian de la Salle - 09/09/98
> 3- AFA - Thx, Christian de la Salle - 09/09/98
> 4- Tr: Actions prévues, Christian de la Salle - 17/09/98
> 5- Re: Tr: Actions prévues, Christian de la Salle - 17/09/98
> Notes: Same author. They feature special characters
> (accents) and when I look at the mbx file they are not encoded :
> Subject: Actions prévues
> Subject: Tr: Actions prévues
> Subject: Re: Tr: Actions prévues
Same as previous example.
In sum, subject-based detection will fail if the "base" subject text
does not match after SUBJECTREPLYRXP is applied. Alternate non-ASCII
encodings of the same subject can cause the match to fail.
Some implementation issues arise if decoding is done first, plus the
extra overhead will slow things down. Subject-based detection already
has its built-in deficiencies with respect to threading. So for now, I
see no compelling reason to change anything.
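
For anyone curious what "decoding first" would look like, here is a minimal sketch in Python, using its standard email.header module as a stand-in. This is not how MHonArc works and not a proposed patch; the pattern and the `decoded_base` helper are hypothetical. Note that it pays for one header decode per message, which is the overhead mentioned above.

```python
import re
from email.header import decode_header, make_header

# Hypothetical reply-prefix pattern standing in for a SUBJECTREPLYRXP value.
REPLY_RXP = re.compile(r"^\s*(re|tr|fwd|fw)[\[\]\d]*[:>-]+\s*", re.IGNORECASE)


def decoded_base(raw_subject: str) -> str:
    """Decode RFC 2047 encoded words first, then strip reply prefixes."""
    text = str(make_header(decode_header(raw_subject)))  # extra work per message
    previous = None
    while previous != text:
        previous = text
        text = REPLY_RXP.sub("", text, count=1)
    return text.strip().lower()


# Failing example II: both headers encoded, prefix inside the encoded word.
print(decoded_base("=?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?=")
      == decoded_base("=?iso-8859-1?Q?Securit=E9_/_Stats?="))   # True

# Failing example III: one header encoded, the reply sent as raw 8-bit text.
print(decoded_base("Re: Bonjour à tous!")
      == decoded_base("=?iso-8859-1?Q?Bonjour_=E0_tous!?="))    # True
```

The sketch sidesteps the harder cases, such as raw 8-bit subjects whose character set is unknown, which is part of why decoding first raises implementation issues.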
It's still an interesting problem.
--ewh
----
Earl Hood | University of California: Irvine
[EMAIL PROTECTED] | Electronic Loiterer
http://www.oac.uci.edu/indiv/ehood/ | Dabbler of SGML/WWW/Perl/MIME