On September 25, 1998 at 18:21, "Christian de la Salle" wrote:

> Let me put it in other words: sometimes MHonArc misses the subject-based
> threading and I can't find the reason why. A couple of examples follow, but I
> must confess my analysis does not bring the solution: for some failing thread
> configurations I have other situations where they work.
> 
> Three failing examples :
> I- Herebelow, MHonArc missed that 3 is a reply to 1
>     1- Chaîne d'Info (Active Channel), Christian de la Salle - 24/07/98
>     2- Mise en place d'archives, Christian de la Salle - 07/08/98
>     3- Re : Chaîne d'Info (Active Channel), Christian de la Salle - 23/08/
-----------^
SUBJECTREPLYRXP fails to match: there is a space between "Re" and the
colon.
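
To illustrate the failure, here is a minimal Python sketch (MHonArc
itself is Perl).  The two patterns and the base_subject() helper are
hypothetical stand-ins invented for this example, not MHonArc's actual
SUBJECTREPLYRXP value or code, and stripping prefixes in a loop is also
an assumption.  It only shows how a pattern that expects the colon
directly after "Re" skips "Re :", and how tolerating whitespace there
would catch it.

```python
import re

# Hypothetical reply-prefix pattern (an assumption, not MHonArc's default):
# the colon must follow the prefix word immediately.
REPLY_RX = re.compile(r"^\s*(re|tr|fwd?):\s*", re.IGNORECASE)

# Relaxed variant that also tolerates whitespace before the colon ("Re :").
RELAXED_RX = re.compile(r"^\s*(re|tr|fwd?)\s*:\s*", re.IGNORECASE)


def base_subject(subject, rx=REPLY_RX):
    """Strip leading reply prefixes until none is left."""
    previous = None
    while previous != subject:
        previous = subject
        subject = rx.sub("", subject, count=1)
    return subject


strict = base_subject("Re : Chaîne d'Info (Active Channel)")
relaxed = base_subject("Re : Chaîne d'Info (Active Channel)", RELAXED_RX)
print(strict)   # prefix is left in place, so the base subjects differ
print(relaxed)  # prefix is stripped, so the base subjects can match
```

In MHonArc terms, the equivalent fix would be a SUBJECTREPLYRXP value
that allows optional whitespace before the colon.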


> II- Herebelow, MHonArc missed that 4 is a reply to 2
>     1- Mailing lists, Christian de la Salle - 23/06/98
>     2- Securité / Stats, Christian de la Salle - 02/07/98
>     3- Majordomo : archives, Christian de la Salle - 23/07/98
>     4- Re: Securité / Stats, Christian de la Salle - 23/07/98
>     5- Redirect Problem and Majordomo Archiving Question, Christian de la Salle - 28/07/98
> Notes / Same author too. They feature special characters (accents) and when I
> look at the mbx file they are both encoded the same way
>     Subject: =?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?=
>     Subject: =?iso-8859-1?Q?Securit=E9_/_Stats?=

MHonArc does not decode subject text when checking for subject-based
threads (less overhead).  Here the "Tr:" sits inside the encoded word,
so SUBJECTREPLYRXP cannot strip it, the "base" subject texts do not
match, and no subject thread is detected.  If the "Tr:" had not been
part of the encoded text (i.e., it came before it), then a match would
have been made.
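
A small Python sketch of that comparison, using the two Subject lines
quoted above.  REPLY_RX and base_subject() are hypothetical names with
a made-up pattern, not MHonArc's real expression.  The point is only
that the prefix is stripped from the raw, undecoded header text.

```python
import re

# Hypothetical reply-prefix pattern (an assumption, not MHonArc's default).
REPLY_RX = re.compile(r"^\s*(re|tr|fwd?):\s*", re.IGNORECASE)


def base_subject(raw_subject):
    """Strip one leading reply prefix from the raw (undecoded) subject."""
    return REPLY_RX.sub("", raw_subject, count=1)


original = "=?iso-8859-1?Q?Securit=E9_/_Stats?="
# "Tr:" inside the encoded word, as in the failing example.
reply_inside = "=?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?="
# "Tr:" before the encoded word, the case that would have matched.
reply_outside = "Tr: =?iso-8859-1?Q?Securit=E9_/_Stats?="

print(base_subject(reply_inside) == base_subject(original))   # False
print(base_subject(reply_outside) == base_subject(original))  # True
```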

> III- Herebelow, MHonArc missed that 3 is a reply to 1
>     1- Bonjour à tous!, Pierre / JP Derrier - 02/09/98
>     2- un humble avis sur les questions-réponses, Anne GUILLIEN - 03/09/98
>     3- Re: Bonjour à tous!, Christian de la Salle - 03/09/98
> Notes: Different authors. They feature special characters (accents) and when
> I look at the mbx file they are encoded a different way
>     Subject: =?iso-8859-1?Q?Bonjour_=E0_tous!?=
>     Subject: Re: Bonjour à tous!

Same issue as in the previous example: one subject is MIME-encoded and
the other is not, so the raw text differs.


> On the other hand the following examples work OK:
> 
> I- Herebelow, MHonArc found that 3 is a reply to 2
>     1- afa, Support Technique - 02/03/98
>     2- Accès Magic On Line, Christian de la Salle - 13/03/98
>     3- Re: Accès Magic On Line, Arnaud Pignard - 13/03/98
> Notes: Different authors. They feature special characters
> (accents) and when I look at the mbx file they are not encoded :
>     Subject: Accès Magic On Line
>     Subject: Re: Accès Magic On Line

Here, the "base" subjects match after SUBJECTREPLYRXP is applied, i.e.,
there are no encoding variations to mess things up.


> II- Herebelow, MHonArc found that 4 and 5 are replies to 1
>     1- Actions prévues, Christian de la Salle - 07/09/98
>     2- HELP ! (www.afa.asso.fr), Christian de la Salle - 09/09/98
>     3- AFA - Thx, Christian de la Salle - 09/09/98
>     4- Tr: Actions prévues, Christian de la Salle - 17/09/98
>     5- Re: Tr: Actions prévues, Christian de la Salle - 17/09/98
> Notes: Same author. They feature special characters
> (accents) and when I look at the mbx file they are not encoded :
>     Subject: Actions prévues
>     Subject: Tr: Actions prévues
>     Subject: Re: Tr: Actions prévues

Same as previous example.

In sum, subject-based detection will fail if the "base" subject text
does not match after SUBJECTREPLYRXP is applied.  Alternate non-ASCII
encodings of the same subject can cause the match to fail.

Some implementation issues arise if decoding is done first, plus the
extra overhead will slow things down.  Subject-based detection already
has its built-in deficiencies with respect to threading.  So for now, I
see no compelling reason to change anything.
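
For the curious, a decode-first comparison might look roughly like the
Python sketch below.  It is not how MHonArc works today.
decode_subject(), base_subject() and the REPLY_RX pattern are all
hypothetical, and the sketch assumes the unencoded 8-bit subject has
already been read as ISO-8859-1 text.  It uses decode_header() from
Python's standard email.header module to undo the RFC 2047 encoding
before the prefix is stripped.

```python
import re
from email.header import decode_header

# Hypothetical reply-prefix pattern (an assumption, not MHonArc's default).
REPLY_RX = re.compile(r"^\s*(re|tr|fwd?):\s*", re.IGNORECASE)


def decode_subject(raw):
    """Decode any RFC 2047 encoded words in a raw Subject value."""
    parts = []
    for text, charset in decode_header(raw):
        # decode_header() yields bytes for decoded words and may yield
        # str when the header has no encoded words at all.
        if isinstance(text, bytes):
            text = text.decode(charset or "ascii", errors="replace")
        parts.append(text)
    return "".join(parts)


def base_subject(raw):
    """Decode first, then strip one leading reply prefix."""
    return REPLY_RX.sub("", decode_subject(raw), count=1).strip()


# Failing example II: "Tr:" inside the encoded word.
same_ii = (base_subject("=?iso-8859-1?Q?Tr:_Securit=E9_/_Stats?=")
           == base_subject("=?iso-8859-1?Q?Securit=E9_/_Stats?="))

# Failing example III: encoded original, unencoded reply.
same_iii = (base_subject("Re: Bonjour à tous!")
            == base_subject("=?iso-8859-1?Q?Bonjour_=E0_tous!?="))

print(same_ii, same_iii)
```

The cost is one decode per message subject, which is the overhead
mentioned above.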

It's still an interesting problem.

        --ewh

----
             Earl Hood              | University of California: Irvine
      [EMAIL PROTECTED]      |      Electronic Loiterer
http://www.oac.uci.edu/indiv/ehood/ | Dabbler of SGML/WWW/Perl/MIME
