On Wed, 11 Aug 2010 14:29:35 -0400 Ed Szynaka <szyn...@localnet.com> wrote:
> Stevan Bajić wrote:
> >> Is this list correct? Is there anything I missed?
> >>
> >> Content-Type:
> >>     text/plain
> >>     text/html (stripped of HTML)
> >>     message/*
> >>     unknown parts
> >>
> >> Content-Transfer-Encoding:
> >>     7bit
> >>     8bit
> >>     quoted-printable
> >>     base64
> >>
> > Yes. You miss the point that any word longer than 50 characters will NOT be
> > tokenized. Most data from attachments falls into that category and will not
> > be tokenized.
>
> Does DSPAM consider the removal of an HTML tag as a word break?
>
Yes and no. Some HTML tags are simply removed, while others are replaced with
a newline, and a newline is treated as a word break.

> Thanks,
> Ed
> --
> Ed Szynaka
> Network/Systems Manager
> LocalNet Corp./CoreComm Internet Services
>
--
Kind Regards from Switzerland,

Stevan Bajić
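
To illustrate the two rules discussed in this thread (tags either vanish or become a newline, and words longer than 50 characters are skipped), here is a rough Python sketch. It is not DSPAM's code. The names strip_html, tokenize and BREAKING_TAGS are made up for the example, the set of tags that turn into a newline is a guess, and splitting on whitespace only is a simplification of whatever delimiters the real tokenizer uses.

```python
import re

# From the thread: words longer than 50 characters are not tokenized.
MAX_TOKEN_LEN = 50

# ASSUMPTION: which tags become a newline is a guess for illustration,
# not DSPAM's actual list.
BREAKING_TAGS = {"br", "p", "div", "tr", "li"}

# Matches an opening or closing tag and captures the tag name.
TAG_RE = re.compile(r"<\s*/?\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def strip_html(text):
    """Replace 'breaking' tags with a newline and drop all other tags."""
    def repl(match):
        return "\n" if match.group(1).lower() in BREAKING_TAGS else ""
    return TAG_RE.sub(repl, text)


def tokenize(text):
    """Split on whitespace (newlines included) and skip over-long words."""
    words = strip_html(text).split()
    return [w for w in words if len(w) <= MAX_TOKEN_LEN]


if __name__ == "__main__":
    # A removed tag does not break the word; a tag replaced by a newline does.
    print(tokenize("foo<b>bar</b>"))
    print(tokenize("foo<br>bar"))
    # A 60-character run (e.g. a base64 fragment) is dropped.
    print(tokenize("hello " + "A" * 60 + " world"))
```

With this sketch, "foo<b>bar</b>" stays one word because the tag is just removed, while "foo<br>bar" becomes two words because the tag is turned into a newline first.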