Re: [Declude.JunkMail] regEx question

Matt Fri, 22 Oct 2004 17:01:22 -0700

Bill,

RegEx in programming languages is limited in that any sort of chaining is done one step at a time, and you have to code loops and do string manipulation to get what you are after. There's a big difference between how command line switches and chaining work with the Unix tools and how this works in a scripting/programming environment. It might take a bit more work, but it is definitely more flexible, since regEx can't do everything. The reference for VBScript can be found here:

http://msdn.microsoft.com/library/default.asp?url=

I'm trying to write a message parser that does things like remove quoted-printable encoding, HTML encoding, and URL encoding, and separate the elements into just the HTML and just the text. I may find that this is too much for my skill level, but what I'm trying to do doesn't need to be perfect either.
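As a rough illustration of the steps listed above, here is a minimal sketch in Python rather than VBScript, using only the standard library. The function names (decode_body, split_html_and_text), the sample message, and the latin-1 charset are assumptions made up for the example, and the tag pattern is the same simple one used elsewhere in this thread, so it will not cope with every message.

```python
import html
import quopri
import re
from urllib.parse import unquote

# Simple tag pattern; good enough for a rough split, not a real HTML parser.
TAG = re.compile(r"<[^>]*>")

def decode_body(raw, charset="utf-8"):
    """Undo quoted-printable encoding and return the body as text."""
    return quopri.decodestring(raw).decode(charset, errors="replace")

def split_html_and_text(body):
    """Return (list of tags, plain text with HTML entities decoded)."""
    tags = TAG.findall(body)
    # Replace each tag with a space, decode entities, collapse whitespace.
    text = " ".join(html.unescape(TAG.sub(" ", body)).split())
    return tags, text

# Hypothetical sample: quoted-printable body with an entity and an encoded URL.
raw = b'<p>Caf=E9 &amp; bar</p><a href=3D"http://example.com/a%20b">x</a>'
body = decode_body(raw, "latin-1")
tags, text = split_html_and_text(body)
print(tags)
print(text)
# URL decoding is a separate step applied to the links you pull out.
print(unquote("http://example.com/a%20b"))
```

Each step is a separate call, which matches the "one step at a time" point above: decode the transfer encoding first, then split tags from text, then decode entities and URLs on the pieces.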

Matt

Bill Landry wrote:

----- Original Message ----- 
From: "Matt" <[EMAIL PROTECTED]>

Unfortunately that isn't an option in VBScript.  What I was really
trying to do is return a string with just the HTML and not what is
before, after or in between it.  When you execute a regEx expression in
VBScript, it returns the matches in an object similar to an array, and
adding a loop to take each value and append it to a string does work,
but there's probably a better way.  Doing the inverse as was shown in
that script that you linked to is easy due to the replace method, but it
seems strange that there isn't a simpler way to return just the
matches.  I'm still weak on the syntax and having issues with doing
and/or/not stuff, but I'm sure that I'll pick it up in time, and maybe
with some help.
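
For comparison, the "return just the matches" idea can be sketched in Python (not VBScript, where you would loop over the Matches collection as described above). The function name extract_tags and the sample string are made up for the example.

```python
import re

def extract_tags(text):
    """Return only the HTML tags found in text, joined into one string."""
    # findall returns every non-overlapping match as a list of strings,
    # so joining the list replaces the explicit loop-and-append.
    return "".join(re.findall(r"<[^>]*>", text))

sample = 'Hello <b>world</b>, see <a href="x">this</a>.'
print(extract_tags(sample))
```

The inverse (keeping everything except the tags) is the replace case mentioned above: re.sub(r"<[^>]*>", "", sample).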


Hmmm, does VBScript support sed-type command syntax?  Is this the kind of
output you're looking for?

<alt="">
<td>
<td>
<tr>
<tr>
<7">
<alt="">
<td>
<tr>
<tr>
<333333">
<alt="">
<td>
<tr>
<tr>
<7">
<alt="">
<td>
<tr>
<tr>
<333333">
<alt="">
<td>
<tr>
<table>
<footer -->
<td>
<tr>
<table>
<body -->
<td>
<tr>
<table>
<td>
<tr>
<table>
<body>
<html>

This is partial output from an HTML e-mail, which I got from the following
script:

sed "s/\</\\n\</g" html-mail.txt | egrep "<[^>]*>"

You would need to add a few clean-up commands, but that's roughly it.

Bill

-- 
=====================================================
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=====================================================
