Re: regexp help

chris porter Tue, 31 Aug 2004 09:35:28 -0700

:) Well, that regex didn't produce a match, unfortunately. However, I was just looking at this and thinking, man, this is screwy. Maybe the \r isn't required for some odd reason, so I revised the regex to read:
([[:alnum:]]+)\r?\s*([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)

And got a match! But it only matched: "item1 0344437 1 03/12/2004 USD 335.75"
So the next important thing was to add a space to the [[:alnum:]] bracket expression, and bam! I got the full match. With the regex revised to:
([[:alnum:] ]+)\r?\s*([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)

I get the match I needed, which is: "description of some item1 0344437 1 03/12/2004 USD 335.75"
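For anyone who wants to check the revised pattern outside ColdFusion, here is a minimal Python sketch (a translation, not CF code). Python's `re` module has no POSIX classes, so `[[:alnum:] ]` is written as `[A-Za-z0-9 ]` (ASCII only), and the sample body is a made-up copy of the DATA block quoted further down; `RECORD`, `LINES` and `parse` are hypothetical names. It runs the pattern over both CRLF and LF-only versions of the text, which shows why `\r?` helps: `\s*` already absorbs whichever line break is actually present.

```python
import re

# Chris's revised pattern translated for Python's re module:
# [[:alnum:] ] -> [A-Za-z0-9 ] because re has no POSIX bracket classes.
RECORD = re.compile(
    r"([A-Za-z0-9 ]+)\r?\s*([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)"
)

# Hypothetical sample modeled on the DATA block in the thread.
LINES = [
    "Product Name",
    "Product Number            Qty      Est. Ship Date      Your Ext. Price",
    "----------------------------------------------------------------------",
    "",
    "description of some item1",
    "0344437                     1           03/12/2004          USD 335.75",
    "",
    "another description of some item1",
    "0344734                     1           03/12/2004          USD 335.75",
]


def parse(body):
    """Return one 6-tuple per record: description, number, qty, date, currency, price."""
    return [m.groups() for m in RECORD.finditer(body)]


crlf_rows = parse("\r\n".join(LINES))  # Windows-style line endings
lf_rows = parse("\n".join(LINES))      # LF-only line endings

for row in crlf_rows:
    print(row)
```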

It's still not quite perfect, but hey, it's matching, and that's the important part!!

Thanks for your help!!
-chris

> Heh, after talking out of my butt, this should work for you:
>
> ([[:alnum:] ]+)\r\s*([0-9]{7})\s*([0-9]+)\s*([0-3][0-9]/[0-1][0-9]/[0-9]{4})\s*([A-Z]{2,3})\s*([0-9]+\.[0-9]{2})
>
> You'll have to add some of your own business logic so you don't parse the
> "this is just a confirmation email" emails. I don't know all the currency
> codes you are using besides USD, so I left it like this: [A-Z]{2,3}
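Ian's stricter pattern can be tried the same way. The Python sketch below rests on assumptions: `[[:alnum:] ]` becomes `[A-Za-z0-9 ]`, the mandatory `\r` is relaxed to `\r?` (the change Chris ends up making at the top of the thread), and `parse_order_lines` is a hypothetical helper. Its only "business logic" is that a body with no matching record, such as a confirmation-only email, yields an empty list and can be skipped.

```python
import re

# Ian's stricter pattern: 7-digit product number, a loosely checked
# nn/nn/nnnn date, a 2-3 letter currency code and a price with two decimals.
# The \r is optional here so LF-only bodies also match.
STRICT = re.compile(
    r"([A-Za-z0-9 ]+)\r?\s*([0-9]{7})\s*([0-9]+)\s*"
    r"([0-3][0-9]/[0-1][0-9]/[0-9]{4})\s*([A-Z]{2,3})\s*([0-9]+\.[0-9]{2})"
)


def parse_order_lines(body):
    """Return the matched records; an empty list means there is nothing to parse."""
    return [m.groups() for m in STRICT.finditer(body)]


order_body = (
    "description of some item1\n"
    "0344437                     1           03/12/2004          USD 335.75\n"
)
confirmation_body = "this is just a confirmation email.\n"

for name, body in [("order", order_body), ("confirmation", confirmation_body)]:
    records = parse_order_lines(body)
    if not records:
        print(name, "-> skipped (no order lines)")
    else:
        print(name, "->", records)
```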
>
> I just hope that I have helped in some way.
>
> Ian

>
>
> ----- Original Message -----
> From: chris porter <[EMAIL PROTECTED]>
> Date: Mon, 30 Aug 2004 21:40:53 -0400
> Subject: Re: regexp help
> To: CF-Talk <[EMAIL PROTECTED]>
>
> That was an option I explored; however, these emails aren't coming from
> just one source. In fact, there are hundreds of different company
> emails coming in, so my options were either 1) define the data I
> need, specify a regex to grab it, then store the expressions in a database,
> or 2) write a custom script in code that can identify each one of
> those, possibly by regex. Personally, I opted for option 1 because I can
> always fine-tune an expression, but changing code gets tedious.
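Option 1 can be sketched in a few lines of Python. This is a hypothetical illustration: `PATTERN_ROWS` stands in for the database table of expressions, and the sender address and field names are made up. The point of the design is that supporting a new company means adding a row, while `parse_email` itself never changes.

```python
import re

# Stand-in for the database table of expressions; in production each row
# would be loaded from the DB. The sender address is hypothetical.
PATTERN_ROWS = [
    {
        "sender": "orders@vendor-a.example",
        "regex": r"([A-Za-z0-9 ]+)\r?\s*([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)",
        "fields": ["description", "product_number", "qty", "ship_date", "currency", "price"],
    },
]

# Compile each stored expression once, keyed by sender.
COMPILED = {
    row["sender"]: (re.compile(row["regex"]), row["fields"]) for row in PATTERN_ROWS
}


def parse_email(sender, body):
    """Apply the stored expression for this sender; unknown senders yield no records."""
    entry = COMPILED.get(sender)
    if entry is None:
        return []
    pattern, fields = entry
    return [dict(zip(fields, m.groups())) for m in pattern.finditer(body)]


body = (
    "description of some item1\n"
    "0344437                     1           03/12/2004          USD 335.75\n"
)
print(parse_email("orders@vendor-a.example", body))
print(parse_email("unknown@elsewhere.example", body))
```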

>
> On another note, what distinguishes the item description line during
> a line-by-line scan from something like
>
> "this is just a confirmation email."
>
> Get my drift?
> -chris
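One possible answer to that question, sketched in Python under the assumption that every record looks like the DATA block further down: during a line-by-line scan, a line counts as a description only when the very next line matches the product-number pattern (Chris's "current REGEX"). A stray sentence like the confirmation line is never followed by such a line, so it is ignored. `pair_records` is a hypothetical helper.

```python
import re

# The second line of each record (Chris's "current REGEX").
NUMBER_LINE = re.compile(r"([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)")


def pair_records(body):
    """Return (description, number, qty, date, currency, price) tuples.

    A line is treated as a description only when the line right after it
    is a complete product-number line.
    """
    lines = [line.strip() for line in body.splitlines()]
    records = []
    for current, following in zip(lines, lines[1:]):
        match = NUMBER_LINE.fullmatch(following)
        if match and current:
            records.append((current,) + match.groups())
    return records


body = "\n".join([
    "this is just a confirmation email.",
    "",
    "description of some item1",
    "0344437                     1           03/12/2004          USD 335.75",
    "",
    "and one last description of some item",
    "0433447                     1           03/12/2004          USD 335.75",
])
for record in pair_records(body):
    print(record)
```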
>
>
> >I understand that. What I meant was, wouldn't it be easier to parse
> >each line separately? That way you don't need such a highly
> >complicated RegEx. You can make it much simpler and a little
> >more flexible, not to mention easier to maintain (i.e. two or more
> >separate regular expressions).
> >
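> ><cfscript>
> >   email = emailFromServer; // focused down to the body content only
> >   // Split on CR and LF so each list element is one line of the email.
> >   // (Including chr(32) would split on every space instead of per line.)
> >   crlf = chr(13) & chr(10);
> >   n = listlen(email, crlf);
> >   output = arraynew(1);
> >   output[1] = structnew();
> >   // a[x][1] = regex that identifies a line, a[x][2] = field it fills
> >   a = arraynew(2);
> >   a[1][1] = "RegEx"; // placeholder: pattern for the description line
> >   a[1][2] = "description";
> >   a[2][1] = "RegEx"; // placeholder: pattern for the product number line
> >   a[2][2] = "productnumber";
> >   nn = arraylen(a); // regex count (one per field)
> >   nnn = 1; // record count
> >   c = 0; // fields filled in the current record
> >   for (i=1; i LTE n; i=i+1) {
> >      currentitem = listgetat(email, i, crlf);
> >      for (ii=1; ii LTE nn; ii=ii+1) {
> >         // use ii (the regex index), not i (the line index)
> >         if (refind(a[ii][1], currentitem)) {
> >            output[nnn][a[ii][2]] = trim(currentitem);
> >            c = c + 1;
> >            // once every field is filled, start a new record
> >            if (c GTE nn) {
> >               c = 0;
> >               nnn = nnn + 1;
> >               output[nnn] = structnew();
> >            }
> >         }
> >      }
> >   }
> ></cfscript>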
> >
> >Heh, I went a little crazy here, but I'm pretty sure this would work. Any
> >future changes to what each field holds would be easy to make.
> >(Beware: I did not test it.)
> >
> >Ian
> >
> >----- Original Message -----
> >From: Michael Dinowitz <[EMAIL PROTECTED]>
> >Date: Mon, 30 Aug 2004 18:23:04 -0400
> >Subject: RE: regexp help
> >To: CF-Talk <[EMAIL PROTECTED]>
> >
> > You can get the first and second lines based on a newline-delimited list,
> > and you can get the items in the second line using a space-delimited list.
> > I just like to be specific when parsing data from a source like email. My
> > preference is to pop 2 messages, save the first in a DB (raw), and use the
> > second as a flag to rerun the page. Once all the mail is down and stored in
> > the DB (one at a time, there's a reason), I have a second process parse each
> > message in the tightest way possible. I'm paranoid (as all programmers
> > should be) about data from outside sources, and I want to be 100% sure of
> > what I'm getting and how. If there's a problem, then I want to know exactly
> > what's up.
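Michael's list-based approach, roughly translated into Python for a single record (a sketch, not his code; `parse_record` is a hypothetical name): split the record on line breaks, take the first non-blank line as the description, then split the second line on whitespace. Python's `str.split()` with no argument collapses runs of spaces, much as ColdFusion list functions skip the empty elements between repeated delimiters. Unpacking into exactly five names raises `ValueError` when a line has a different shape, which suits the "I want to know exactly what's up" attitude.

```python
def parse_record(record_text):
    """Parse one two-line record into a dict.

    Line 1 is the description; line 2 holds the space-separated fields.
    Raises ValueError if the record does not have that shape.
    """
    lines = [line.strip() for line in record_text.splitlines() if line.strip()]
    if len(lines) != 2:
        raise ValueError("expected 2 non-blank lines, got %d" % len(lines))
    description, field_line = lines
    # str.split() with no argument treats any run of whitespace as one delimiter.
    product_number, qty, ship_date, currency, price = field_line.split()
    return {
        "description": description,
        "product_number": product_number,
        "qty": qty,
        "ship_date": ship_date,
        "currency": currency,
        "price": price,
    }


record = (
    "description of some item1\r\n"
    "0344437                     1           03/12/2004          USD 335.75\r\n"
)
print(parse_record(record))
```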

> >
> >   _____
> >
> > Why don't you just go through the text as a list with CR as the
> > delimiter? This way you can have much more focused regular
> > expressions.
> >
> > Just a thought,
> >
> > Ian
> >
> > ----- Original Message -----
> > From: Michael Dinowitz <[EMAIL PROTECTED]>
> > Date: Mon, 30 Aug 2004 16:49:48 -0400
> > Subject: RE: regexp help
> > To: CF-Talk <[EMAIL PROTECTED]>
> >
> > Really fast (using the multi-line mode of CFMX):
> >
> > ^([^#chr(13)#]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]{2}/[0-9]{2}/[0-9]{4})[[:space:]]+(USD)?[[:space:]]*([0-9.]+)$
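A rough Python equivalent of Michael's multi-line pattern, under a few stated assumptions: `#chr(13)#` is ColdFusion evaluating to a carriage return inside the string, `[[:space:]]` becomes `\s`, and `re.MULTILINE` plays the role of CFMX's multi-line mode. Python's `$` only matches before `\n`, so this sketch normalizes CRLF to LF first and excludes `\n` instead of `\r` from the description group. `MULTILINE_RECORD` and `parse_multiline` are hypothetical names.

```python
import re

# ^ and $ anchor at line boundaries because of re.MULTILINE.
MULTILINE_RECORD = re.compile(
    r"^([^\n]+)\s+([0-9]+)\s+([0-9]+)\s+([0-9]{2}/[0-9]{2}/[0-9]{4})\s+(USD)?\s*([0-9.]+)$",
    re.MULTILINE,
)


def parse_multiline(body):
    """Normalize line endings, then return one tuple of groups per record."""
    normalized = body.replace("\r\n", "\n")
    return [m.groups() for m in MULTILINE_RECORD.finditer(normalized)]


body = "\r\n".join([
    "Product Name",
    "Product Number            Qty      Est. Ship Date      Your Ext. Price",
    "",
    "description of some item1",
    "0344437                     1           03/12/2004          USD 335.75",
    "",
    "another description of some item1",
    "0344734                     1           03/12/2004          USD 335.75",
])
for row in parse_multiline(body):
    print(row)
```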

> >
> >    _____
> >
> > From: chris porter [mailto:[EMAIL PROTECTED]
> > Sent: Monday, August 30, 2004 4:41 PM
> > To: CF-Talk
> > Subject: Re: regexp help
> >
> > And one last time....
> >
> > DATA:
> >
> > Product Name
> > Product Number            Qty      Est. Ship Date      Your Ext. Price
> > [dashes go here, all the way across; a PITA for the email parser]
> >
> > description of some item1
> > 0344437                     1           03/12/2004          USD 335.75
> >
> > another description of some item1
> > 0344734                     1           03/12/2004          USD 335.75
> >
> > and one last description of some item
> > 0433447                     1           03/12/2004          USD 335.75
> >
> > The part I need parsed by a regex:
> >
> > "description of some item1
> > 0344437                     1           03/12/2004          USD 335.75"
> >
> > Current REGEX:
> > ([0-9]+)[ ]+([0-9]+)[ ]+([^ ]+)[ ]+([A-Z]{3})[ ]+([0-9.]+)
> >
> > That regex matches everything on the 2nd line correctly, but nothing I add
> > to the beginning will match the first line. Any thoughts?
> >
> > Thanks!
> >
> > Chris
