I didn't work out the entire regex, out of laziness, but adding -? at the beginning of the subexpression for the ISA control number will handle both cases (the ordinary control number and the negative one). There is no assumption that the segment terminator is at any particular position.
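For illustration, here is a minimal Python sketch of what a worked-out pattern with the optional minus sign might look like. It is not the regex from the original post, which was never written out in full. The element widths are the usual fixed ISA sizes (an assumption, not something stated in this thread), and the names build_isa_pattern and sample_isa are hypothetical. The terminator is captured as a single character, so a CRLF terminator would show up only as its CR.

```python
import re

# Assumed fixed widths of ISA01..ISA12 (usual X12 layout, not taken from the thread).
WIDTHS_BEFORE_CONTROL = [2, 10, 2, 10, 2, 15, 2, 15, 6, 4, 1, 5]


def build_isa_pattern():
    """Build a regex for one ISA segment, allowing a leading '-' on ISA13."""
    parts = [r"ISA(?P<sep>.)"]                       # remember the element separator
    for width in WIDTHS_BEFORE_CONTROL:
        parts.append(r".{%d}(?P=sep)" % width)       # ISA01..ISA12, each followed by the separator
    parts.append(r"(?P<control>-?\d{9})(?P=sep)")    # ISA13 with the optional minus sign
    parts.append(r".(?P=sep).(?P=sep).")             # ISA14, ISA15, ISA16 (one character each)
    parts.append(r"(?P<term>.)")                     # whatever single character ends the segment
    # DOTALL lets '.' match a newline, in case the terminator is LF.
    return re.compile("".join(parts), re.DOTALL)


def sample_isa(control="000000905", sep="*", term="~"):
    """Hypothetical test data: build an ISA segment with made-up field values."""
    fields = ["00", " " * 10, "00", " " * 10,
              "ZZ", "SENDER".ljust(15), "ZZ", "RECEIVER".ljust(15),
              "100722", "0651", "U", "00401", control, "0", "P", ">"]
    return "ISA" + sep + sep.join(fields) + term


ISA_RE = build_isa_pattern()

plain = ISA_RE.search("junk" + sample_isa() + "GS*PO*...")
negative = ISA_RE.search(sample_isa(control="-000000905", sep="|", term="\n"))

print(len(plain.group(0)), plain.group("control"))
print(len(negative.group(0)), negative.group("control"))
```

Because the separator and terminator are captured rather than hard-coded, the same pattern covers both the 106-byte and the 107-byte form without caring which characters a given trading partner uses.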
You (plural) are correct that my regex assumes that the interchange is one contiguous string. Converting 80-character blocks back into that form is not hard to do, nor is converting one-segment-per-line input. Then we're back in business. Also, I am aware there are ways to deal with data spread out over multiple lines within a regex, but I've never done that myself, so I can't say for sure it would be able to handle the block form.

Chris's comments on the difficulty of debugging with regexes almost make sense. They tend to look like line noise at first, until you learn them. However, the code necessary in their place, in any language, must look even more intimidating and would present greater challenges in debugging simply by being longer. In my opinion, the advantages justify taking the time to learn the skill.

Howard
1 Peter 4:10

PS: It's good to be back! I'm with Werner Enterprises in Omaha now.

________________________________
From: Michael Mattias/LS <[email protected]>
To: [email protected]
Sent: Thu, July 22, 2010 6:51:59 AM
Subject: Re: [EDI-L] <TECH>ISA recognition

> This is where regular expressions really shine.
>
> /ISA(.).{2}\1.{10}\1.{2}/
>
> which says "look for the upper case letters ISA followed by a single
> character (remember it), followed by 2 characters, your remembered
> character, 10 characters, that pesky character again, followed by 2
> more ... and on i

One problem with anything which assumes a fixed length of 106 (or even the 'negative control number' size of 107) bytes is, it cannot recognize 'blocked' format; that is, 80 (or other value) characters, <newline>rest of segment.....

OK, solved that.... now deal simultaneously with the problem when the 'segment terminator' *is* <newline>.... and don't forget <newline> can be *either* CRLF (PC) or LF only ('nix).... oh, and did I mention?
There is still some data out there in which data are nominally one segment per record, but space-filled to some fixed record length following the segment's significant data.

Oh yes, I almost forgot... all these different formats may exist within the same input, the only constant being the format will not change WITHIN an Interchange.

Absent some constraints on "input format", there's really no shortcut to "find valid ISA (or any other) segment" except doing it more or less character by character and keeping track of what you've got so far, which is how I wrote my scanner back in '94 or '95 or so. Almost all the maintenance on it since then has been dealing with new and imaginative forms of invalid-ness.

Michael C. Mattias
Tal Systems Inc.
Racine WI
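As a rough illustration of the simplest case discussed in this thread, the Python sketch below turns blocked input back into one contiguous string before applying the quoted regex prefix. It assumes the segment terminator is not itself a newline and that records are not space-padded, which are exactly the constraints the message above says cannot be taken for granted. The helper names unblock and block, and the filler data, are hypothetical.

```python
import re

# Only the prefix of the pattern quoted in the thread; the rest is elided there.
ISA_PREFIX = re.compile(r"ISA(.).{2}\1.{10}\1.{2}")


def unblock(data):
    """Drop CR and LF so fixed-width records become one contiguous string.

    Only safe when the segment terminator is not a newline and the records
    carry no trailing space padding (assumptions for this sketch).
    """
    return data.replace("\r", "").replace("\n", "")


def block(data, width=80, newline="\n"):
    """Demo helper: split a string into fixed-width records."""
    return newline.join(data[i:i + width] for i in range(0, len(data), width))


# 70 filler characters stand in for whatever precedes the ISA, so that the
# 80-character record boundary falls inside the start of the ISA segment.
contiguous = "X" * 70 + "ISA*00*" + " " * 10 + "*00*" + " " * 10 + "*ZZ*SENDER"
blocked = block(contiguous, width=80, newline="\r\n")

before = ISA_PREFIX.search(blocked)           # the line break splits the ISA, so no match
after = ISA_PREFIX.search(unblock(blocked))   # match once the breaks are removed

print(before, after.start())
```

When the terminator is a newline, or several formats are mixed in one input, this shortcut no longer works and something closer to the character-by-character scanner described above is needed.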
