Hello G G,
I've just got back from my summer holiday in Tuscany; I hope I'm still in
time to answer your question! :)
First of all, I suggest you simplify your expressions. The custom
character class [0-9], for example, could be condensed into the
\d shorthand class; the {0,1} quantifier could also be
replaced by the simpler ? (question mark) construct.
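For instance, here is a small sketch comparing the verbose and the
condensed forms. The class name and the test strings are made up for
illustration, and the amount pattern is only a shortened piece of your
expression. One caveat: in .NET, \d also matches non-ASCII Unicode digits
unless RegexOptions.ECMAScript is set, so the two forms are equivalent
only for ASCII input like yours.

```csharp
using System;
using System.Text.RegularExpressions;

class ShorthandDemo
{
    static void Main()
    {
        // Amount sub-pattern written with [0-9], and a condensed form using \d.
        string verbose = @"[0-9][0-9,]*\.[0-9][0-9]";
        string concise = @"\d[\d,]*\.\d\d";

        // Optional comma: {0,1} and ? express the same quantifier.
        string optVerbose = @"^\w+,{0,1}$";
        string optConcise = @"^\w+,?$";

        // Each pair of columns should agree for every test string.
        foreach (string s in new[] { "100.52", "1,487.61", "abc" })
        {
            Console.WriteLine("{0}: {1} {2}", s,
                Regex.IsMatch(s, verbose), Regex.IsMatch(s, concise));
        }
        foreach (string s in new[] { "abc", "abc,", "abc,," })
        {
            Console.WriteLine("{0}: {1} {2}", s,
                Regex.IsMatch(s, optVerbose), Regex.IsMatch(s, optConcise));
        }
    }
}
```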
To insert tabs in your output string, just include them in the
replacement string! Assuming you are using C#, you could simply put a \t
escape sequence inside your string, as the following example illustrates:
${AcctName}\t${Amt}\t${Dt2Pay}...
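To make that concrete, here is a minimal sketch using Match.Result. The
input line and the shortened pattern are assumptions made for the
example, not your full expression. Note that the \t has to be written in
a regular C# string literal: the compiler turns it into a tab character.
In a verbatim string (@"...") it stays a backslash followed by a t, and
.NET replacement patterns do not interpret character escapes themselves.

```csharp
using System;
using System.Text.RegularExpressions;

class TabDemo
{
    static void Main()
    {
        // Hypothetical one-line input and a shortened pattern, for illustration only.
        string input = "xyz supplier [123445797891] amount: $100.52";
        string pattern = @"(?<AcctName>\w+(\s\w+)*)\s\[(?<AcctNbr>\d+)\]"
                       + @"\samount:\s\$(?<Amt>\d[\d,]*\.\d\d)";

        Regex re = new Regex(pattern,
            RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);
        Match m = re.Match(input);

        // Regular string literal: \t is already a tab when Result sees it.
        string tabbed = m.Result("${AcctName}\t${AcctNbr}\t${Amt}");
        Console.WriteLine(tabbed);

        // Verbatim string: the backslash and the t are copied to the output as they are.
        string literal = m.Result(@"${AcctName}\t${Amt}");
        Console.WriteLine(literal);
    }
}
```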
Then, the extra leading lines issue. The problem is that the first
capture group (AcctName) can start on a blank line: ^\w* may match
nothing at all, and the \s that follows also matches line breaks, so the
leading white space ends up inside the capture. To solve this you could,
for example, force at least one word character to be captured at the
start of the group, replacing the first ^\w* with ^\w+ (that is, a plus
instead of a star). Another way could be to capture any white space at
the beginning of every match and discard it: to do so, leave the
original expression as it is but prepend (\s*)?? to it.
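A small sketch of both suggestions follows. It uses only the first part
of your expression (AcctName and AcctNbr), and the sample text, class
name and helper method are made up for the example, so treat it as an
illustration rather than a drop-in fix.

```csharp
using System;
using System.Text.RegularExpressions;

class LeadingBlankLinesDemo
{
    // Print every AcctName captured by the pattern, with line breaks made visible.
    static void Show(string label, string pattern, string input)
    {
        RegexOptions opts = RegexOptions.Multiline
                          | RegexOptions.ExplicitCapture
                          | RegexOptions.IgnoreCase;
        foreach (Match m in Regex.Matches(input, pattern, opts))
        {
            string name = m.Groups["AcctName"].Value.Replace("\n", "\\n");
            Console.WriteLine("{0}: [{1}]", label, name);
        }
    }

    static void Main()
    {
        // Hypothetical sample: a blank line precedes each record.
        string input = "will be paid on the dates you specified.\n\n"
                     + "xyz supplier [123445797891]\n\n"
                     + "abc, Jane'S CHOICE [0089456881545]\n";

        string tail = @"\s\[(?<AcctNbr>\d*)\]";
        // Original shape: ^\w* can match nothing, so \s may swallow the line break.
        string star = @"(?<AcctName>^\w*,?(\s\w*('s)?,?)*)" + tail;
        // First suggestion: require at least one word character after ^.
        string plus = @"(?<AcctName>^\w+,?(\s\w*('s)?,?)*)" + tail;
        // Second suggestion: consume leading white space outside the named group.
        string skip = @"(\s*)??" + star;

        Show("star", star, input);
        Show("plus", plus, input);
        Show("skip", skip, input);
    }
}
```

On this sample the "star" lines should show a \n at the start of each
captured name, while the "plus" and "skip" lines should not.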
Please consider that I've tried to give you some useful hints to
accomplish what you are looking for while leaving your expression as
untouched as possible; as I said before, your expression could be
improved and IMHO you should keep working on it.
Hope this helps.
--
Efran Cobisi
http://www.cobisi.com
G G wrote:
How can one avoid capturing leading empty or blank lines?
the data I deal with look like this
"will be paid on the dates you specified.
xyz supplier [123445797891]
amount: $100.52 when: September 07, 2007 reference #: 0415
from: operating account [236424735]
abc, Jane'S CHOICE [0089456881545]
amount: $487.61 when: September 08, 2007 reference #: 0416
from: finess [0236454514]
"
regexoptions are:
multi-line, explicit capture, ignorecase, dotall, ignore pattern white space
regex expression used for capturing
(?<AcctName>^\w*,{0,1}(\s\w*('s){0,1},{0,1})*)\s\[(?<AcctNbr>\d*)\]
.{4,8}amount:\s\$(?<Amt>\b[0-9][0-9,]*\.\d\d)\s*when:\s*
(?<Dt2Pay>[ADFJMNOS][aceopu][bcglnprtvy][ya-v]{0,9}\s\d{1,2},\s\d\d\d\d\b)
\s*reference\s*\#\:\s*(?<RefNbr>\d*)\s*.{2,4}\s*from:\s
(?<FromAcctName>\w{1,}(\s\w*)*)\s\[(?<FromAcctNbr>\d*)\]
the expression used in Result(strGrps)
${AcctName} ${Amt} ${Dt2Pay} ${RefNbr} PCF ${FromAcctName} ${FromAcctNbr}
Result is
"
xyz supplier 100.52 September 07, 2007 0415 PCF operating account 236424735
abc, PRESIDENT'S CHOICE 487.61 September 08, 2007 0416 PCF finess
0236454514"
However, the desired result is lines with tab-delimited columns and
without extra leading lines:
"xyz supplier 100.52 September 07, 2007 0415 PCF operating account 236424735
abc, PRESIDENT'S CHOICE 487.61 September 08, 2007 0416 PCF finess
0236454514"
what do I have to adjust in the regex expression?
thank you for your time and expertise
===================================
This list is hosted by DevelopMentor® http://www.develop.com
View archives and manage your subscription(s) at http://discuss.develop.com