Comments inline:

On Sun, Jun 24, 2012 at 4:35 AM, ольга крыжановская
<olga.kryzhanov...@gmail.com> wrote:
> I already used testregex. Still no clue why it works there but fails in ksh.
>
> I've reworked the regex into the expression below but it still does not
> capture name="value". Full script uploaded to
> http://pastebin.com/raw.php?i=SPyqBMrj
> dummy="${xmltext//~(Ex-p)(?:
>     (<!--.+-->)+?|                   # xml comments
>     (<[:_-[:alnum:]]+
>         (?: # attributes
>             [[:space:]]+
>             (?:[:_-[:alnum:]]+=[^\"[:space:]]+?)|  #x='foo=bar huz=123'
>             (?:[:_-[:alnum:]]+=\"[^\"]*?\")|       #x='foo="ba=r o" huz=123'
>             (?:[:_-[:alnum:]]+=\'[^\']*?\')|       #x="foo='ba=r o' huz=123"
>             (?:[:_-[:alnum:]]+)                    #x="foox huz=123"

These four (?:...) alternatives must be wrapped in a ( ) pair to make the
scope of the | operator explicit. As written, the leading [[:space:]]+
belongs only to the first alternative. Also, all of Glenn's suggestions
about the ordering of characters in a [...] range must be applied.
Finally,
    (?:[:_-[:alnum:]]+=[^\"[:space:]]+?)
is wrong; it must be
    (?:[:_-[:alnum:]]+=[^\"\'[:space:]]+?)
i.e. the \' was missing. Otherwise it will compete with
    (?:[:_-[:alnum:]]+=\'[^\']*?\')

>         )*
>         [[:space:]]*
>         \/? # start tags which are end tags, too (like <foo\/>)
>     >)+?|                            # xml start tags
>     (<\/[:_-[:alnum:]]+>)+?|         # xml end tags
>     ([^><]+)                         # xml text
>     )/D}"

I also have to say that whoever says XML is 'RISC-like' should be hit
with something heavy. Parsing it correctly with all its syntax
variations is a hard job for a parser. I'd say parsers could be much
smaller and faster if there were fewer syntax variations and exceptions.
And do not get me started on the idea of having an alien syntax (DTD)
in the middle of it.

Olga

-- 
      ,   _                                    _   ,
     { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
.----'-/`-/     olga.kryzhanov...@gmail.com   \-`\-'----.
 `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer   /\/\
      `--`                                      `--`
_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers
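
For anyone who wants to check the two regex points above outside ksh,
here is a minimal sketch using plain POSIX ERE through grep. It is only
an illustration under stated assumptions: it does not use the ksh93
~(Ex-p) engine, the helper names are hypothetical and not from the
pastebin script, the name class is written as [[:alnum:]_:-] (hyphen
last, so it is a literal), and grep -o is a common extension rather
than POSIX.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of the original script.

# Value class as in the quoted regex: excludes only '"' and whitespace.
# Prints the first unquoted-looking name=value match, if any.
match_without_squote() {
    printf '%s\n' "$1" | grep -oE '[[:alnum:]_:-]+=[^"[:space:]]+' | head -n 1
}

# Corrected value class: also excludes the single quote.
match_with_squote() {
    printf '%s\n' "$1" | grep -oE "[[:alnum:]_:-]+=[^\"'[:space:]]+" | head -n 1
}

# Ungrouped alternation: the leading whitespace binds only to the
# first alternative, so "b=2" matches without any whitespace.
ungrouped_matches() {
    printf '%s\n' "$1" | grep -qE '^[[:space:]]+a=1|b=2$'
}

# Grouped alternation: whitespace is required before either alternative.
grouped_matches() {
    printf '%s\n' "$1" | grep -qE '^[[:space:]]+(a=1|b=2)$'
}

sample="x='foo bar' huz=123"
match_without_squote "$sample"   # prints: x='foo
match_with_squote "$sample"      # prints: huz=123

if ungrouped_matches "b=2"; then echo "ungrouped: b=2 matches"; fi
if ! grouped_matches "b=2"; then echo "grouped: b=2 does not match"; fi
```

Without the single quote in the negated class, the unquoted-value
pattern swallows the start of a single-quoted value (x='foo), which is
the competition between alternatives described above. With it, that
text is left for the single-quoted alternative to handle.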