I've been trying to use regular expressions for fast parsing of short
XML fragments. The first attempt went quite well:
----------cutme-----------
$ cat xmlfragparse.sh
xmltext='<h1><div> a text </div></h1>'
dummy="${xmltext//~(Ex-g)(?:
(<.+>)|
([^[><]]+)
)/dummy}"
print -v .sh.match
$ ksh xmlfragparse.sh
(
(
[0]='<h1>'
[1]='<div>'
[2]=' '
[3]=a
[4]=' '
[5]=t
[6]=e
[7]=x
[8]=t
[9]=' '
[10]='</div>'
[11]='</h1>'
)
(
[0]='<h1>'
[1]='<div>'
[10]='</div>'
[11]='</h1>'
)
(
[2]=' '
[3]=a
[4]=' '
[5]=t
[6]=e
[7]=x
[8]=t
[9]=' '
)
)
----------cutme-----------
This is all OK.
However, if I try to add support for XML comments, all hell breaks loose:
----------cutme-----------
$ cat xmlfragparsecomment.sh
xmltext='<h1><div> a text </div><!-- a comment (<disabled>) --></h1>'
dummy="${xmltext//~(Ex-g)(?:
(<[^[!]].+>)| # xml tags
([^[><]]+)| # xml text
(<!--.+-->) # xml comments
)/dummy}"
print -v .sh.match
$ ksh xmlfragparsecomment.sh
(
(
[0]=h
[1]=1
[2]=d
[3]=i
[4]=v
[5]=' '
[6]=a
[7]=' '
[8]=t
[9]=e
[10]=x
[11]=t
[12]=' '
[13]=/
[14]=d
[15]=i
[16]=v
[17]='<!-- a comment (<disabled>) -->'
[18]=/
[19]=h
[20]=1
)
(
[0]=
)
(
[0]=h
[1]=1
[2]=d
[3]=i
[4]=v
[5]=' '
[6]=a
[7]=' '
[8]=t
[9]=e
[10]=x
[11]=t
[12]=' '
[13]=/
[14]=d
[15]=i
[16]=v
[18]=/
[19]=h
[20]=1
)
(
[17]='<!-- a comment (<disabled>) -->'
)
)
----------cutme-----------
This is all wrong. To keep the tag subpattern from matching XML
comments such as <!-- a comment -->, I changed the original (<.+>) to
(<[^[!]].+>) and added a separate subpattern for comments, i.e.
(<!--.+-->), but it does not work. Tags are no longer matched at all
(look at the [0]= in the second group of the output), and the tag names
end up split into single characters in the text group instead.
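
For comparison, here is a rough sketch of the tokenization I am after,
written with grep -E instead of ksh's pattern matching. The function
name tokenize_xml_fragment is made up for this example, and it assumes
a grep that supports -o (GNU and BSD grep do; POSIX does not require
it). The bracket expressions use the plain POSIX spelling, e.g. [^!>].
It says nothing about how ksh93 parses [^[!]]. Because grep is greedy,
the text comes out as one run rather than single characters.

```shell
# Hypothetical helper (not part of the original script): split an XML
# fragment into comments, tags, and text runs, one token per output line.
tokenize_xml_fragment() {
    # Alternatives, in POSIX ERE syntax:
    #   <!--.*-->       an XML comment (greedy: assumes at most one per line)
    #   <[^!>][^>]*>    a tag whose first character is not '!'
    #   [^<>]+          a run of text between tags
    # -o prints each match on its own line (an extension, see above).
    printf '%s\n' "$1" | grep -oE '<!--.*-->|<[^!>][^>]*>|[^<>]+'
}

tokenize_xml_fragment '<h1><div> a text </div><!-- a comment (<disabled>) --></h1>'
```

With the fragment from xmlfragparsecomment.sh this should print the two
opening tags, the text, the closing </div>, the whole comment (with
<disabled> kept inside it), and the closing </h1>, six lines in total.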
Olga
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ [email protected] \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`