Below is an example solution. I attached the same file, in case your
mail application destroys the tabs:
---------------------------------------
xmltext='<h1><div> a text </div><!-- a comment (<disabled>) --></h1>'
# parse
dummy="${xmltext//~(Ex)(?:
(<!--.+-->)+?| # xml comments
(<.+>)+?| # xml tags
([^[><]]+)+? # xml text
)/dummy}"
# debug output
print -v .sh.match
# rebuild the original text, based on our matches
nameref nodes_all=.sh.match[0] # contains all matches
nameref nodes_comments=.sh.match[1] # contains only XML comment matches
nameref nodes_tags=.sh.match[2] # contains only XML tag matches
nameref nodes_text=.sh.match[3] # contains only XML text matches
integer i
for (( i = 0 ; i < ${#nodes_all[@]} ; i++ )) ; do
	[[ -v nodes_comments[i] ]] && printf '%s' "${nodes_comments[i]}"
	[[ -v nodes_tags[i] ]] && printf '%s' "${nodes_tags[i]}"
	[[ -v nodes_text[i] ]] && printf '%s' "${nodes_text[i]}"
done
printf '\n'
---------------------------------------
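As a cross-check on the node list the regex is expected to produce, here is a minimal sketch that splits the same fragment with plain POSIX parameter expansion instead of ~(E) patterns and .sh.match. It is not the ksh93 regex method from the script above, only an independent way to see the expected order of comment, tag, and text nodes. The function name tokenize and the variables rest, body, node, and kind are hypothetical, and the sketch assumes well-formed input where comments end at the first "-->".

```shell
#!/bin/sh
# Hypothetical cross-check: split an XML fragment into comment, tag, and
# text nodes using only POSIX parameter expansion (no regex, no .sh.match).
xmltext='<h1><div> a text </div><!-- a comment (<disabled>) --></h1>'

tokenize() {
	rest=$1
	while [ -n "$rest" ]; do
		case $rest in
		'<!--'*'-->'*)
			# comment: runs up to the first "-->"; tested before tags
			body=${rest#'<!--'}
			body=${body%%'-->'*}
			node="<!--${body}-->"
			kind=comment ;;
		'<'*'>'*)
			# tag: runs up to the first ">"
			body=${rest#'<'}
			body=${body%%'>'*}
			node="<${body}>"
			kind=tag ;;
		*)
			# text: everything before the next "<"
			node=${rest%%'<'*}
			# a stray "<" with no closing ">": take the remainder as text
			[ -n "$node" ] || node=$rest
			kind=text ;;
		esac
		printf '%s\t%s\n' "$kind" "$node"
		# drop the node just emitted from the front of the input
		rest=${rest#"$node"}
	done
}

tokenize "$xmltext"
```

Each output line is the node kind, a tab, and the node itself; joining the second column back together should give the original string, which is the same property the rebuild loop above checks for the regex version. The comment branch is tested before the tag branch for the same reason the comment subpattern comes first in the regex: a comment also starts with "<".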
Credit goes to Roland Mainz and Glenn Fowler for finding my problems and fixing them.
Olga
On Mon, Jun 18, 2012 at 3:38 PM, ольга крыжановская
<[email protected]> wrote:
> I've been trying to use regex for fast parsing of short xml fragments.
> The first attempt went quite well:
> ----------cutme-----------
> $ cat xmlfragparse.sh
> xmltext='<h1><div> a text </div></h1>'
>
> dummy="${xmltext//~(Ex-g)(?:
> (<.+>)|
> ([^[><]]+)
> )/dummy}"
> print -v .sh.match
> $ ksh xmlfragparse.sh
> (
> (
> [0]='<h1>'
> [1]='<div>'
> [2]=' '
> [3]=a
> [4]=' '
> [5]=t
> [6]=e
> [7]=x
> [8]=t
> [9]=' '
> [10]='</div>'
> [11]='</h1>'
> )
> (
> [0]='<h1>'
> [1]='<div>'
> [10]='</div>'
> [11]='</h1>'
> )
> (
> [2]=' '
> [3]=a
> [4]=' '
> [5]=t
> [6]=e
> [7]=x
> [8]=t
> [9]=' '
> )
> )
> ----------cutme-----------
>
> This is all OK.
>
>
> However, if I try to add support for XML comments, all hell breaks loose:
>
> ----------cutme-----------
> $ cat xmlfragparsecomment.sh
>
> xmltext='<h1><div> a text </div><!-- a comment (<disabled>) --></h1>'
>
> dummy="${xmltext//~(Ex-g)(?:
> (<[^[!]].+>)| # xml tags
> ([^[><]]+)| # xml text
> (<!--.+-->) # xml comments
> )/dummy}"
> print -v .sh.match
> $ ksh xmlfragparsecomment.sh
> (
> (
> [0]=h
> [1]=1
> [2]=d
> [3]=i
> [4]=v
> [5]=' '
> [6]=a
> [7]=' '
> [8]=t
> [9]=e
> [10]=x
> [11]=t
> [12]=' '
> [13]=/
> [14]=d
> [15]=i
> [16]=v
> [17]='<!-- a comment (<disabled>) -->'
> [18]=/
> [19]=h
> [20]=1
> )
> (
> [0]=
> )
> (
> [0]=h
> [1]=1
> [2]=d
> [3]=i
> [4]=v
> [5]=' '
> [6]=a
> [7]=' '
> [8]=t
> [9]=e
> [10]=x
> [11]=t
> [12]=' '
> [13]=/
> [14]=d
> [15]=i
> [16]=v
> [18]=/
> [19]=h
> [20]=1
> )
> (
> [17]='<!-- a comment (<disabled>) -->'
> )
> )
> ----------cutme-----------
>
> This is all wrong. I changed the tag subpattern from (<.+>) to
> (<[^[!]].+>) to prevent it from matching XML comments, i.e.
> <!-- a comment -->, and added a separate subpattern for these comments,
> i.e. (<!--.+-->), but it does not work. Tags are no longer matched; look
> at the empty [0]= entry in the output.
>
> Olga
> --
> , _ _ ,
> { \/`o;====- Olga Kryzhanovska -====;o`\/ }
> .----'-/`-/ [email protected] \-`\-'----.
> `'-..-| / http://twitter.com/fleyta \ |-..-'`
> /\/\ Solaris/BSD//C/C++ programmer /\/\
> `--` `--`
_______________________________________________
ast-users mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-users