Hi!
----
AFAIK we found a bug in the libast regex engine which manifests itself
when it should match & capture text containing '[' characters.
The following example (derived from Olga's previous work on a
quick&dirty XML document scanner) shows the issue (note the "[TEXT]"
in variable "xmltext"):
-- snip --
xmltext='<h1><div> a text </div>More [TEXT].<!-- a comment (<disabled>) --></h1>'
# parse
dummy="${xmltext//~(Ex)(?:
(<!--.+-->)+?| # xml comments
(<.+>)+?| # xml tags
([^[><]]+)+? # xml text
)/dummy}"
# debug output
printf 'dummy=%q\n' "${dummy}"
print -v .sh.match
# rebuild the original text, based on our matches
nameref nodes_all=.sh.match[0] # contains all matches
nameref nodes_comments=.sh.match[1] # contains only XML comment matches
nameref nodes_tags=.sh.match[2] # contains only XML tag matches
nameref nodes_text=.sh.match[3] # contains only XML text matches
integer i
for (( i = 0 ; i < ${#nodes_all[@]} ; i++ )) ; do
[[ -v nodes_comments[i] ]] && printf '%s' "${nodes_comments[i]}"
[[ -v nodes_tags[i] ]] && printf '%s' "${nodes_tags[i]}"
[[ -v nodes_text[i] ]] && printf '%s' "${nodes_text[i]}"
done
printf '\n'
-- snip --
If I run the example I get the following output. The first sign of
trouble is the '[' character in the "...dummydummy[dummy..." output. It
looks like the '[' simply wasn't matched by any of the patterns:
-- snip --
$ ./arch/sol11.i386-64/bin/ksh xmlparse.sh
dummy='dummydummydummydummydummydummydummydummydummydummydummydummydummydummydummydummy[dummydummydummydummydummydummydummydummy'
(
(
[0]='<h1>'
[1]='<div>'
[2]=' '
[3]=a
[4]=' '
[5]=t
[6]=e
[7]=x
[8]=t
[9]=' '
[10]='</div>'
[11]=M
[12]=o
[13]=r
[14]=e
[15]=' '
[16]=T
[17]=E
[18]=X
[19]=T
[20]=']'
[21]=.
[22]='<!-- a comment (<disabled>) -->'
[23]='</h1>'
)
(
[22]='<!-- a comment (<disabled>) -->'
)
(
[0]='<h1>'
[1]='<div>'
[10]='</div>'
[23]='</h1>'
)
(
[2]=' '
[3]=a
[4]=' '
[5]=t
[6]=e
[7]=x
[8]=t
[9]=' '
[11]=M
[12]=o
[13]=r
[14]=e
[15]=' '
[16]=T
[17]=E
[18]=X
[19]=T
[20]=']'
[21]=.
)
)
<h1><div> a text </div>More TEXT].<!-- a comment (<disabled>) --></h1>
-- snip --
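A quick way to pin down where the rebuilt text diverges from the input
is a character-by-character comparison. The following is only a sketch
in plain POSIX sh (it should also run in ksh93); the helper name
"first_diff" is made up for this illustration, and the two strings are
copied from the example and its output above:

```shell
#!/bin/sh
# Hypothetical helper: report the first offset (0-based) at which two
# strings differ. Returns 0 if they are identical, 1 otherwise.
first_diff() {
    a=$1 b=$2 pos=0
    while [ -n "$a" ] && [ -n "$b" ]; do
        # Split off the first character of each string.
        ca=${a%"${a#?}"} cb=${b%"${b#?}"}
        if [ "$ca" != "$cb" ]; then
            printf 'offset %d: expected "%s", got "%s"\n' "$pos" "$ca" "$cb"
            return 1
        fi
        a=${a#?} b=${b#?} pos=$((pos + 1))
    done
    # Both strings exhausted at the same time: they are identical.
    [ "$a" = "$b" ] && return 0
    printf 'offset %d: one string is a prefix of the other\n' "$pos"
    return 1
}

# Input text and rebuilt text, copied from the example above.
original='<h1><div> a text </div>More [TEXT].<!-- a comment (<disabled>) --></h1>'
rebuilt='<h1><div> a text </div>More TEXT].<!-- a comment (<disabled>) --></h1>'

if first_diff "$original" "$rebuilt"; then
    echo 'rebuilt text is identical to the input'
fi
```

For the two strings above this should report offset 28, i.e. the '['
in front of "TEXT".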
Glenn: What do you think? It looks like ([^[><]]+)+? does not
generate matches for '[', right?
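For comparison, here is a small sketch of how a POSIX ERE engine other
than libast reads that bracket expression. It assumes a grep which
supports -o (a GNU/BSD extension, not required by POSIX) and does not
try to reproduce the libast behaviour shown above, where single
characters were matched. Under the POSIX rules "[^[><]" is already a
complete bracket expression which excludes '[', '>' and '<', and the
following "]+" is a literal ']'. The second pattern, with ']' placed
first in the list, is only a hypothetical alternative spelling, not
something taken from the original script:

```shell
#!/bin/sh
# Sketch only: uses grep -E as a stand-in ERE engine, not libast.
text='More [TEXT].'

# As written in the report: one character other than '[', '>' or '<',
# followed by one or more literal ']' characters.
as_written=$(printf '%s\n' "$text" | grep -oE '[^[><]]+')

# Hypothetical alternative: ']' listed first inside the brackets, so the
# set excludes ']', '[', '>' and '<'. Matches are joined with '|'.
alternative=$(printf '%s\n' "$text" | grep -oE '[^][><]+' | tr '\n' '|')

printf 'as written:  %s\n' "$as_written"
printf 'alternative: %s\n' "$alternative"
```

In both readings '[' is a member of the negated set, so neither
pattern can produce a match for '[' itself.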
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) [email protected]
\__\/\/__/ MPEG specialist, C&JAVA&Sun&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)