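This is malformed: [^[><]]
It's taken as 2 REs: the bracket expression [^[><] followed by the literal char ].
To include a literal ] in a bracket expression it has to come first (right after the ^), like this: [^]><[]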
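A minimal sketch of the bracket-expression rule described above, using POSIX sed rather than the libast engine, so it only shows how the two spellings are parsed and says nothing about ksh's own matching. The function names and the sample string are made up for illustration.

```shell
#!/bin/sh
# Illustration only: uses POSIX sed, not the libast regex engine.
# Function names are hypothetical.

# '[^[><]' closes at the first ']', so the second ']' is a separate literal:
# "one char other than [, > or <, followed by a literal ]".
malformed_class() {
    printf '%s\n' "$1" | sed 's/[^[><]]/X/g'
}

# ']' placed first after '^' is an ordinary member of the set:
# "one char other than ], >, < or [".
intended_class() {
    printf '%s\n' "$1" | sed 's/[^]><[]/X/g'
}

malformed_class 'More [TEXT].'   # prints: More [TEXX.
intended_class  'More [TEXT].'   # prints: XXXXX[XXXX]X
```

In the first call only the pair "T]" is replaced, because the pattern needs a literal ] after the bracket expression. In the second call every character except [ and ] is replaced, which is what a negated set containing both brackets should do.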

On Thu, 21 Jun 2012 02:07:20 +0200 Roland Mainz wrote:
> AFAIK we found a bug in the libast regex engine which manifests itself
> when it should match&&capture text with '[' characters.
> The following example (derived from Olga's previous work on a
> quick&&dirty XML document scanner) shows the issue (note the "[TEXT]"
> in variable "xmltext"):
> -- snip --
> xmltext='<h1><div> a text </div>More [TEXT].<!-- a comment (<disabled>) --></h1>'
> # parse
> dummy="${xmltext//~(Ex)(?:
>     (<!--.+-->)+?|     # xml comments
>     (<.+>)+?|          # xml tags
>     ([^[><]]+)+?       # xml text
>     )/dummy}"
> # debug output
> printf 'dummy=%q\n' "${dummy}"
> print -v .sh.match
> # rebuild the original text, based on our matches
> nameref nodes_all=.sh.match[0]       # contains all matches
> nameref nodes_comments=.sh.match[1]  # contains only XML comment matches
> nameref nodes_tags=.sh.match[2]      # contains only XML tag matches
> nameref nodes_text=.sh.match[3]      # contains only XML text matches
> integer i
> for (( i = 0 ; i <= ${#nodes_all[@]} ; i++ )) ; do
>     [[ -v nodes_comments[i] ]] && printf '%s' "${nodes_comments[i]}"
>     [[ -v nodes_tags[i] ]] && printf '%s' "${nodes_tags[i]}"
>     [[ -v nodes_text[i] ]] && printf '%s' "${nodes_text[i]}"
> done
> printf '\n'
> -- snip --
> If I run the example I get the following output. The first sign of trouble
> is the '[' character in the "...dummydummy[dummy..." output. It looks
> like the '[' simply wasn't matched by any of the patterns:
> -- snip --
> $ ./arch/sol11.i386-64/bin/ksh xmlparse.sh
> dummy='dummydummydummydummydummydummydummydummydummydummydummydummydummydummydummydummy[dummydummydummydummydummydummydummydummy'
> (
>     (
>         [0]='<h1>'
>         [1]='<div>'
>         [2]=' '
>         [3]=a
>         [4]=' '
>         [5]=t
>         [6]=e
>         [7]=x
>         [8]=t
>         [9]=' '
>         [10]='</div>'
>         [11]=M
>         [12]=o
>         [13]=r
>         [14]=e
>         [15]=' '
>         [16]=T
>         [17]=E
>         [18]=X
>         [19]=T
>         [20]=']'
>         [21]=.
>         [22]='<!-- a comment (<disabled>) -->'
>         [23]='</h1>'
>     )
>     (
>         [22]='<!-- a comment (<disabled>) -->'
>     )
>     (
>         [0]='<h1>'
>         [1]='<div>'
>         [10]='</div>'
>         [23]='</h1>'
>     )
>     (
>         [2]=' '
>         [3]=a
>         [4]=' '
>         [5]=t
>         [6]=e
>         [7]=x
>         [8]=t
>         [9]=' '
>         [11]=M
>         [12]=o
>         [13]=r
>         [14]=e
>         [15]=' '
>         [16]=T
>         [17]=E
>         [18]=X
>         [19]=T
>         [20]=']'
>         [21]=.
>     )
> )
> <h1><div> a text </div>More TEXT].<!-- a comment (<disabled>) --></h1>
> -- snip --
> Glenn: What do you think? It looks like ([^[><]]+)+? does not
> generate matches for '[', right?
> ----
> Bye,
> Roland
> --
>   __ .  . __
>  (o.\ \/ /.o) roland.ma...@nrubsig.org
>   \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
>   /O /==\ O\  TEL +49 641 3992797
>  (;O/ \/ \O;)

_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers