Addenda:

1. I'm now assuming that the goal is to process UTF-8 encoded input. I failed to say so in my previous post, but given that input files are specified as UTF-8, it seems irredeemably silly to first expand them to UCS2 and then contract them again for regular expression purposes. In short, both Java and C# have hopelessly borked input processing for text files.
2. The main issue to decide in the debate between ONMATCH and CHAR c /pc/ appears to be constraints on rewriting. I'm perfectly comfortable declaring that there are bracketing constraints on bytecode, e.g. that an opening ONMATCH must be paired with a closing FAIL. I'm also comfortable saying that the "scope" of an ONMATCH is lexical, in the sense that a JMP instruction that exits the ONMATCH/FAIL pair has the side effect of setting the contextually prevailing match to "undefined", with the implication that a subsequent successfully matching CHAR instruction produces an exceptional outcome. An NFA->DFA converter can reasonably maintain both constraints. Finally, I'm comfortable declaring, as a well-formedness constraint, that ONMATCH/FAIL brackets may not nest.

shap
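The well-formedness constraints in item 2 (every ONMATCH closed by a FAIL, no nesting, and JMPs that leave the bracket being significant) can be checked in a single pass. What follows is a minimal sketch under stated assumptions, not an existing BitC implementation. The tuple encoding of instructions, the names `find_brackets`, `scope_exits` and `IllFormed`, and the rules that the first FAIL after an ONMATCH closes it and that a FAIL outside any bracket is allowed are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: each instruction is (opcode, argument).
# For JMP the argument is an absolute instruction index; for CHAR it is
# the byte or character to match; ONMATCH and FAIL ignore it.
Instr = Tuple[str, Optional[object]]


class IllFormed(Exception):
    """Raised when a program violates the ONMATCH/FAIL bracketing rules."""


def find_brackets(prog: List[Instr]) -> List[Tuple[int, int]]:
    """Return (onmatch_pc, fail_pc) pairs; reject nested or unclosed brackets."""
    brackets = []
    open_at: Optional[int] = None
    for pc, (op, _) in enumerate(prog):
        if op == "ONMATCH":
            if open_at is not None:
                raise IllFormed(f"ONMATCH at {pc} nests inside ONMATCH at {open_at}")
            open_at = pc
        elif op == "FAIL" and open_at is not None:
            # Assumption: the first FAIL after an ONMATCH closes its bracket.
            brackets.append((open_at, pc))
            open_at = None
        # Assumption: a FAIL outside any bracket is an ordinary failure.
    if open_at is not None:
        raise IllFormed(f"ONMATCH at {open_at} has no closing FAIL")
    return brackets


def scope_exits(prog: List[Instr]) -> List[int]:
    """Return the pcs of JMPs that leave their enclosing ONMATCH/FAIL bracket.

    Under the lexical-scope rule, taking such a JMP sets the prevailing
    match to "undefined"; this function only locates those JMPs.
    """
    brackets = find_brackets(prog)
    exits = []
    for pc, (op, arg) in enumerate(prog):
        if op != "JMP":
            continue
        if not isinstance(arg, int) or not 0 <= arg < len(prog):
            raise IllFormed(f"JMP at {pc} has invalid target {arg!r}")
        for lo, hi in brackets:
            # Brackets cannot nest, so a JMP lies in at most one of them.
            if lo < pc < hi and not lo <= arg <= hi:
                exits.append(pc)
    return exits


if __name__ == "__main__":
    prog = [
        ("ONMATCH", None),  # 0: opens the bracket
        ("CHAR", "a"),      # 1
        ("JMP", 5),         # 2: target lies outside the bracket
        ("JMP", 4),         # 3: target stays inside the bracket
        ("FAIL", None),     # 4: closes the bracket
        ("CHAR", "b"),      # 5
    ]
    print(find_brackets(prog))  # [(0, 4)]
    print(scope_exits(prog))    # [2]
```

Because nesting is forbidden, the checker needs only a single "open" slot rather than a stack, which is one practical payoff of making non-nesting a well-formedness rule.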
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
