Re: [Unicon-group] Unicon regex string pattern constructor

Jeffery, Clint (jeffe...@uidaho.edu) Mon, 13 Mar 2017 18:19:52 -0700

Hi Bruce,


I am leery of conducting such discussions on the unicon-group mailing list;
they would be better on unicon-ldif or in private e-mail. So I will send you
my answer by private e-mail. If others want to hear it, let me know and
subscribe to unicon-ldif, and I will answer it there.


Clint

________________________________
From: Bruce & Breeanna Rennie <bren...@dcsi.net.au>
Sent: Monday, March 13, 2017 5:33:21 PM
To: Contact - clint.jeff...@gmail.com; Unicon group
Cc: homem...@talktalk.net; Phillip Thomas
Subject: Re: [Unicon-group] Unicon regex string pattern constructor

Good morning Clinton et al,

As I am one of those trying to replace the lexer for regular
expressions, I had a thought this morning about using Robert Alexander's
regexp.icn as a base for your process and for handling the regex patterns
in the unicon compiler.

The code that I was going to use to switch lexers when coming across <
and > is still being worked on. However, I am thinking of taking a
little detour to see what can be used from the RePat procedure to create
the alternate lexer inside the compiler.

My question to you, Clinton, and to anyone else who has any ideas regarding
this: what effects would trouble users when using this facility in
conjunction with the existing pattern matching functions (other than
the one Clinton has given, of SNOBOL matching shortest first and not
longest first)?

Just an idea, I have not looked at the interaction just yet. Just had
the idea when I got home with my wife from downtown.

regards

Bruce Rennie

On 01/03/17 07:01, Clinton Jeffery wrote:
> Hi,
>
> I am working on a library module that converts a string (containing a
> regular expression) into its corresponding pattern. This is the
> runtime equivalent of what the Unicon translator does when it sees a
> regular expression, except that the translator writes out pattern
> constructor source code as strings, rather than actually constructing
> the pattern.
>
> The easy part turned out to be fairly easy: ripping the regular
> expression grammar out of the Unicon grammar into a separate .y file
> was straightforward, and wiring it up to Unicon's lexical analyzer was
> simple since unilex.u is already in uni/lib, as is tree.u for
> constructing parse trees.  Slightly harder is the part I haven't done
> yet, which is:
> given a regular expression parse tree, walk the parse tree and build
> the pattern. The reason this is harder, at present, is that Unicon's
> code generator for regular expressions takes a lot of short cuts,
> emitting strings that will work as source code to construct various
> patterns, instead of properly building parse tree nodes everywhere. A
> by-product of this effort is likely to be: eliminating a bunch of
> short cuts and building tree nodes instead of strings for various things.
>
> There is a separate issue which has bothered one or more of you enough
> that you have worked on it or looked into working on it, which is that
> our semantics of regular expressions are at present constrained by
> Unicon's lexical rules. Spaces are spaces, pound signs are comments,
> etc. If you really hate that, you write a new lexical analyzer for
> regular expressions that you switch into while parsing them, and
> switch back when you are finished.  For now, I will treat that as a
> separable issue and ignore it, but we will likely return to it.
> Perhaps one of you will send me an amazing new unilex.icn that we all
> will agree works better for its regex handling.
>
> Another separate issue, which I think is more serious, is that SNOBOL
> pattern matching returns the shortest match, not the longest match, so
> using SNOBOL patterns to implement regular expressions will produce
> counterintuitive results by default for folks used to using regular
> expressions in other tools. But again, I think this is a separate
> issue, one we can address if/when we feel it necessary.
>
> To sum up: I think it won't be too awful hard to make a library module
> to build a pattern from a string containing a regular expression. That
> won't solve all the current regular expressions' shortcomings, but it
> will make it easier to do certain experiments and make incremental
> progress, and perhaps it will allow a closer to head-to-head
> comparison with the IPL regexp.icn library.
>
> IT IS NOT FINISHED, but I am attaching my work-in-progress for your
> amusement. Please do not redistribute, but feel free to send me fixes.
> If you are interested in following along before I finish and put it in
> svn, please let me know and I will send you updates, otherwise feel
> free to wait.
>
> This version has a main() for testing purposes. It doesn't actually do
> anything useful yet. The makefile rule assumes build directory is one
> of the uni/ subdirectories.
>
> rel: rel.y
>     ../iyacc/iyacc -i rel.y
>     unicon rel
>

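For readers following the quoted message, the "walk the parse tree and build the pattern" step can be sketched roughly as below. This is Python rather than Unicon, and everything in it is an assumption made for illustration: the `Node` class, its labels ("lit", "cat", "alt", "star"), and the `build` function are hypothetical and are not the tree.u or pattern-constructor API. Each node is turned into a matcher that yields the possible end positions of a match.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical parse tree node; the labels are invented for this sketch.
    label: str                      # "lit", "cat", "alt", or "star"
    value: str = ""                 # literal text, used only by "lit"
    kids: list = field(default_factory=list)

def build(node):
    """Walk the tree and return matcher(s, i), which yields every
    position at which a match starting at index i can end."""
    if node.label == "lit":
        def m(s, i):
            if s.startswith(node.value, i):
                yield i + len(node.value)
        return m
    if node.label == "cat":
        left, right = (build(k) for k in node.kids)
        def m(s, i):
            for j in left(s, i):
                yield from right(s, j)
        return m
    if node.label == "alt":
        left, right = (build(k) for k in node.kids)
        def m(s, i):
            yield from left(s, i)
            yield from right(s, i)
        return m
    if node.label == "star":
        inner = build(node.kids[0])
        def m(s, i):
            yield i                 # zero repetitions first (shortest first)
            for j in inner(s, i):
                if j > i:           # guard against looping on empty matches
                    yield from m(s, j)
        return m
    raise ValueError("unknown node label: " + node.label)

# Tree for the regular expression (ab|c)*d
tree = Node("cat", kids=[
    Node("star", kids=[
        Node("alt", kids=[Node("lit", "ab"), Node("lit", "c")]),
    ]),
    Node("lit", "d"),
])
ends = list(build(tree)("abcd", 0))
star_ends = list(build(Node("star", kids=[Node("lit", "a")]))("aaa", 0))
```

The point of the sketch is only that building tree nodes everywhere lets one recursive walk produce the matcher, which is harder when parts of the tree are already flattened into source-code strings.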

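The shortest-versus-longest concern raised in the quoted message can be illustrated, very loosely, with Python's `re` module. The lazy and greedy quantifiers here are only stand-ins for the two preferences; this is an assumption-laden analogy and does not show how Unicon or SNOBOL patterns are implemented.

```python
import re

subject = "aaab"

# Greedy quantifier: prefers the longest run, which is what users of
# most regular expression tools expect.
longest = re.match(r"a+", subject).group(0)

# Lazy quantifier: stops at the first (shortest) success, loosely
# analogous to the shortest-match default described for SNOBOL patterns.
shortest = re.match(r"a+?", subject).group(0)

print(longest)
print(shortest)
```

Someone used to the first result would likely find the second surprising, which is the kind of counterintuitive behavior the message warns about.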
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Unicon-group mailing list
Unicon-group@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/unicon-group
