I don't have code to do what I want, but here are the pieces I'm trying to
string together:

The abbreviation dictionary is a file like this:

SRED. SREDNE
SEV.  SEVERN
etc.

Each abbreviation is turned into four regexes, like this (doubtless they
could be made more efficient, but they work well enough at present):

# Sred. = SREDNE
# Match it at beginning of line
$cgname =~ s/^SRED\.(?=[\W\s\-\d]+)/SREDNE:/g ;
# Match it within the line
$cgname =~ s/[\W\s\-]+SRED\.(?=[\W\s\-\d]+)/:SREDNE:/g ;
# Match it at end of line
$cgname =~ s/[\W\s\-]+SRED\.$/:SREDNE:/g ;
# Match if it begins & ends line
$cgname =~ s/^SRED\.$/:SREDNE:/g ;

# Sev.  = SEVERN
# Match it at beginning of line
$cgname =~ s/^SEV\.(?=[\W\s\-\d]+)/SEVERN:/g ;
# Match it within the line
$cgname =~ s/[\W\s\-]+SEV\.(?=[\W\s\-\d]+)/:SEVERN:/g ;
# Match it at end of line
$cgname =~ s/[\W\s\-]+SEV\.$/:SEVERN:/g ;
# Match if it begins & ends line
$cgname =~ s/^SEV\.$/:SEVERN:/g ;

etc.
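
The four substitutions per abbreviation could also be built at run time from
an (abbreviation, long form) pair instead of being pasted in. What follows is
only a minimal sketch: make_rules and expand are hypothetical names, not part
of the existing script, and the patterns are copied from the statements above
with the abbreviation escaped by quotemeta and precompiled with qr//.

```perl
use strict;
use warnings;

# Hypothetical helper: return the four [pattern, replacement] rules for one
# abbreviation, in the same order as the hand-written statements.
sub make_rules {
    my ($abbr, $long) = @_;
    my $quoted = quotemeta $abbr;    # "SRED." becomes "SRED\."
    return (
        [ qr/^${quoted}(?=[\W\s\-\d]+)/,         "${long}:"  ],  # beginning of line
        [ qr/[\W\s\-]+${quoted}(?=[\W\s\-\d]+)/, ":${long}:" ],  # within the line
        [ qr/[\W\s\-]+${quoted}$/,               ":${long}:" ],  # end of line
        [ qr/^${quoted}$/,                       ":${long}:" ],  # begins & ends line
    );
}

# Hypothetical helper: apply a list of rules to one name, in order.
sub expand {
    my ($name, @rules) = @_;
    for my $rule (@rules) {
        my ($pattern, $replacement) = @$rule;
        $name =~ s/$pattern/$replacement/g;
    }
    return $name;
}

my @rules = make_rules('SRED.', 'SREDNE');
print expand('A SRED. B', @rules), "\n";    # prints "A:SREDNE: B"
```

Because the patterns are compiled once by qr//, the per-name cost should be
close to the hardcoded version, though that is worth measuring on real data.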

Right now I'm generating the regexes in a standalone script, then inserting
the output code into the subroutine that processes names into a "matchable"
form.

What I'd like to be able to do is take a *set* of abbreviation
"dictionaries," concatenate them together and dynamically generate the
regex code in the routine that is going to execute it.
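
One possible shape for this, sketched under assumptions: the dictionaries are
read into one ordered list, the same four statements are generated as Perl
source text, and a string eval compiles them into an anonymous subroutine
once at startup. The in-memory strings stand in for real dictionary files,
long forms are assumed to be single words, and names such as @dictionaries,
$template and $expand are made up for illustration.

```perl
use strict;
use warnings;

# Stand-ins for two dictionary files (assumption: one "ABBR<tab>LONG" pair
# per line); real code would open each file by name instead.
my @dictionaries = ("SRED.\tSREDNE\n", "SEV.\tSEVERN\n");

# Concatenate all dictionaries into one ordered list of [abbr, long] pairs.
my @pairs;
for my $text (@dictionaries) {
    open my $fh, '<', \$text or die "Cannot read dictionary: $!";
    while (my $line = <$fh>) {
        my ($abbr, $long) = split ' ', $line;
        push @pairs, [$abbr, $long] if defined $long;    # skip blank lines
    }
    close $fh;
}

# The four hand-written statements, with placeholders for the two fields.
my $template = <<'END';
    $name =~ s/^<ABBR>(?=[\W\s\-\d]+)/<LONG>:/g;
    $name =~ s/[\W\s\-]+<ABBR>(?=[\W\s\-\d]+)/:<LONG>:/g;
    $name =~ s/[\W\s\-]+<ABBR>$/:<LONG>:/g;
    $name =~ s/^<ABBR>$/:<LONG>:/g;
END

# Fill in the template once per abbreviation, escaping both fields.
my $body = '';
for my $pair (@pairs) {
    my %fill = (ABBR => quotemeta $pair->[0], LONG => quotemeta $pair->[1]);
    (my $block = $template) =~ s/<(ABBR|LONG)>/$fill{$1}/g;
    $body .= $block;
}

# Compile the generated statements into a subroutine, once.
my $code = "sub {\n    my (\$name) = \@_;\n$body    return \$name;\n}\n";
my $expand = eval $code or die "Generated code failed to compile: $@";

print $expand->('WELL SRED. 3'), "\n";    # prints "WELL:SREDNE: 3"
print $expand->('SEV. 7'), "\n";          # prints "SEVERN: 7"
```

After the eval, the substitutions are ordinary compiled Perl with literal
patterns, so the matching loop runs the same statements as the manually
inserted version.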

Thanks,

Scott

Scott E. Robinson
SWAT Team
UTC Onsite User Support
RR-690 -- 281-654-5169
EMB-2813N -- 713-656-3629


"David Kirol" <[EMAIL PROTECTED]>
To:      <[EMAIL PROTECTED]>
cc:
Subject: Re: There has to be a way to do this
06/20/03 08:38 PM



Scott,
Sounds like a fun problem. Can you post some code and an (abbreviated) set
of example data?
David

"Scott E Robinson" <[EMAIL PROTECTED]> wrote in message
news:<[EMAIL PROTECTED]>...
> I'm still working on the well-name matching program that I've brought up
> here before.  I've received invaluable help to solve the toughest questions
> in its development, for which I'm very grateful.
>
> Now I'm trying to automate some steps which were previously manual in the
> process, to make it more end-user-friendly.  There has to be a way to do
> this with Perl.
>
> The script uses a "dictionary" of abbreviations to aid its matching.  The
> abbreviations are implemented as a series of substitutions with the "s"
> operator.  I have a Perl script which builds the substitution statements
> from a tab-delimited list of abbreviations and their equivalent long forms.
> I then manually insert these statements into the subroutine that uses them.
>
> I kept the abbreviation translation hardcoded into the subroutine for
> performance reasons (this thing compares 14,000 unknown well names against
> 680,000 match candidates).  Is there a way in Perl to read the abbreviation
> dictionary (the tab-delimited list), generate the code, insert it into the
> right subroutine, and start executing the program, all in one script?
> (Maybe you can tell me that the performance hit from using variables in the
> substitution statements is negligible, and if so, I'd be happy to go that
> route.)
>
> Thanks in advance,
>
> Scott
>
> Scott E. Robinson
> Data SWAT Team
> UTC Onsite User Support
> RR-690 -- 281-654-5169
> EMB-2813N -- 713-656-3629
>






