On Thu, Oct 27, 2011 at 01:31:40PM +0200, Roland Küffner wrote:

> My idea was to do it with some kind of dictionary file. In it each line would 
> contain a single search replacement pair separated by tabs. Just like:
> 
> old term<tab>new term
> some other random old text<tab>another replacement
> ...

I would take a slightly different approach from Bruce's, because I see some
drawbacks to doing a separate search/replace for each term.

First, I think it could be much slower if you have a lot of text and a big
dictionary, because the whole text gets scanned once for every term.

Second, you could end up modifying the same piece of text multiple times,
depending on the order in which the searches and replaces run.  Further,
when that order comes from iterating over the keys of a hash, you can't
reliably predict which outcome you'll get, since Perl doesn't guarantee
any particular key order.


For example, if you have these entries in your dictionary:

house<tab>home
my home<tab>where I live

with the text 'my house', the result could be either 'where I live' or 'my
home'.


Or if you had these entries in your dictionary:

dog<tab>cat
cat<tab>dog

with the text 'dog cat', the result would be either 'dog dog' or 'cat cat'
but not the desired 'cat dog'.
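To make the order dependence concrete, here is a small sketch of the
one-search-per-term approach. replace_sequentially is a hypothetical helper
written only for this illustration, not Bruce's code. It takes the pairs as
an ordered list rather than a hash so each order can be tried explicitly,
and it assumes terms should only match as whole words (\b).

```perl
use strict;
use warnings;

# Hypothetical helper: apply each (old => new) pair as its own global
# search & replace, in the order given. Later pairs see the output of
# earlier ones, which is exactly the problem described above.
sub replace_sequentially {
  my ($text, @pairs) = @_;
  while (my ($old, $new) = splice @pairs, 0, 2) {
    $text =~ s/\b\Q$old\E\b/$new/g;   # \Q...\E treats the term literally
  }
  return $text;
}

# The same two pairs, applied in both possible orders.
print replace_sequentially('my house', 'house' => 'home', 'my home' => 'where I live'), "\n";
print replace_sequentially('my house', 'my home' => 'where I live', 'house' => 'home'), "\n";
print replace_sequentially('dog cat', 'dog' => 'cat', 'cat' => 'dog'), "\n";
print replace_sequentially('dog cat', 'cat' => 'dog', 'dog' => 'cat'), "\n";
```

The first two calls print 'where I live' and 'my home'; the last two print
'dog dog' and 'cat cat'.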


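So, instead I would create a single regex that matches all the old terms,
sorted in descending order by length so that the longer term wins when one
term is a prefix of another.  Every match is replaced exactly once, in a
single pass, and the replacement text is never searched again, so 'my house'
always becomes 'my home' and 'dog cat' becomes 'cat dog'.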

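#!/usr/bin/perl

use strict;
use warnings;

# Load the dictionary: one "old<tab>new" pair per line.
my %dict;

while (<DATA>) {
  chomp;
  /\t/ or next;                         # skip blank or malformed lines
  my ($old, $new) = split /\t/, $_, 2;
  $dict{$old} = $new;
}

# One alternation of all old terms, longest first. quotemeta escapes
# any regex metacharacters (., (, +, etc.) that appear in the terms.
my $alt = join '|', map { quotemeta } sort { length $b <=> length $a } keys %dict;
my $re  = qr/\b($alt)\b/;

# Single left-to-right pass: each match is replaced exactly once.
while (<>) {
  s/$re/$dict{$1}/g;
  print;
}

__END__
house	home
my home	where I live

dog	cat
cat	dog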
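To check the single-pass approach against both problem examples without
going through a file, the same regex can be wrapped in a function.
make_replacer is a hypothetical name for this sketch; it builds the same
longest-first alternation as the script, with quotemeta escaping each term.

```perl
use strict;
use warnings;

# Hypothetical helper: build a single-pass replacer from old => new pairs.
sub make_replacer {
  my (%dict) = @_;
  my $alt = join '|', map { quotemeta } sort { length $b <=> length $a } keys %dict;
  my $re  = qr/\b($alt)\b/;
  # Each match is looked up and replaced once; replacements are never rescanned.
  return sub {
    my ($text) = @_;
    $text =~ s/$re/$dict{$1}/g;
    return $text;
  };
}

my $replace = make_replacer(
  'house'   => 'home',
  'my home' => 'where I live',
  'dog'     => 'cat',
  'cat'     => 'dog',
);

print $replace->('my house'), "\n";   # my home
print $replace->('dog cat'), "\n";    # cat dog
```

One caveat: \b only behaves as intended when each term begins and ends with
a word character; a term like '(c)' would need different anchoring.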


Here I'm loading the dictionary with a while loop, but Bruce's map approach
is perfectly fine as well.  I also like his suggestion to use a colon with
optional spaces instead of a tab; that way you can line up the terms in
nice columns.
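For the colon format, a loader built with map might look like the sketch
below. Bruce's actual code isn't quoted here, so load_dict is a hypothetical
helper; it assumes the first colon on a line separates the old term from
the new one, and that surrounding spaces are padding to be trimmed.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "old term : new term" lines. Spaces around
# the colon are optional, so the columns can be lined up. The first colon
# separates the terms; lines without a colon (e.g. blank ones) are skipped.
sub load_dict {
  my @lines = @_;
  return map { /^\s*(.*?)\s*:\s*(.*?)\s*$/ ? ($1, $2) : () } @lines;
}

my %dict = load_dict(
  "house   : home\n",
  "my home : where I live\n",
  "\n",
  "dog     : cat\n",
  "cat     : dog\n",
);

print "$_ => $dict{$_}\n" for sort keys %dict;
```

In the script, the while loop over <DATA> would then become
my %dict = load_dict(<DATA>);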


Ronald

-- 
You received this message because you are subscribed to the 
"BBEdit Talk" discussion group on Google Groups.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
<http://groups.google.com/group/bbedit?hl=en>
If you have a feature request or would like to report a problem, 
please email "[email protected]" rather than posting to the group.
Follow @bbedit on Twitter: <http://www.twitter.com/bbedit>
