On Thu, Oct 27, 2011 at 01:31:40PM +0200, Roland Küffner wrote:
> My idea was to do it with some kind of dictionary file. In it each line would
> contain a single search replacement pair separated by tabs. Just like:
>
> old term<tab>new term
> some other random old text<tab>another replacement
> ...

I would take a slightly different approach from Bruce's, because I see some
drawbacks to doing a separate search/replace for each term.

First, I think it could be much slower if you have a lot of text and a big
dictionary.

Second, you could end up modifying the same piece of text multiple times,
depending on the order in which the replacements happen. Further,
when that order is based on keys in a hash, you can't reliably predict
which outcome you'll get.

For example, if you have these entries in your dictionary:

    house<tab>home
    my home<tab>where I live

then with the text 'my house', the result could be either 'where I live' or
'my home'.

Or if you had these entries in your dictionary:

    dog<tab>cat
    cat<tab>dog

then with the text 'dog cat', the result would be either 'dog dog' or
'cat cat', but not the desired 'cat dog'.

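To make the ordering problem concrete, here is a minimal sketch that applies the dog/cat pairs one after another in both possible orders. The helper name replace_sequentially is made up for this illustration and is not taken from Bruce's script.

```perl
use strict;
use warnings;

# Hypothetical helper: apply each [old, new] pair in turn, in the order given.
sub replace_sequentially {
    my ($text, @pairs) = @_;
    for my $pair (@pairs) {
        my ($old, $new) = @$pair;
        # \Q...\E escapes any regex metacharacters in the search term.
        $text =~ s/\b\Q$old\E\b/$new/g;
    }
    return $text;
}

my $dog_first = replace_sequentially('dog cat', ['dog', 'cat'], ['cat', 'dog']);
my $cat_first = replace_sequentially('dog cat', ['cat', 'dog'], ['dog', 'cat']);

print "dog first: $dog_first\n";   # dog first: dog dog
print "cat first: $cat_first\n";   # cat first: cat cat
```
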
So, instead I would create a single regex that matches all the old terms,
sorted in descending order by length, in case one term is a prefix of
another.
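
#!perl
use strict;
use warnings;

# Load the dictionary: one tab-separated "old<tab>new" pair per line.
my %dict;
while (<DATA>) {
    chomp;
    /\t/ or next;                          # skip lines without a tab
    my ($old, $new) = split /\t/, $_, 2;
    $dict{$old} = $new;
}

# One alternation of all old terms, longest first, with regex
# metacharacters in the terms escaped.
my $alt = join '|', map { quotemeta($_) }
          sort { length $b <=> length $a } keys %dict;
my $re  = qr/\b($alt)\b/;

# Every match is replaced exactly once, in a single pass per line.
while (<>) {
    s/$re/$dict{$1}/g;
    print;
}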
__END__
house	home
my home	where I live
dog	cat
cat	dog
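
To show why the terms are sorted longest first, here is a small sketch that builds the alternation both ways and compares the results. The helpers build_regex and replace_all, the $sort_by_length switch, and the 'my' entry are all made up for this illustration. Perl tries the alternatives from left to right, so a short term listed first can win over a longer term that starts the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: build a single alternation from the dictionary keys.
# $sort_by_length switches the longest-first ordering on or off.
sub build_regex {
    my ($dict, $sort_by_length) = @_;
    my @terms = sort keys %$dict;    # alphabetical: 'my' comes before 'my home'
    @terms = sort { length $b <=> length $a } @terms if $sort_by_length;
    my $alt = join '|', map { quotemeta($_) } @terms;
    return qr/\b($alt)\b/;
}

# Hypothetical helper: replace every match in one pass.
sub replace_all {
    my ($text, $dict, $re) = @_;
    $text =~ s/$re/$dict->{$1}/g;
    return $text;
}

# Illustrative entries; 'my' is an assumed extra term, not from the thread.
my %dict = ('my' => 'our', 'my home' => 'where I live');

my $unsorted = replace_all('my home', \%dict, build_regex(\%dict, 0));
my $sorted   = replace_all('my home', \%dict, build_regex(\%dict, 1));

print "$unsorted\n";   # our home
print "$sorted\n";     # where I live
```

The same replace_all call with a dog/cat dictionary turns 'dog cat' into 'cat dog', because each match is replaced once and the replaced text is never scanned again.
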
Here I'm loading the dictionary with a while loop, but Bruce's map approach
is perfectly fine as well. I also like his suggestion to use a colon with
optional spaces instead of a tab; that way you can line up the terms in
nice columns.
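
A minimal sketch of loading a colon-separated dictionary follows. It assumes that the first colon on a line is the separator, so an old term cannot itself contain a colon. The helper name parse_dictionary is made up for this illustration and is not Bruce's code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "old : new" lines into a hash.
# Assumption: the first colon on the line separates old from new.
sub parse_dictionary {
    my (@lines) = @_;
    my %dict;
    for my $line (@lines) {
        chomp $line;
        $line =~ s/^\s+//;                        # ignore leading whitespace
        my ($old, $new) = split /\s*:\s*/, $line, 2;
        next unless defined $new && length $old;  # skip malformed lines
        $new =~ s/\s+\z//;                        # ignore trailing whitespace
        $dict{$old} = $new;
    }
    return %dict;
}

my %dict = parse_dictionary(
    "house   : home\n",
    "my home : where I live\n",
    "dog     : cat\n",
    "this line has no separator\n",
);

print "$_ => $dict{$_}\n" for sort keys %dict;
```
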
Ronald
--
You received this message because you are subscribed to the
"BBEdit Talk" discussion group on Google Groups.