On Sat, Oct 01, 2011 at 03:07:09PM -0700, archaeal wrote:
> Hello all,
> I would like to identify or eliminate pairs of "words" from different
> lines.
>
> An example (all words are separated by a tab):
>
> 53_G16I9RF01EUP2C 53_G16I9RF02JZUJU
> 53_G16I9RF02JZUJU 53_G16I9RF01EUP2C
> 53_G16I9RF02JZV1E 33_G0JCAX402GV9YC
> 53_G16I9RF02JZV1E 33_G16I9RF02FOVF0
> or:
> A B
> B A
> C D
> E F
>
> Lines one and two contain the same words but in inverted order. I
> would like to eliminate one of these "duplicates". I thought it could
> work with process duplicate lines with: [a-z0-9_]{17}\t[a-z0-9_]{17}
> but this didn't work.
> I would be glad if someone could help me out with this. Perhaps there
> is a simpler way to do this.
Process Duplicate Lines allows you to specify parts of the lines to compare
using a pattern, but it doesn't reorder the parts when finding duplicates.
For example, if your pattern is "(A) . (B)", and you have "All
sub-patterns" checked, these lines are duplicates:
A x B
A y B
but these lines are not:
A x B
B y A
because AB is not the same as BA.
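To make the distinction concrete, here is a small Python sketch. It is only an illustration of ordered versus unordered comparison keys, not how BBEdit implements Process Duplicate Lines, and the function names are made up for this example. It assumes tab-separated words, as in your data:

```python
def ordered_key(line):
    # Keep the words in the order they appear on the line.
    return tuple(line.split("\t"))


def unordered_key(line):
    # Sort the words first, so their order on the line no longer matters.
    return tuple(sorted(line.split("\t")))


# With an ordered key, "A B" and "B A" are different lines.
print(ordered_key("A\tB") == ordered_key("B\tA"))      # False
# With a sorted (canonical) key, they compare equal.
print(unordered_key("A\tB") == unordered_key("B\tA"))  # True
```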
To find the duplicates, you should first convert each line to a canonical
form. In this case, you could sort the words on each line.
Here's a Perl script that does it, which you can use in BBEdit as a Unix
Filter:
#!perl -ln
print unless $seen{join "\t", sort split /\t/}++;
__END__
It splits each input line on tabs, sorts the words, and joins them back
together with tabs to form a key, then increments that key's count in
%seen. A line is printed only the first time its set of words is seen, so
later lines with the same words in any order are dropped.
Or, you could print the canonical form of the line instead:
#!perl -ln
$canonical = join "\t", sort split /\t/;
print $canonical unless $seen{$canonical}++;
__END__
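If you would rather not use Perl, the same idea can be sketched in Python. This is a rough equivalent under my own assumptions, not a tested BBEdit filter: the function name dedupe_pairs and the print_canonical flag are invented for this example, and the sample lines are the A/B/C/D ones from your message.

```python
def dedupe_pairs(lines, print_canonical=False):
    """Yield each line whose tab-separated words have not been seen before, in any order."""
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        # Canonical form: the words of the line, sorted and re-joined with tabs.
        canonical = "\t".join(sorted(line.split("\t")))
        if canonical in seen:
            continue  # same set of words already seen, possibly in another order
        seen.add(canonical)
        yield canonical if print_canonical else line


sample = ["A\tB", "B\tA", "C\tD", "E\tF"]
for kept in dedupe_pairs(sample):
    print(kept)
```

With the sample above, "B A" is dropped because it has the same canonical form as "A B". To use this as a filter, you would loop over sys.stdin instead of the sample list.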
Ronald