On 01/26/2018, at 20:56, Doug Lerner wrote:
> What I would like to do is find everything between the ID_User_ and *-find 
> (e.g. .5a82483a in this example) and be left with a file where each line 
> contains just that userId. After that I can sort it, remove duplicates, etc.
> 
> Is there a sequence of things I can do, using grep search patterns, and so 
> on, to create a file from this file containing just the userIds?


Hey Doug,

This is the sort of job Perl is very good at.


#!/usr/bin/env perl -sw
use v5.010;
use open qw(:std :utf8);
use utf8;
# ----------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2018/02/04 22:00
# dMod: 2018/02/04 22:16 
# Task: Find a regular expression per line, sort, and remove duplicates.
# Tags: @Shell, @Script, @Find, @Regular, @Expression, @Per, @Line, @Sort, @Remove, @Duplicates
# ----------------------------------------------------------------

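my %Hash;    # Keys are the captured IDs, so duplicates collapse automatically.

while (<>) {
    # Capture the text between 'ID_User=' and the first following '-find'.
    if ( /ID_User=(.*?)-find/ ) {
        $Hash{$1} = 1;
    }
}

# Print each unique ID once, in sorted order.
foreach my $Key (sort keys %Hash) {
    say $Key;
}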


The script processes a 100,000-line file in about a second on my old 2010 
MacBook Pro.


Now just for fun let's try that with a 1-line Bash script.


#!/usr/bin/env bash
export LC_ALL='C'

# Strip the text before and after the match so only the ID remains.
sed -En '/ID_User=.*-find/{ s!.*ID_User=(.*)-find.*!\1!;p; }' | sort -u


Run either one of these as a BBEdit text-filter 
<http://bbeditextras.org/wiki/index.php?title=Text_Filters>.

If you want to keep the original data file, run the filter on a copy.
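
Before pointing a filter at real data, it can help to sanity-check the pattern on a few throwaway lines in the Terminal. The sketch below is one way to do that with sed; the function name extract_ids and the sample text are hypothetical, and the real file's layout is assumed rather than known. The substitution here also consumes the text before and after the match, so each output line holds only the ID.

```shell
#!/usr/bin/env bash

# Made-up sample lines; the real file's layout is an assumption.
sample='a ID_User=.5a82483a-find b
noise line without a match
c ID_User=.1b2c3d4e-find d
e ID_User=.5a82483a-find f'

# Hypothetical helper name; reads lines on stdin, writes unique sorted IDs.
extract_ids() {
    # -n suppresses default output; the p flag prints only lines where
    # the substitution succeeded. The leading and trailing .* drop
    # everything outside the captured ID.
    LC_ALL=C sed -En 's!.*ID_User=(.*)-find.*!\1!p' | LC_ALL=C sort -u
}

printf '%s\n' "$sample" | extract_ids
```

Note that .* in sed is greedy, so a line containing more than one -find would capture up to the last one, unlike the non-greedy (.*?) in the Perl version.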

--
Best Regards,
Chris
