On Mon, Jun 23, 2003 at 10:43:07AM +0200 Denham Eva wrote:

> I am very much a novice at perl and have probably bitten off more than I can
> chew here.
> I have a file, which is a dump of a database - so it is a fixed file format.
> The problem is that I am struggling to manipulate it correctly. I have been
> trying for two days now to get a program to work. The idea is to remove the
> duplicate records, i.e. a record begins with Name and ends with Values End.
> The program that I have thus far is pathetic in the sense that I have opened
> three files: the file below, a data file for cleaned data, and a file for
> capturing the usernames already processed. But I have got stuck on how to
> compare and work through the file line by line and then capture only the
> lines that are not duplicated.

Keeping a couple of files around is not necessarily pathetic. I don't
think you need a file for the processed usernames, though: a hash in
memory does that job. But one file for the original data and one for the
processed data is a perfectly common pattern.
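
The usual Perl idiom for "have I processed this one already?" is a hash
whose values count how often each key has been seen. A minimal sketch,
using a made-up list of names in place of the real usernames:

```perl
use strict;
use warnings;

# Hypothetical record names, in the order they might appear in a dump
my @names = qw(system admin system guest admin);

# $seen{$_}++ returns the count *before* incrementing: 0 (false) the first
# time a name shows up, so !$seen{$_}++ is true only for first occurrences
my %seen;
my @unique = grep { !$seen{$_}++ } @names;

print "@unique\n";    # prints "system admin guest"
```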

> Here is the file format....
> 
> <File Begins>
> #DB dumped
> #DB version 8.0
> #SW version 2.6(1.10)
> #-----------------------------------------------------------------------------
> Name          :       system
> Some stuff here... 
> many lines....
> Of different format... 
> such as line below...
> User Count    :       0
> ##--- User End
> Lots of text here...
> Until...
> We get line below...
> ##--- Values End
> #-----------------------------------------------------------------------------

So "#-----..." is essentially the record separator? A fixed separator is
good because it makes processing rather easy. It might be handy to set the
input record separator $/ to this value, so that each read returns one
whole record instead of a single line:

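#! /usr/bin/perl -w

use strict;

# each read from $in now returns one whole record, separator included
local $/ = 
"#-----------------------------------------------------------------------------\n";

open my $in,  '<', 'old_database' or die "Can't open old_database: $!";
open my $out, '>', 'new_database' or die "Can't create new_database: $!";

# keep track of what records have already been seen
my %records_seen;

# this is the 'header', that is: what is before the first record
print {$out} scalar <$in>;

while (<$in>) {
    # ^ together with /m: only a line that *starts* with "Name" counts
    if (/^Name\s+:\s+(\S+)/m) {
        #             ^^^
        # $1 is record name
        next if $records_seen{ $1 }++;    # already seen: skip duplicate
        print {$out} $_;
    }
    # chunks without a Name line are not copied
}

print {$out} "#End Of Dump\n";

close $in;
close $out or die "Can't close new_database: $!";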
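
To try the loop without touching the real database files, the same logic
can be wrapped in a subroutine that works on strings through in-memory
filehandles (Perl 5.8 or later). This is only a sketch: the sub name
dedup_dump, the short 10-dash separator and the sample records are made up
for the example, and the Name match is anchored to the start of a line.

```perl
use strict;
use warnings;

# Hypothetical separator: '#' plus 10 dashes keeps the example short;
# the real dump uses a much longer line of dashes.
my $sep = '#' . ('-' x 10) . "\n";

# Copy the header, then every record whose Name has not been seen before.
sub dedup_dump {
    my ($text) = @_;
    my $result = '';

    # In-memory filehandles: read from $text, write into $result
    open my $in,  '<', \$text   or die "Cannot read from string: $!";
    open my $out, '>', \$result or die "Cannot write to string: $!";

    local $/ = $sep;                  # one whole record per read
    my $header = <$in>;               # everything up to the first separator
    print {$out} $header if defined $header;

    my %seen;
    while (my $record = <$in>) {
        next unless $record =~ /^Name\s+:\s+(\S+)/m;   # no Name line: not copied
        next if $seen{$1}++;                          # duplicate: skip
        print {$out} $record;
    }

    close $in;
    close $out;
    return $result;
}

# Small made-up dump: the second "system" record is a duplicate
my $dump = "#DB dumped\n" . $sep
         . "Name  :  system\nUser Count  :  0\n" . $sep
         . "Name  :  admin\nUser Count  :  1\n"  . $sep
         . "Name  :  system\nUser Count  :  0\n" . $sep;

my $cleaned = dedup_dump($dump);
print $cleaned;
```

Running it prints the header followed by the system and admin records; the
second system record is dropped because $seen{system} is already true when
it comes along.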

Tassilo

