On Mon, Jun 23, 2003 at 10:43:07AM +0200 Denham Eva wrote:

> I am very much a novice at perl and probably bitten off more than I can chew
> here.
> I have a file, which is a dump of a database - so it is a fixed file format.
> The problem is that I am struggling to manipulate it correctly. I have been
> trying for two days now to get a program to work. The idea is to remove the
> duplicate records, ie a record begins with Name and ends with Values End.
> The program that I have thus far, is pathetic in the sense I have opened
> three files, the file below, a data file for cleaned data, and a file for
> capturing the usernames already processed. But I have got stuck on how to
> compare and work through the file line for line and then only to capture the
> lines that are not duplicated.
Keeping a couple of files around is not necessarily pathetic. I don't think
you need a file for the processed usernames, though; a hash can keep track of
those. But the original file plus one for the processed data is a totally
common pattern.

> Here is the file format....
>
> <File Begins>
> #DB dumped
> #DB version 8.0
> #SW version 2.6(1.10)
> #---------------------------------------------------------------------------
> --
> Name : system
> Some stuff here...
> many lines....
> Of different format...
> such as line below...
> User Count : 0
> ##--- User End
> Lots of text here...
> Until...
> We get line below...
> ##--- Values End
> #---------------------------------------------------------------------------
> --

So, "#-----..." is essentially the record separator? A fixed separator is good
because it makes processing rather easy. It might be handy to set the input
record separator ($/) to this value, so that each read from the file returns
one whole record instead of one line:

    #! /usr/bin/perl -w
    use strict;

    # Read one whole record per <IN> instead of one line. This assumes the
    # separator is a single long dashed line and that the "--" on the next
    # line of the quoted dump is only the mailer wrapping it.
    local $/ = "#-----------------------------------------------------------------------------\n";

    open IN,  "old_database"  or die $!;
    open OUT, ">new_database" or die $!;

    # keep track of what records have already been seen
    my %records_seen;

    # this is the 'header', that is: what is before the first record
    print OUT scalar <IN>;

    while (<IN>) {
        if (/Name\s+:\s+(\S+)/) {
            #            ^^^
            #            $1 is record name
            # skip the record if this name has been seen before
            next if $records_seen{ $1 }++;
            print OUT $_;
        }
    }

    print OUT "#End Of Dump\n";

    close IN;
    close OUT;

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval