Dear all,
to pre-process my XML dataset I run a simple Perl script on it, which
extracts the Id identifier from each XML record and prepends it to the
whole record, flattened onto one line. For example, the input data looks like:
<NoteSet>
<Note>
<Id>001</Id>
<To>Thomas</To>
<From>Joana</From>
</Note>
<Note>
<Id>002</Id>
<To>John</To>
<From>Paula</From>
</Note>
<Note>
<Id>003</Id>
<To>Andrew</To>
<From>Maria</From>
</Note>
</NoteSet>
and the desired output of the script should be:
001 <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002 <Note><Id>002</Id><To>John</To><From>Paula</From></Note>
003 <Note><Id>003</Id><To>Andrew</To><From>Maria</From></Note>
But I can't figure out why the script below omits the last record of
the input dataset, i.e. it prints only:
001 <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002 <Note><Id>002</Id><To>John</To><From>Paula</From></Note>
I'd appreciate any suggestions or pointers.
Best, Andrej
## test.pl ##
use strict;

my $FNI = shift;
my $FNO = "$FNI.dat";
my $started = 0;
my $chunk;
my @chunk;

open OUT, ">$FNO";
open IN, "$FNI";

while (<IN>) {
    s/^\s+//g;
    s/\s+$//g;
    if (m/\<Note>/) {
        if ($started) {
            my $clob = join("", @chunk);
            &process_chunk($clob);
        } else {
            $started = 1;
        }
        @chunk = ();
        push (@chunk, $_);
        while (1) {
            $chunk = <IN>;
            $chunk =~ s/^\s+//g;
            $chunk =~ s/\s+$//g;
            push (@chunk, $chunk);
            last if ($chunk =~ m/\<\/Note>/);
        }
    }
}

close IN;
close OUT;

sub process_chunk {
    my $clob = shift;
    $clob =~ s/\t+/ /g;
    my $id;
    if ($clob =~ m/\<Id>(\d+)\<\/Id>/) {
        $id = $1;
    }
    print OUT "$id\t$clob\n";
}
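
A possible explanation, offered only as a sketch: in the script above,
process_chunk is called only when the *next* line containing <Note> is
read, so the chunk collected for the final record is never passed to it
before the input runs out. One way around this, assuming every record
ends with a line containing </Note>, is to emit each record as soon as
its closing tag is seen. The helper name notes_to_lines and the inline
sample data below are illustrative and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): takes the whole XML
# text and returns one "Id<TAB>record" string per <Note> element.
sub notes_to_lines {
    my ($xml) = @_;
    my @lines;
    my @chunk;
    my $in_note = 0;

    for my $line (split /\n/, $xml) {
        # Trim leading and trailing whitespace, as the original does.
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;

        $in_note = 1 if $line =~ /<Note>/;
        next unless $in_note;

        push @chunk, $line;

        # Emit the record when its closing tag is seen, so the last
        # record does not depend on a following <Note> to be flushed.
        if ($line =~ m{</Note>}) {
            my $clob = join '', @chunk;
            my ($id) = $clob =~ m{<Id>(\d+)</Id>};
            push @lines, "$id\t$clob" if defined $id;
            @chunk   = ();
            $in_note = 0;
        }
    }
    return @lines;
}

# Sample input taken from the message above.
my $sample = <<'XML';
<NoteSet>
<Note>
<Id>001</Id>
<To>Thomas</To>
<From>Joana</From>
</Note>
<Note>
<Id>002</Id>
<To>John</To>
<From>Paula</From>
</Note>
<Note>
<Id>003</Id>
<To>Andrew</To>
<From>Maria</From>
</Note>
</NoteSet>
XML

print "$_\n" for notes_to_lines($sample);
```

With the sample input this should print all three records, including
003. In the original script, the smaller change would be to move the
call to process_chunk to just after the inner while (1) loop, where the
closing </Note> has already been read, and drop the $started flag.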