Dear all,

To pre-process my XML dataset I run a simple Perl script on it, which extracts the Id identifier from each XML record and prints it in front of the whole record, one record per line. For example, the input data looks like:

<NoteSet>
    <Note>
        <Id>001</Id>
        <To>Thomas</To>
        <From>Joana</From>
    </Note>
    <Note>
        <Id>002</Id>
        <To>John</To>
        <From>Paula</From>
    </Note>
    <Note>
        <Id>003</Id>
        <To>Andrew</To>
        <From>Maria</From>
    </Note>
</NoteSet>

and the desired output of the script should be:

001     <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002     <Note><Id>002</Id><To>John</To><From>Paula</From></Note>
003     <Note><Id>003</Id><To>Andrew</To><From>Maria</From></Note>

But I can't figure out why the script below omits the last record of the input dataset. The actual output is:

001     <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002     <Note><Id>002</Id><To>John</To><From>Paula</From></Note>

I'd appreciate any suggestions or pointers.
Best, Andrej


## test.pl ##
use strict;
my $FNI = shift;
my $FNO = "$FNI.dat";
my $started = 0;
my $chunk;
my @chunk;

open OUT, ">$FNO";
open IN, "$FNI";
while (<IN>) {
        s/^\s+//g;
        s/\s+$//g;
        if (m/\<Note>/) {
                if ($started) {
                        my $clob = join("", @chunk);
                        &process_chunk($clob);
                } else {
                        $started = 1;
                }
                @chunk = ();
                push (@chunk, $_);
                while (1) {
                        $chunk = <IN>;
                        $chunk =~ s/^\s+//g;
                        $chunk =~ s/\s+$//g;
                        push (@chunk, $chunk);
                        last if ($chunk =~ m/\<\/Note>/);
                }
        }
}
close IN;
close OUT;

sub process_chunk {
        my $clob = shift;
        $clob =~ s/\t+/ /g;
        my $id;
        if ($clob =~ m/\<Id>(\d+)\<\/Id>/) {
                $id = $1;
        }
        print OUT "$id\t$clob\n";
}
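
A likely cause, for reference: process_chunk is only called when the *next* <Note> line is read, so the record collected after the final <Note> is never handed to it (nothing follows but </NoteSet> and end of file). Below is a minimal sketch of one way around this, emitting each record as soon as its </Note> is seen. The helper name notes_to_lines and the lexical filehandles are assumptions for illustration, not part of the original script, and it assumes, as the input above does, that <Note> and </Note> sit on their own lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): reads <Note> records
# from a filehandle and returns one "Id<TAB>record" string per record.
sub notes_to_lines {
    my ($in) = @_;
    my (@lines, @chunk);
    my $inside = 0;
    while (my $line = <$in>) {
        $line =~ s/^\s+//;              # trim leading whitespace
        $line =~ s/\s+$//;              # trim trailing whitespace and newline
        $inside = 1 if $line =~ /<Note>/;
        next unless $inside;            # skip lines outside any <Note> record
        push @chunk, $line;
        if ($line =~ m{</Note>}) {
            # Emit the record as soon as its closing tag is seen, so the
            # last record does not depend on a following <Note>.
            my $clob = join '', @chunk;
            $clob =~ s/\t+/ /g;         # same tab cleanup as the original
            my ($id) = $clob =~ m{<Id>(\d+)</Id>};
            push @lines, (defined $id ? $id : '') . "\t$clob";
            @chunk  = ();
            $inside = 0;
        }
    }
    return @lines;
}

# Run as: perl test.pl input.xml   (writes input.xml.dat, as in the original)
if (@ARGV) {
    my $fni = shift @ARGV;
    open my $in,  '<', $fni       or die "Cannot read $fni: $!";
    open my $out, '>', "$fni.dat" or die "Cannot write $fni.dat: $!";
    print {$out} "$_\n" for notes_to_lines($in);
    close $in;
    close $out or die "Cannot close $fni.dat: $!";
}
```

An alternative that keeps the original structure would be to call process_chunk once more after the while loop, if @chunk is not empty. For anything beyond this fixed layout, a real XML parser is probably the safer choice.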

