Andrej Kastrin wrote:
Dear all,
Hello,
to pre-process my XML dataset I run a simple Perl script on it, which
extracts the Id identifier from the XML data and appends the whole XML
record to it. For example, the input data looks like:
<NoteSet>
<Note>
<Id>001</Id>
<To>Thomas</To>
<From>Joana</From>
</Note>
<Note>
<Id>002</Id>
<To>John</To>
<From>Paula</From>
</Note>
<Note>
<Id>003</Id>
<To>Andrew</To>
<From>Maria</From>
</Note>
</NoteSet>
and the desired output of the script should be:
001 <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002 <Note><Id>002</Id><To>John</To><From>Paula</From></Note>
003 <Note><Id>003</Id><To>Andrew</To><From>Maria</From></Note>
This should do what you want:
#!/usr/bin/perl
use warnings;
use strict;

my $FNI = shift;
my $FNO = "$FNI.dat";

open my $OUT, '>', $FNO or die "Cannot open '$FNO' $!";
open my $IN,  '<', $FNI or die "Cannot open '$FNI' $!";

my ( $id, $line );
while ( <$IN> ) {
    # true from a <Note> line through the matching </Note> line
    if ( m!<Note>! .. m!</Note>! ) {
        # an <Id> line starts a fresh record
        ( $id, $line ) = ( $1, '' ) if m!<Id>(\d+)</Id>!;
        s/\A\s+//;
        s/\s+\z//;
        tr/\t/ /s;    # more efficient than s/\t+/ /g
        $line .= $_ if /Id|To|From/;
        # put the <Note> wrapper back so the output matches the requested format
        print $OUT "$id\t<Note>$line</Note>\n" if m!/Note!;
    }
}

close $IN;
close $OUT;
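A quick way to try the range-operator approach without touching any
files is to read from an in-memory filehandle. The sketch below is only
an illustration: flatten_notes is a hypothetical helper name, and it
assumes each tag sits on its own line, as in the sample input.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not part of the original script): takes the XML
# as a string and returns one "id<TAB>record" string per <Note>.
sub flatten_notes {
    my ($xml) = @_;
    open my $in, '<', \$xml or die "Cannot open in-memory handle: $!";
    my ( $id, $record, @out );
    while ( my $row = <$in> ) {
        # true from a <Note> line through the matching </Note> line
        next unless $row =~ m!<Note>! .. $row =~ m!</Note>!;
        $row =~ s/\A\s+//;
        $row =~ s/\s+\z//;
        $record = '' if $row eq '<Note>';           # start of a record
        $id = $1 if $row =~ m!<Id>(\d+)</Id>!;
        $record .= $row;
        push @out, "$id\t$record" if $row eq '</Note>';   # end of a record
    }
    close $in;
    return @out;
}

my $sample = <<'XML';
<NoteSet>
<Note>
<Id>001</Id>
<To>Thomas</To>
<From>Joana</From>
</Note>
<Note>
<Id>002</Id>
<To>John</To>
<From>Paula</From>
</Note>
</NoteSet>
XML

print "$_\n" for flatten_notes($sample);
```

Because everything happens in one loop, each record is emitted as soon
as its closing </Note> line is seen, so the last record is handled the
same way as the others.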
But I can't figure out why the script below omits the last record in
the input dataset, e.g.:
Your second while loop is eating up the third record without outputting
anything.
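To illustrate the kind of problem meant here, this is a hypothetical
reconstruction, not your actual script. It assumes an inner while loop
that reads from the same filehandle as the outer one and only emits a
record when it sees the start of the next <Note>. The names ids_emitted
and $flush_at_eof are made up for the example.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sketch: returns the Ids that would actually be output.
sub ids_emitted {
    my ( $xml, $flush_at_eof ) = @_;
    open my $in, '<', \$xml or die "Cannot open in-memory handle: $!";
    my @emitted;
    while ( my $row = <$in> ) {
        next unless $row =~ m!<Id>(\d+)</Id>!;
        my $id       = $1;
        my $saw_next = 0;
        # The inner loop consumes lines from the SAME handle.
        while ( my $rest = <$in> ) {
            if ( $rest =~ m!<Note>! ) {    # start of the next record
                $saw_next = 1;
                last;
            }
        }
        # For the last record the inner loop runs into end-of-file, so
        # without an explicit flush nothing is emitted for it.
        push @emitted, $id if $saw_next || $flush_at_eof;
    }
    close $in;
    return @emitted;
}

my $xml = <<'XML';
<NoteSet>
<Note>
<Id>001</Id>
</Note>
<Note>
<Id>002</Id>
</Note>
<Note>
<Id>003</Id>
</Note>
</NoteSet>
XML

print join( ' ', ids_emitted( $xml, 0 ) ), "\n";    # prints "001 002"
print join( ' ', ids_emitted( $xml, 1 ) ), "\n";    # prints "001 002 003"
```

With the flag off, the third record is read but never output, because
no further <Note> follows it. Emitting each record when its own end is
seen, or flushing after the loop, avoids that.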
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall