On Wednesday 15 August 2001 13:20, Joe Bellifont wrote:
> I have a file that looks like this
> ====
>
> <FNAME>joe</FNAME>
> <SURNAME>bloggs</BLOGGS>
> <QDETAILS> herein lies the question posed by the user
> the question can be multi-lined
> like this one.
> </QDETAILS>
>
> ======
>
> I'm trying to read the various tag content into variables:
> ==========
> sub ParseFile {
> my $file = 'submission6.xml';
>
>
> #opened the file i want to parse
> open(FH, $file) || die "cannot open file: $!";
> print "opening $file...........\n\n";
> #read contents into array
> my @stuff=<FH>;
> close(FH);
>
>
> #create one long string - why I'm not sure - but it worked with the regex below
> foreach my $stuff(@stuff) {
> $var=$var.$stuff;
> }
>
> my @details;
> # this grabs the text between <FNAME> and </FNAME>
> ($details[0])=$var=~/\<FNAME\>(.*)\<\/FNAME\>/;
>
> # this grabs the text between <SURNAME> and </SURNAME>
> ($details[1])=$var=~/\<SURNAME\>(.*)\<\/SURNAME\>/;
>
> #I want this to grab all the text between <QDETAILS> and </QDETAILS> -newline characters included.
> ($details[2])=$var=~/\<QDETAILS\>(.*)\<\/QDETAILS\>/;#
> #PROBLEM IS HERE==================^^^
> foreach $detail(@details) {
> print "$detail\n";
> }
>
> }
>
> ==========
> the regex for FNAME and SURNAME work fine. But I can't grab the text
> between <QDETAILS> and </QDETAILS> because
> of newline characters I think.
>
> Any other tips on how to improve my code generally?
Hi,
OK, so please bear with me, I am going to sound like an XML ayatollah ;--(
First you really should not call, or even imply (the file name ending with
.xml), that you are using XML when you are not: apart from </BLOGGS> instead
of </SURNAME>, which is obviously a typo, your document is _not_ well-formed
XML: it lacks a single root element wrapping the list of tags.
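If you want to see the difference for yourself, here is a minimal sketch that asks XML::Parser whether a string is well-formed. The helper name wf_error and the sample strings are made up for the example; the only thing it relies on is that parse() dies on the first well-formedness error.

```perl
#!/usr/bin/perl -w
use strict;
use XML::Parser;

# hypothetical helper: returns '' if $xml is well-formed,
# the parser's error message otherwise
sub wf_error
  { my( $xml)= @_;
    my $parser= XML::Parser->new;      # no handlers, we only check well-formedness
    eval { $parser->parse( $xml); };   # parse dies on the first error
    return $@ ? $@ : '';
  }

my $fragment= "<FNAME>joe</FNAME>\n<SURNAME>bloggs</SURNAME>\n";  # no root element
my $wrapped = "<doc>\n$fragment</doc>\n";                          # a single root element

print "fragment: ", ( wf_error( $fragment) ? "not well-formed" : "OK"), "\n";
print "wrapped:  ", ( wf_error( $wrapped)  ? "not well-formed" : "OK"), "\n";
```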
Then if you want to process XML, you should never, _never_, _NEVER_! do it
with regular expressions (OK, maybe there are cases where you can use
regexps, but they involve huge amounts of data, throw-away conversions and
generally knowing exactly what you are doing and why).
Use the parser, Luke!
There are just too many potential traps for you to write a robust XML parser
with regexps, especially as there is an existing parser (XML::Parser), plus a
host of XML modules that will make your life much easier.
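To give you an idea of the traps, here is a minimal sketch (the sample string and the variable names are made up for the example, they are not your file). By default . does not match a newline, which is why your QDETAILS regexp fails on multi-line content. Adding the /s modifier fixes that, but a greedy .* then swallows everything up to the last closing tag as soon as the element appears twice:

```perl
#!/usr/bin/perl -w
use strict;

# hypothetical sample data: two QDETAILS elements, both multi-line
my $var= "<QDETAILS>line one\nline two</QDETAILS>\n"
       . "<QDETAILS>other\nstuff</QDETAILS>\n";

# 1) no /s: . stops at the newline, so nothing matches
my( $plain)= $var=~ m{<QDETAILS>(.*)</QDETAILS>};

# 2) /s lets . match newlines, but the greedy .* runs to the LAST </QDETAILS>
my( $greedy)= $var=~ m{<QDETAILS>(.*)</QDETAILS>}s;

# 3) /s with the non-greedy .*? stops at the first </QDETAILS>
my( $lazy)= $var=~ m{<QDETAILS>(.*?)</QDETAILS>}s;

print defined $plain ? "plain: [$plain]\n" : "plain: no match\n";
print "greedy: [$greedy]\n";
print "lazy: [$lazy]\n";
```

Even the third version knows nothing about attributes, entities, comments or CDATA sections, which is exactly why a real parser is the safer choice.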
In fact if you can install XML::Parser and XML::Simple on your system it will
be dead easy for you to get the values of the fields in a hash:
#!/bin/perl -w
use strict;
use XML::Simple; # depends on XML::Parser
use Data::Denter; # just to check what's read in by XML::Simple
my $data= XMLin( \*DATA); # read the data, you would use "./$file"
print Denter( $data), "\n"; # just checking
# yes, it's that simple!
# the list needs its own parentheses: a bare qw() here is a syntax error in recent perls
foreach my $field (qw( FNAME SURNAME QDETAILS))
{ print "$field: $data->{$field}\n"; }
__DATA__
<doc>
<FNAME>joe</FNAME>
<SURNAME>bloggs</SURNAME>
<QDETAILS> herein lies the question posed by the user
the question can be multi-lined
like this one.
</QDETAILS>
</doc>
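
Since your real file has no wrapping tag, one way to feed it to XML::Simple is to add one yourself before parsing. This is only a sketch: the <doc> wrapper name and the inline sample string are my choices, and it assumes the closing-tag typo has been fixed in the data.

```perl
#!/usr/bin/perl -w
use strict;
use XML::Simple;   # depends on XML::Parser

# stand-in for the content of the file (assumption: the real data would be
# slurped from 'submission6.xml' instead)
my $fragment= <<'END';
<FNAME>joe</FNAME>
<SURNAME>bloggs</SURNAME>
<QDETAILS> herein lies the question posed by the user
the question can be multi-lined
like this one.
</QDETAILS>
END

# wrap the fragment in a single root element to make it well-formed;
# XMLin treats a string containing tags as XML to parse, not as a file name
my $data= XMLin( "<doc>\n$fragment</doc>\n");

foreach my $field (qw( FNAME SURNAME QDETAILS))
  { print "$field: $data->{$field}\n"; }
```

The newlines inside QDETAILS come back untouched in $data->{QDETAILS}, so there is no special handling needed for multi-line content.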