Re: Parsing large XML file - Revisited

Srikanth Thu, 26 Jul 2007 00:33:17 -0700

On Jul 25, 9:11 pm, [EMAIL PROTECTED] (Mike Blezien) wrote:
> Rob,
>
> ----- Original Message -----
> From: "Rob Dixon" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Cc: "Mike Blezien" <[EMAIL PROTECTED]>
> Sent: Wednesday, July 25, 2007 10:57 AM
> Subject: Re: Parsing large XML file - Revisited
>
> > Mike Blezien wrote:
>
> >> Mirod wrote:
>
> >>> On Jul 22, 3:33 am, [EMAIL PROTECTED] (Dr.Ruud) wrote:
>
> >>>> "Mike Blezien" schreef:
>
> >>>> >   my $article_number = $elt->first_child_text('article_number');
> >>>> >   my $dist_number    = $elt->first_child_text('distributor_number');
> >>>> >   my $dist_name      = $elt->first_child_text('distributor_name');
> >>>> >   my $artist         = $elt->first_child_text('artist');
> >>>> >   my $ean_upc        = $elt->first_child_text('ean_upc');
> >>>> >   my $set_total      = $elt->first_child_text('set_total');
>
> >>>> That looks awful. Isn't there some way with the module to do it cleaner?
>
> >>>> Or do it more like:
>
> >>>>   my @text_tags = qw(article_number distributor_number etc);
> >>>>   my %data;
>
> >>>>   for my $tag (@text_tags) {
> >>>>       $data{_text}{$tag} = $elt->first_child_text($tag);
> >>>>   }
>
> >>> Just a quick note: first_child_text can also be written as field,
> >>> which often makes more sense in a data-oriented context.
>
> >> That was an excellent idea :) A lot cleaner and a lot less coding involved.
> >> I'm still fairly new to XML parsing.
>
> > Hi Mike
>
> > Using a shorter synonym for a method isn't a significant improvement. I
> > prefer the 'first_child_text' name as it is more descriptive, and if I was
> > using exactly the code above I would rewrite it as:
>
> >  my ($article_number, $dist_number, $dist_name, $artist, $ean_upc, $set_total)
> >    = map { $elt->first_child_text($_) }
> >      qw/article_number distributor_number distributor_name artist ean_upc set_total/;
>
> > But I made no changes to your code apart from correcting the semantics, as
> > it wasn't at all obvious what you're doing. The code you posted just
> > extracts XML field text values into a number of lexical variables and then
> > discards them. If you give us an idea what your final intention is then I'm
> > sure we can help, and it probably won't involve using 'field' instead of
> > 'first_child_text'; but it is likely that a hash structure would be more
> > appropriate.
>
> > As I mentioned in an earlier post, it's important to separate XML nodes
> > from their textual content. XML::Twig methods return both types of data and
> > it's best not to mix them up. More importantly, you can always extract the
> > text data value from an XML node, but the reverse isn't true.
>
> Obviously there are several approaches to accomplish this task. With the help
> of yourself and others who posted, I have been able to put together a fairly
> efficient script, as we need to process and parse approx. 5000+ XML files
> averaging 9-1000 KB in size. So far it has been working smoothly :)
>
> Mike



Hi,
My requirement is to compare two large XML files (about 50 MB each) and
generate an XML file containing the differences (an XML delta). The problem
is that the modules I installed (XML::Diff) take a lot of time and a lot of
memory (even 2 GB of RAM is not sufficient). I suspect that these modules
use XML::Parser and build a complete DOM tree in memory, which is why they
need so much memory.
Is there any way in Perl to do this with a SAX parser, or some other way
that takes less memory?
Please help in this regard.

Thanks in advance.

Regards,
L.Srikanth Kumar
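
One lower-memory direction, given only as a rough sketch: stream each file
with XML::Twig (the module discussed earlier in this thread), keep just a
digest per record, and compare the digests afterwards. The element name
'record', its 'id' attribute and the sample data are assumptions made up for
illustration; they are not taken from the files in question. The result is a
list of added, removed and changed ids, not a full XML delta, and because the
digest is taken over the serialized record it is sensitive to formatting
differences such as whitespace.

```perl
use strict;
use warnings;
use XML::Twig;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);
use File::Temp qw(tempfile);

# Stream one file and return { id => digest of the serialized record }.
# 'record' and 'id' are hypothetical names; adapt them to the real schema.
sub index_records {
    my ($file) = @_;
    my %digest_for;
    my $twig = XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ($t, $elt) = @_;
                # Encode first: md5_hex dies on wide characters.
                $digest_for{ $elt->att('id') }
                    = md5_hex(encode_utf8($elt->sprint));
                $t->purge;    # release what has been parsed so far
            },
        },
    );
    $twig->parsefile($file);
    return \%digest_for;
}

# Compare two indexes and report which ids were added, removed or changed.
sub compare_indexes {
    my ($old, $new) = @_;
    my %diff = (added => [], removed => [], changed => []);
    for my $id (sort keys %$new) {
        if    (!exists $old->{$id})        { push @{ $diff{added} },   $id }
        elsif ($old->{$id} ne $new->{$id}) { push @{ $diff{changed} }, $id }
    }
    push @{ $diff{removed} }, grep { !exists $new->{$_} } sort keys %$old;
    return \%diff;
}

# Helper for the demo only: write a small XML string to a temporary file.
sub write_temp {
    my ($xml) = @_;
    my ($fh, $name) = tempfile(SUFFIX => '.xml', UNLINK => 1);
    print {$fh} $xml;
    close $fh or die "close failed: $!";
    return $name;
}

my $old_file = write_temp('<catalog><record id="1"><artist>A</artist></record>'
                        . '<record id="2"><artist>B</artist></record></catalog>');
my $new_file = write_temp('<catalog><record id="2"><artist>B2</artist></record>'
                        . '<record id="3"><artist>C</artist></record></catalog>');

my $diff = compare_indexes(index_records($old_file), index_records($new_file));
printf "%s: %s\n", $_, join(',', @{ $diff->{$_} }) for qw(added removed changed);
```

With this approach memory use should grow with the number of records (one
digest each) and not with the size of the whole document, since the purge
call discards each record once it has been indexed. Producing a real delta
document would need a second streaming pass that prints the records whose
ids were reported.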



