Hello Brandon, hello Alexey and hello Jim,
I love this place - it is a great place for newbies like me.

@Jim: I now trim my replies and remove old material. Thanks for the hints!
@Brandon: the path is really strange, but this was a very quick proof of concept.

Again: many thanks to all of you! In the meantime, here are the first results I have got so far.

You remember the script that I introduced further above (see also below). I tried changing the argument of "in" to ->in( '.' ); so that the search starts in the current working directory. That is the same directory as the script, as long as I start the script from there.

That means I changed from this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use File::Find::Rule;

    my @files = File::Find::Rule->file()
                                ->name('einzelergebnis*.html')
                                ->in( '/home/usr/perl/htmlfiles' );

    foreach my $file (@files) {
        print $file, "\n";
    }

to this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use File::Find::Rule;

    my @files = File::Find::Rule->file()
                                ->name('einzelergebnis*.html')
                                ->in( '.' );

    foreach my $file (@files) {
        print $file, "\n";
    }

and then I got the following output:

    htmlfiles/einzelergebnis80b5.html
    htmlfiles/einzelergebnisa0ef.html
    htmlfiles/einzelergebnis1b42.html
    htmlfiles/einzelergebnis5960.html
    htmlfiles/einzelergebnise523.html
    htmlfiles/einzelergebnis2c7e.html
    htmlfiles/einzelergebnisdf57.html
    htmlfiles/einzelergebnis2b53-2.html
    htmlfiles/einzelergebnisb1c0-2.html
    ... and 22 thousand more lines ;-)

This seems to be the starting point! Now I can go on to figure out how I have to configure the script (see more below).

So, now that the I/O handle issues and the path names in general are nailed down, the parser script (see below) has to be configured. All of the following ideas concern this HTML parser script.

Well, this means I have:
a. to define in $file the file/directory, including its path, and furthermore
b. to define a path in $html_dir

In other words, I need to define the paths to
a. the directory that contains the files that need to be parsed (see above)
b. the file that has to be created.

The first task can be solved with what I learned in the preliminary tasks (see above). That means I have to look for the files in the directory called "htmlfiles". Does that mean I have to change the following line?

    my $file = 'school.html';

BTW, what does the array @html_files do?

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TokeParser;

    my $file = 'school.html';
    my $p = HTML::TokeParser->new($file) or die "Can't open: $!";
    my %school;

    while (my $tag = $p->get_tag('div', '/html')) {
        # first move to the right div that contains the information
        last if $tag->[0] eq '/html';
        next unless exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'inhalt_large';

        $p->get_tag('h1');
        $school{'location'} = $p->get_text('/h1');

        while (my $tag = $p->get_tag('div')) {
            last if exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'fusszeile';

            # get the school name from the heading
            next unless exists $tag->[1]{'class'}
                and $tag->[1]{'class'} eq 'fm_linkeSpalte';
            $p->get_tag('h2');
            $school{'name'} = $p->get_text('/h2');

            # verify format for school type
            $tag = $p->get_tag('span');
            unless (exists $tag->[1]{'class'}
                    and $tag->[1]{'class'} eq 'schulart_text') {
                warn "unexpected format: parsing stopped";
                last;
            }
            $school{'type'} = $p->get_text('/span');

            # verify format for address
            $tag = $p->get_tag('p');
            unless (exists $tag->[1]{'class'}
                    and $tag->[1]{'class'} eq 'einzel_text') {
                warn "unexpected format: parsing stopped";
                last;
            }
            $school{'address'} = clean_address($p->get_text('/p'));

            # find the description
            $tag = $p->get_tag('p');
            $school{'description'} = $p->get_text('/p');
        }
    }

    print qq/$school{'name'}\n/;
    print qq/$school{'location'}\n/;
    print qq/$school{'type'}\n/;
    foreach (@{$school{'address'}}) {
        print "$_\n";
    }
    print qq/\nDescription: $school{'description'}\n/;

    # split the address into lines and strip leading/trailing whitespace;
    # returns a reference to the array of cleaned lines
    sub clean_address {
        my $text = shift;
        my @lines = split "\n", $text;
        foreach (@lines) {
            s/^\s+//;
            s/\s+$//;
        }
        return \@lines;
    }

Brandon, Alexey and Jim, I look
forward to any and all help! I really appreciate a helping hand here. Many, many thanks for all you have done so far! This mailing list is a great place for knowledge sharing!

BTW, I can provide more information about the parsing job that has to be done. If that is wanted, just let me know!

regards
jobst aka floobee
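
P.S. To make the question above more concrete, here is a minimal sketch of how I imagine the two scripts could be combined: File::Find::Rule collects the files, and a sub is called once for each of them. The names $html_dir, find_result_files and parse_school_name are only my own placeholders, and the sub reads nothing but the first <h2> of each file; the full parser would have to go in its place. The sketch assumes that the script is started from the directory that contains "htmlfiles".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find::Rule;
use HTML::TokeParser;

# Assumption: "htmlfiles" is a subdirectory of the current working directory.
my $html_dir = 'htmlfiles';

# Placeholder helper: collect every einzelergebnis*.html below $dir.
sub find_result_files {
    my ($dir) = @_;
    return File::Find::Rule->file()
                           ->name('einzelergebnis*.html')
                           ->in($dir);
}

# Placeholder helper: return the text of the first <h2> in one file,
# or undef if the file has no <h2>. The real parser would replace this.
sub parse_school_name {
    my ($file) = @_;
    my $p = HTML::TokeParser->new($file) or die "Can't open $file: $!";
    my $name;
    if ($p->get_tag('h2')) {
        $name = $p->get_trimmed_text('/h2');
    }
    return $name;
}

# One parser run per file instead of the single hard-coded 'school.html'.
foreach my $file (find_result_files($html_dir)) {
    my $name = parse_school_name($file);
    print "$file: ", (defined $name ? $name : '(no h2 found)'), "\n";
}
```

If this is roughly the right direction, then the line my $file = 'school.html'; would disappear and $file would come from the loop instead.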