On Sat, 2022-07-23 at 13:03 -0600, William Torrez Corea wrote:
> My goal: I want to create something similar to a phone directory.
> This page hosts a large number of documents in PDF format, so I want
> to combine the documents and be able to filter by first name, last
> name, and location. If I do this manually, I have to download and
> open each document and search for the person's name by hand.
> 
> The documents uploaded to this page differ in year and date, and
> they contain different information.
> 

Perhaps you can automate the downloading and then use tools that merge
PDF files, like pdfunite, to turn them into a single PDF.  There's also
pdf2txt that can extract text from a PDF --- of course, that would only
work if there were a way to detect which information is what.
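
If it helps, here is a minimal sketch of the download-and-merge step in
Perl.  It assumes pdfunite (from poppler-utils) is installed, that
HTTP::Tiny can do HTTPS on your system (it needs IO::Socket::SSL for
that), and that you pass the PDF URLs on the command line.  The names
local_name, pdfunite_cmd and all.pdf are made up for the example, and it
is untested against the site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Local file name for a URL: the last path component,
# with any query string or fragment removed.
sub local_name {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*\z//s;
    my ($name) = $path =~ m{([^/]+)\z};
    return $name;
}

# Argument list for pdfunite: input files first, output file last.
sub pdfunite_cmd {
    my ($out, @pdfs) = @_;
    return ('pdfunite', @pdfs, $out);
}

# Usage: perl fetch.pl URL1 URL2 ...
if (@ARGV) {
    my $http = HTTP::Tiny->new;
    my @pdfs;
    for my $url (@ARGV) {
        my $file = local_name($url);
        # mirror() only downloads again if the remote file has changed
        my $res = $http->mirror($url, $file);
        if ($res->{success}) {
            push @pdfs, $file;
        }
        else {
            warn "Failed: $url: $res->{status} $res->{reason}\n";
        }
    }
    # With fewer than two files there is nothing to merge.
    if (@pdfs >= 2) {
        system(pdfunite_cmd('all.pdf', @pdfs)) == 0
            or die "pdfunite failed: $?\n";
    }
}
```

Run it as "perl fetch.pl URL1 URL2 ..."; the merged result ends up in
all.pdf in the current directory.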

Since we do not have all the PDF files which apparently are all
different, we can't tell how it might be possible to detect which
information is what.
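
Without knowing the layout, about the only thing you can do with the
extracted text is a plain search that reports which file mentions which
name.  A rough sketch, assuming each PDF has already been converted to a
UTF-8 text file (for example with "pdftotext -layout", which is my
assumption, not something tested on these files) and that names.txt
holds one name per line; names_in_text and slurp are made-up helper
names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names from @$names that occur in $text.  Matching ignores
# case and allows any whitespace between the words of a name, because
# extracted PDF text often breaks a line in the middle of a name.
sub names_in_text {
    my ($text, $names) = @_;
    my @found;
    for my $name (@$names) {
        my $pat = join '\s+', map { quotemeta } split ' ', $name;
        next unless length $pat;    # skip blank entries
        push @found, $name if $text =~ /$pat/i;
    }
    return @found;
}

# Read a whole file as UTF-8 text.
sub slurp {
    my ($file) = @_;
    open my $fh, '<:encoding(UTF-8)', $file
        or die "Can't open '$file': $!";
    local $/;
    return scalar <$fh>;
}

# Usage: perl findnames.pl names.txt file1.txt file2.txt ...
if (@ARGV >= 2) {
    binmode STDOUT, ':encoding(UTF-8)';
    my ($list, @files) = @ARGV;
    my @names = grep { /\S/ } split /\r?\n/, slurp($list);
    my %seen;
    for my $file (@files) {
        for my $name (names_in_text(slurp($file), \@names)) {
            print "$file: $name\n";
            $seen{$name} = 1;
        }
    }
    # Names that did not turn up in any of the files
    print "not found: $_\n" for grep { !$seen{$_} } @names;
}
```

This only tells you that a name appears somewhere in a file; it cannot
tell a first name from a last name or a location.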

I wouldn't even bother with this because PDF is awful to get
information from automatically.  Whoever makes these PDF files needs to
provide the information in such a way that it is usable.  Since you
need to download all the files anyway to search for a name, you're
better off merging them into a single file and searching that in your
favourite PDF viewer.


> 
> On Wed, Jul 20, 2022 at 7:04 PM Mike <te...@mflan.com> wrote:
> 
> > 
> > I'm going to be traveling, so will not be able to help much
> > in the next 2 days.
> > 
> > That is a PDF file you supplied.  Is it fair to say you want to
> > be able to search for all the names listed in a text file and be
> > able to print out which file contains which name.  And in some
> > cases the name will not be in any of the files?  Is that the goal?
> > 
> > Define your goal and we will help you.
> > 
> > 
> > The file below is a bit old, but maybe it works for your
> > PDF files.  I have not tested it on your url.  I gather
> > you don't have HTML tables, so maybe it is not for your case.
> > 
> > 
> > Mike
> > 
> > 
> > #!/usr/bin/perl -w
> > #
> > #
> > # This program fetches the web page stored in $page and writes it
> > # to $outfile as plain text.  So basically it converts HTML to text.
> > # It works reasonably well with HTML tables.
> > #
> > #
> > 
> > use strict;
> > use warnings;
> > use LWP::UserAgent;
> > use HTML::FormatText::WithLinks::AndTables;
> > 
> > 
> > my $page = 'http://www.mflan.com/crime.htm';
> > 
> > my $outfile = 'output.txt';
> > 
> > chdir '/home/mike/Documents/copy'
> >     or die "Can't chdir to '/home/mike/Documents/copy': $!";
> > 
> > open OUT, '>>', $outfile or die "Can't open '$outfile': $!";
> > 
> > my ($sl, $request, $response, $html);
> > 
> > $sl = LWP::UserAgent->new;
> > 
> > 
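> > # Enter a proxy if needed (and set it for SOAP too).
> > $sl->proxy('http', '');
> > $request = HTTP::Request->new('GET', $page);
> > $response = $sl->request($request);
> > die "Can't fetch '$page': ", $response->status_line, "\n"
> >     unless $response->is_success;
> > # decoded_content is the body only; as_string would include headers.
> > $html = $response->decoded_content;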
> > 
> > print "Got it into \$html.\n";
> > 
> > 
> > 
> > my $text = HTML::FormatText::WithLinks::AndTables->convert($html);
> > 
> > 
> > print OUT "$text";
> > 
> > print "\nAll done.\n";
> > 
> > close OUT;
> > 
> > 
> > __END__
> > 
> > 
> > 
> > 
> > On 7/20/22 10:13, William Torrez Corea wrote:
> > > The url of the page:
> > > 
> > > https://www.pgr.gob.ni/PDF/2021/GACETA/GACETA_17_08_2021.pdf
> > > 
> > > On 7/20/22, William Torrez Corea <willitc9...@gmail.com> wrote:
> > > > There is a page that holds information about people, but to
> > > > find a name you have to search for it manually. So, I want to
> > > > automate this process with Perl.
> > > > --
> > > > 
> > > > With kindest regards, William.
> > > > 
> > > > ⢀⣴⠾⠻⢶⣦⠀
> > > > ⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
> > > > ⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
> > > > ⠈⠳⣄⠀⠀⠀⠀
> > > > 
> > > 
> > 
> > 
> 


--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

