(I'm keeping this on the list, in case there is somebody else interested in the topic... hope it doesn't bother the others too much.)
So, my goal is this: I get a list of URLs from another script, and I would like to create an output file in this format:
CURR_URL_www.blah1.com
blah blah blah blah blah blah
CURR_URL_www.blah2.com
blah blah blah
...
for all the URLs in the list (skipping things that end in .doc, .pdf, etc.).
I would like Latin-1 entities to be resolved, and the contents should be plain, unadorned text with no meta-information. (For example, the lists of "links found in the page" that one gets with lynx -dump are extremely annoying for my purposes.)
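Just to make the idea concrete, here is a rough sketch of the kind of thing I have in mind, in Python rather than my old Perl (the names page_text/dump_urls and the extension list are only placeholders, and I'm assuming the pages can be decoded as Latin-1):

```python
import sys
import urllib.request
from html.parser import HTMLParser

# Assumed list of extensions to skip; extend as needed.
SKIP_EXTENSIONS = (".doc", ".pdf", ".ps", ".zip")

class TextExtractor(HTMLParser):
    """Collect only the visible text of a page, ignoring tags and scripts."""
    def __init__(self):
        # convert_charrefs=True resolves entities like &eacute; to text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # inside <script>/<style>, ignore data

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def page_text(html_source):
    """Return the unadorned text of an HTML document, entities resolved."""
    parser = TextExtractor()
    parser.feed(html_source)
    # Collapse runs of whitespace left behind by the stripped markup.
    return " ".join(" ".join(parser.parts).split())

def dump_urls(urls, out=sys.stdout):
    """Write a CURR_URL_<url> header followed by the page text, per URL."""
    for url in urls:
        if url.lower().endswith(SKIP_EXTENSIONS):
            continue  # skip .doc, .pdf, etc.
        with urllib.request.urlopen(url) as resp:
            # Assumption: pages are Latin-1; replace anything that isn't.
            source = resp.read().decode("latin-1", errors="replace")
        out.write("CURR_URL_%s\n%s\n" % (url, page_text(source)))
```

That's roughly the level of "short script with imperfections" I'd be happy with, so if curl plus a few lines can do the same, even better.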
I used to do this with LWP::Simple, HTML::Parse and HTML::FormatText.
However, I've seen that HTML::Parse is now deprecated, and my old script gave me some problems anyway, so I feel like it's time to update it.
Is there a way in which curl could help me? (I would rather write a short script with lots of imperfections than something really good that takes a week...)
Thanks again!
Marco
