Re: [Boston.pm] Extract text from html preserving newlines

Tolkin, Steve Wed, 02 May 2007 11:01:58 -0700

Thanks Jerrad,  

I actually tried lynx first.  However, the html files are on a server
that needs authentication.  Even adding
-auth my-user-id:my-pw
to lynx was not enough.


Here is the lynx output (I added the # as these are comments in the Perl
program):

# Looking up [my proxy]
# Making HTTP connection to [my proxy]
# Sending HTTP request.
# HTTP request sent; waiting for response.
# Alert!: Invalid header 'WWW-Authenticate: NTLM'
# Alert!: Can't retry with authorization!  Contact the server's WebMaster.
# Can't Access [the url I wanted]
# Alert!: Unable to access document.
# 
# lynx: Can't access startfile


I am not sure what I really need to do.  I looked at the headers using a
Mozilla Firefox add-on and decided that generating the proper values for
WWW-Authenticate was too complex for lynx, and for Mechanize too.  But
maybe I am missing something.
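
A possible workaround, offered only as an untested sketch: since lynx
rejects the 'WWW-Authenticate: NTLM' challenge, the page could be fetched
by a client that does speak NTLM and then piped into lynx for rendering.
The shell function below assumes a curl build with NTLM support.  The name
fetch_text and its three arguments are hypothetical, and the sketch does
not cover any authentication the proxy itself might require.

```shell
# Sketch only: fetch a page behind NTLM with curl, render it with lynx.
# fetch_text and its arguments (url, user, password) are hypothetical names.
fetch_text() {
    url=$1
    user=$2
    pass=$3
    # --ntlm   answer the server's NTLM challenge (needs NTLM-enabled curl)
    # --fail   produce no page body on HTTP errors such as 401
    # --silent suppress the progress meter
    curl --silent --fail --ntlm --user "$user:$pass" "$url" |
        # -stdin reads the HTML from the pipe, -force_html treats it as HTML,
        # -dump writes the formatted text to standard output
        lynx -stdin -force_html -dump
}
```

It might be called as
fetch_text 'http://server/page.html' 'my-user-id' 'my-pw' > page.txt
where the URL is a placeholder.  A password passed on the command line is
visible to other users in the process list.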


Steve


-----Original Message-----
From: Jerrad Pierce [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 02, 2007 1:45 PM
To: Tolkin, Steve
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] Extract text from html preserving newlines

lynx -dump
-- 
Free map of local environmental resources:
http://CambridgeMA.GreenMap.org
--
MOTD on Boomtime, the 49th of Discord, in the YOLD 3173:
It is useless for sheep to pass resolutions in favor of vegetarianism
while wolves remain of a different opinion.

 
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm

Re: [Boston.pm] Extract text from html preserving newlines

Reply via email to