Thanks Jerrad, I actually tried lynx first. However, the HTML files are on a server that needs authentication. Even adding -auth my-user-id:my-pw to lynx was not enough.
Here is the lynx output (I added the # as these are comments in the Perl program):

# Looking up [my proxy]
# Making HTTP connection to [my proxy]
# Sending HTTP request.
# HTTP request sent; waiting for response.
# Alert!: Invalid header 'WWW-Authenticate: NTLM'
# Alert!: Can't retry with authorization! Contact the server's WebMaster.
# Can't Access [the url I wanted]
# Alert!: Unable to access document.
#
# lynx: Can't access startfile

I am not sure what I really need to do. I looked at the headers using a Mozilla Firefox add-on and decided that generating the proper values for WWW-Authenticate was too complex for lynx, and for Mechanize too. But maybe I am missing something.

Steve

-----Original Message-----
From: Jerrad Pierce [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 02, 2007 1:45 PM
To: Tolkin, Steve
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] Extract text from html preserving newlines

lynx -dump

