I'm trying to convert an HTML page to plain text. I'm struggling with a behavior that seems counterintuitive. Perhaps there's a flag I am missing.
My lynx (2.8.6dev11) invocation is this (I'm in an xterm which has 300 columns):

    $ unset COLUMNS
    $ /projects/intranet/bin/lynx -dont_wrap_pre -width=999 -hiddenlinks=ignore \
        -nobold -nocolor -nolist -dump -force_html /tmp/lib.html \
        > /volws/$USER/ldatae/$USER.lib.txt

The intent is that lynx not wrap any lines of text. However, I'm not successful in this attempt. A sample of the HTML giving me fits (please understand that this is an extremely cut-down version) looks like this:

    <!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/html4/loose.dtd">
    <html>
    <head>
    </head>
    <body>
    <table border=0 cellspacing="15">
      <tr>
        <th align=left> Title/Author </th>
        <th align=left> Format </th>
        <th align=left> Status </th>
        <th align=left> </th>
        <th align=left> </th>
      </tr>
      <tr>
        <td valign="top">Those who walk in darkness / Ridley, John,</td>
        <td valign="top">Book</td>
        <td valign="top">
          <table bgcolor="yellow">
            <tr>
              <td valign="top"> ready for pickup at Reynoldsburg by 22Apr2005 </td>
            </tr>
          </table>
        </td>
        <td>
          <form method=post action="/cgi-bin/wpcr1075.shtml">
            <input type=hidden name="requestcd" value=1>
            <input type=hidden name="title" value="Those%20who%20walk%20in%20darkness%20/%20Ridley,%20John,">
            <input type=hidden name="recordnumber" value=59423>
            <input type=hidden name="patronid" value=210188336>
            <input type=hidden name="rsp_reserve_cnt" value="2">
            <input type=hidden name="rsp_reserve_video_cnt" value="5">
            <input type=hidden name="rsp_print_count" value="2">
            <input type=hidden name="rsp_audio_count" value="3">
            <input type=hidden name="rsp_video_count" value="0">
            <input type=hidden name="rsp_im_count" value="0">
            <input type=hidden name="rsp_first_name" value="LARRY">
            <input type=hidden name="rsp_fine_balance" value="0.0">
            <input type=hidden name="matl_type_desc" value="Book">
            <input type=submit value="Cancel" style="background-color: red">
          </form>
        </td>
        <td valign=top> </td>
      </tr>
    </table>
    </body>
    </html>

I have no control over what HTML is generated, and I'd rather not massage the HTML after I fetch it if I don't have to.

What I expected was:

    Title/Author Format Status
    Those who walk in darkness / Ridley, John, Book ready for pickup at Reynoldsburg by 22Apr2005 [1]Cancel

    Title/Author Format Status
    Those who walk in darkness / Ridley, John, Book ready for pickup at Reynoldsburg by 22Apr2005

What I get is:

    Title/Author Format Status
    Those who walk in darkness / Ridley, John, Book ready for pickup at Reynoldsburg by 22Apr2005 [1]Cancel

If I use Firefox or Mozilla with a window 300 characters wide and view the HTML, the page is displayed as I was hoping lynx would display it.

Is there some additional flag I need to keep lynx from wrapping lines the way it is doing?

-- 
Tcl - The glue of a new generation. <URL: http://wiki.tcl.tk/ >
Larry W. Virden <URL: http://www.purl.org/NET/lvirden/ >
Even if explicitly stated to the contrary, nothing in this posting should be construed as representing my employer's opinions.
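
P.S. In case anyone wants to reproduce this from a script rather than from the shell, here is a minimal sketch of the same invocation in Python. The helper names (build_lynx_command, environment_without_columns, dump_to_file) are made up for illustration; the flags and the lynx path are the ones from the command in this post. It is not a fix for the wrapping, only a way to make the run repeatable.

```python
import os
import subprocess

# Path and flags are copied from the shell invocation in this post.
LYNX = "/projects/intranet/bin/lynx"
FLAGS = [
    "-dont_wrap_pre", "-width=999", "-hiddenlinks=ignore",
    "-nobold", "-nocolor", "-nolist", "-dump", "-force_html",
]


def build_lynx_command(html_path, lynx=LYNX):
    """Return the argv list for the dump (hypothetical helper)."""
    return [lynx, *FLAGS, html_path]


def environment_without_columns(base=None):
    """Copy the environment and drop COLUMNS, mirroring `unset COLUMNS`."""
    env = dict(os.environ if base is None else base)
    env.pop("COLUMNS", None)
    return env


def dump_to_file(html_path, out_path, lynx=LYNX):
    """Run lynx and write its standard output to out_path (hypothetical helper)."""
    with open(out_path, "w") as out:
        subprocess.run(
            build_lynx_command(html_path, lynx),
            stdout=out,
            env=environment_without_columns(),
            check=True,  # raise if lynx exits with a non-zero status
        )
```

Calling dump_to_file("/tmp/lib.html", some_output_path) would then correspond to the two shell commands above, with the output path chosen by the caller.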
