> Steve Palincsar wrote:
> 
> > I'm using Netscape 4.x's command line interface at work as part of a
> > processing system that loads HTML pages and saves them as text under
> > program control. I haven't seen anything discussing whether Mozilla
> > also supports a command line interface, and as far as I've been able
> > to discover, Mozilla doesn't appear to support save as text at all at
> > this point.

Ben Bucksch writes:
> I don't think Mozilla supports this the way you describe (never heard of 
> that 4.x feature).

Right, I don't know of a way to do this with the Mozilla application
itself (file a bug/RFE!).

> However, if you have retrieved the HTML page anyhow, you might be able
> to use one of the test apps to convert it to plaintext. Maybe they
> are available in binary form in the zipfile/tarball builds from
> mozilla.org; maybe you need to compile them yourself.
> 
> The HTML->TXT converter in Mozilla is nsPlainTextSerializer.cpp, 
> formerly nsHTMLToTXTSinkStream.cpp.

Specifically, you might want to look at the TestOutput program,
which is built in the Mozilla tree if you enable tests.  It can
function as a standalone HTML-to-text converter using the same code
Mozilla uses for converting mail messages to plaintext.  The code for
it is in Convert.cpp, and sample usage is in TestOutSinks (see
http://lxr.mozilla.org/seamonkey/ to find or view these files
if you don't already have a source tree).

The TestOutput program is small, and it or something like it could
easily be distributed if there were a need for it.
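
To give a rough idea of what "something like it" might look like, here
is a minimal sketch of a standalone HTML-to-text filter.  It is not
TestOutput and does not use Mozilla's serializer code; it is a stand-in
built on Python's standard html.parser module, and the names
TextExtractor and html_to_text are made up for illustration.  Its
output will differ from what Mozilla produces.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping scripts and styles (hypothetical helper)."""

    # Tags treated as line breaks; an illustrative, incomplete list.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "pre",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    # Tags whose contents are dropped entirely.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines.
        raw = "".join(self._chunks)
        lines = [" ".join(line.split()) for line in raw.splitlines()]
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return a plain-text rendering of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    import sys
    # Read HTML on stdin, write plain text on stdout.
    print(html_to_text(sys.stdin.read()))
```

Saved under a name of your choosing (say, html2text.py), it would be
run as a filter: python html2text.py < page.html > page.txt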

        ...Akkana
