On Monday 08 January 2007 20:19, Peter Karman wrote:
> Xavier Robin scribbled on 1/8/07 11:14 AM:
> > Do you know a (catalyst plugin|perl module|external tool) that converts
> > HTML to plain text? I mean, keeping some formatting (especially lists and
> > links...), not just stripping HTML tags...
>
> I use the w3m tool:
>
>   % w3m -dump file.html > file.txt
>
> I like it because it preserves tables pretty well.

Unfortunately, it doesn't print the href attributes of links.
I also tried HTML::Scrubber, as Carl Franks suggested, but it essentially
just keeps the tags we choose to allow.

In fact, I'm looking for something that converts an HTML file to a plain
text file, with no markup left at all.

For example, a link like this:

<a href="http://site.example">A link</a>

would be transformed into something like:

A link
http://site.example
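
To make the desired transformation concrete, here is a minimal sketch of it. It is written in Python with only the standard library's html.parser, not Perl, and it is not a CPAN module; the names LinkTextExtractor and html_to_text are made up for illustration. It drops all tags, keeps the text, and puts each link's href on the line after the link text. Lists and tables are not handled.

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collect visible text; emit each link's href on the line after its text."""

    def __init__(self):
        super().__init__()
        self.parts = []    # output fragments, joined at the end
        self.href = None   # href of the <a> element currently open, if any

    def handle_starttag(self, tag, attrs):
        # Remember the target of the link we are entering (hypothetical policy:
        # links without an href are treated as plain text).
        if tag == "a":
            self.href = dict(attrs).get("href")

    def handle_data(self, data):
        # Text between tags is copied through unchanged.
        self.parts.append(data)

    def handle_endtag(self, tag):
        # On </a>, write the remembered URL on its own line.
        if tag == "a" and self.href:
            self.parts.append("\n" + self.href + "\n")
            self.href = None


def html_to_text(html):
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


print(html_to_text('<a href="http://site.example">A link</a>'))
```

For the example link above, this prints the link text followed by the URL on the next line.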

I'm sure a module that does this exists on CPAN.

Thanks,
Xavier
-- 
Some people say that if you play a Windows XP install CD backwards you will 
hear demon voices commanding you to worship Satan. But that's nothing. If you 
play it forward it will install Windows XP.

_______________________________________________
List: [email protected]
Listinfo: http://lists.rawmode.org/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/[email protected]/
Dev site: http://dev.catalyst.perl.org/
