On Monday 08 January 2007 20:19, Peter Karman wrote:
> Xavier Robin scribbled on 1/8/07 11:14 AM:
> > Do you know a (catalyst plugin|perl module|external tool) that converts
> > HTML to plain text? I mean, keeping some formatting (especially lists and
> > links...), not just stripping HTML tags...
>
> I use the w3m tool:
>
> % w3m -dump file.html > file.txt
>
> I like it because it preserves tables pretty well.

Unfortunately it doesn't print the href attributes of links.

I also tried HTML::Scrubber, as proposed by Carl Franks, but it basically
keeps whichever tags we choose to allow. What I'm looking for is something
that converts my HTML file to a plain text file, with no markup left at all.
For example, a link like this:

<a href="http://site.example">A link</a>

would be transformed into something like:

A link http://site.example

I'm sure that a module doing that exists on CPAN.

Thanks,
Xavier

-- 
Some people say that if you play a Windows XP install CD backwards you will
hear demon voices commanding you to worship Satan. But that's nothing. If you
play it forward it will install Windows XP.

_______________________________________________
List: [email protected]
Listinfo: http://lists.rawmode.org/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/[email protected]/
Dev site: http://dev.catalyst.perl.org/
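
To make the transformation asked for in the message concrete (link text followed by its href, no markup left), here is a minimal sketch in Python using only the standard library's html.parser. It is an illustration of the idea, not the CPAN module the post is looking for. The names LinkTextExtractor and html_to_text are hypothetical, and the handling of lists and paragraphs (a "* " prefix per list item, a newline per block tag) is an assumption about what "keeping some formatting" might mean.

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collect text content and append each link's href after its text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # output fragments, joined at the end
        self.href_stack = []  # hrefs of the <a> elements currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Remember the href (may be None) until the closing </a>.
            self.href_stack.append(dict(attrs).get("href"))
        elif tag == "li":
            # Assumed convention: render each list item as a "* " bullet.
            self.parts.append("\n* ")
        elif tag in ("p", "br", "ul", "ol"):
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self.href_stack:
            href = self.href_stack.pop()
            if href:
                # Emit the URL right after the link text.
                self.parts.append(" " + href)

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the text of `html` with all tags removed and hrefs kept."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


print(html_to_text('<a href="http://site.example">A link</a>'))
# A link http://site.example
```

The sketch does nothing for tables, so for table-heavy pages the w3m approach quoted above may still give better results.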
