Re: [WSG] convert to XHTML

Paul Novitski Wed, 03 May 2006 21:59:15 -0700

At 09:01 PM 5/3/2006, Stuart Sherwood wrote:

I'm wondering what is the best way to convert a large text file to XHTML? Preferably, I'd like the conversion to be performed to ignore styles, so the output is clean, semantic markup. I'd rather add my own styling later.

I think it's impossible to say how challenging this would be without knowing anything about the content of the text file. How organized and consistent are the content and styling? What is there for a parser to grab onto? What verbal and stylistic patterns can it orient itself by? And what's the file format?
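To make "patterns a parser can grab onto" concrete, here is a minimal sketch in Python. It rests on two assumptions that may not hold for your file: paragraphs are separated by blank lines, and a short single-line block in all capitals is a heading. The function name text_to_xhtml and the 60-character limit are made up for illustration; a real document would need its own rules.

```python
import re
from html import escape


def text_to_xhtml(text):
    """Convert plain text to XHTML fragments under two assumed conventions."""
    # Assumed pattern 1: blank lines separate blocks.
    blocks = re.split(r"\n\s*\n", text.strip())
    out = []
    for block in blocks:
        # Collapse internal line breaks and runs of whitespace.
        content = " ".join(block.split())
        if not content:
            continue
        # Assumed pattern 2: a short, single-line, all-caps block is a heading.
        is_heading = (
            "\n" not in block.strip()
            and content.isupper()
            and len(content) <= 60
        )
        tag = "h2" if is_heading else "p"
        # Escape &, < and > so the output stays well-formed.
        out.append(f"<{tag}>{escape(content)}</{tag}>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "INTRO\n\nFirst line\nsecond line & more.\n\nNext para."
    print(text_to_xhtml(sample))
```

The less consistent the source, the more special cases a function like this accumulates, which is why the questions above matter before estimating the job.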

I love writing software that parses human language; it's the most fun of any programming I've done (which probably says something about my geek quotient). Writing a parser for your document is probably going to be practical (cost-effective) only if it will be run repeatedly, say on a document that comes to you with fresh content each month, or on a single document that is truly huge. If this is a small one-off job, it would probably be cheaper to do it by hand -- with the aid of macros perhaps, but not scripting the whole thing.

Any chance you can work with the originators of the document to change the way in which it's put together? That could have an enormous effect on the parsing job, including potentially eliminating it altogether.

Paul

******************************************************
The discussion list for  http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
******************************************************
