On Sun, 29 Sep 2002, Jim Hettmer wrote:

> With 1.0.3 on RH7.3, I have technical documents created with an older
> Netscape, html of course, which I would like to be able to convert over
> to abw. Some small ones do, but normally I get the "bogus" message.
> I've tried tidy -clean -asxhtml and -asxml, but to no avail. The small
> html I created with abiword could be read back in provided I tidied it
> first.
> Also, maybe I'm being simplistic and/or naive, but it seemed to me
> that if abiword would indicate something about what it found fault with,
> I could possibly stumble around in the original and hand-edit the
> offending stuff out?

AbiWord has a native XHTML importer that requires its input to be valid
XML, but it has stricter requirements besides, some of them questionable.
Elements like <div> are used by AbiWord to indicate sections, but <div>
has a vast range of uses out there in the wild, so AbiWord gets confused.

If you have a debug build of AbiWord, as I think Hub suggested, then you
may get some useful nuggets of info about what the importer doesn't like.

> I'd appreciate it if someone could tell me what the current state,
> thoughts, and recommendations are about importing html. Is there hope
> for me, here?.

There is also an HTML importer which doesn't require the file to be XML
and is much more forgiving about tags. Unfortunately, that importer is
unfinished and may drop text which is in lists or tables. Also, a lot of
style information is lost. So, not a great solution.

If you are determined, then I would recommend importing your HTML docs
twice, once using the HTML importer, and once using the text importer.

I'm working my way towards a new, improved [X]HTML <-> ABW converter, but
it's a while away yet.

Regards,

Frank

Francis James Franklin
[EMAIL PROTECTED]

`Medium atomic weights are available: Gold, Lead, Copper, Jet, Diamond,
 Radium, Sapphire, Silver and Steel.
`Sapphire and Steel have been assigned...'
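
On the question of finding out what a strict XML parser objects to: a rough sketch, assuming Python is available, that reports the position of the first well-formedness error in a file. This is not AbiWord's importer and will not catch its extra restrictions (such as how it treats <div>); the function name first_xml_error is made up for illustration. It only narrows down where tidy's output is still not parseable as XML.

```python
import xml.parsers.expat


def first_xml_error(data):
    """Return (line, column, message) for the first XML well-formedness
    error in `data` (bytes or str), or None if the document parses.

    Hypothetical helper: it uses Python's expat binding, not AbiWord's
    own parser, so it only checks plain XML well-formedness.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final (and only) chunk of the document.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None
```

To check a file, read it in binary mode and pass the bytes in, e.g. first_xml_error(open("doc.html", "rb").read()). The line and column give a place to start hand-editing; after each fix, run it again to find the next problem.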
